For a general description of HolliDance, I offer David's own:
David has ported HolliDance to several platforms, including Power Macintosh, BeOS machines, Sony Playstation and, of course, SGIs. The distribution file for SGIs is available at:
or you can download my own local copy (the file is approximately 515K). Note that my own submitted results have been using the executable called HolliDance5.3 since that doesn't bother with audio input. Also note that HolliDance won't run on systems with an 8-bit XL graphics board since the program cannot obtain an appropriate visual.
Why use HolliDance as a benchmark?
The complete 3D scene in HolliDance contains some 10600 polygons with three lights (two directional and one spotlight). At first glance, one may think this is not a large number of polygons to have in a benchmark model, but what is far more important here is the presence of the multiple lights. As a result, a great deal of computation must be done to compute each frame because the spotlight involves different lighting normals for every vertex and the three lights must be combined.
Ordinarily, a model containing 10000 polygons would pose no problem to most modern graphics systems. However, the multiple lights impose a much greater burden on a graphics subsystem. This benchmark page focuses on exactly how different SGI graphics systems respond to the presence of this non-trivial lighting and, more importantly, why different systems behave in the ways they do.
Many graphics-intensive applications are bottlenecked by the available pixel fill rate or pre-processing performance, but in this case the crucial issue is how a system handles intensive geometry and lighting calculations. Different systems employ different methods, perhaps because of historical design legacies, cost targets, modern market goals, or other reasons; these varying approaches can have a radical effect on how a system handles the HolliDance program, with some surprising counter-intuitive results.
So who would be interested in HolliDance as a benchmark?
Anyone doing real-time 3D animation with non-trivial lighting and possible application overhead. Or, put simply, anyone doing VRML! There are plenty of benchmarks available for people doing CAD work, or complex rendering, or severely intensive pixel filling (Viewperf and other benchmarks cover these areas), but nobody has yet created a benchmark which would be of interest to VRML programmers and general real-time 3D animators, which includes game writers (not many at present for SGIs, but I expect that to change over the next few years). HolliDance fills this gap nicely.
Also note that HolliDance is a double-buffered program. Graphics experts may complain about a double-buffered test being used as a benchmark (here's why), but the point is that the 'real-world' task this test is supposed to represent is a double-buffered application, namely real-time animations. For such tasks, one wishes to know how a system performs in double-buffered mode for particular tasks and how different rendering states affect performance, as opposed to the peak theoretical single-buffered performance of the graphics hardware, ie. the final real-world situation is more important.
The key aspects of these kinds of 3D environment are:
The Benchmark Tests
When HolliDance is first run, a 320x240 window is presented. A text output in the executing xterm displays a constantly updated frame rate.
My test suite utilises four scenarios: two different viewpoints at two different window sizes (default window size and near-maximum size). Here is a summary of these scenarios:
The 'default camera view' (21K JPEG) referred to in tests 1 and 2 is obtained by pressing 'C' when the program first runs.
The viewpoint used in tests 3 and 4 is a reverse-angle view (24K JPEG). The easiest way to obtain the view is to simply use the available key controls until the view is more or less identical. This is trivial to do.
For each of the four tests, the program's texture, lighting and 'background' scenery states are altered ON and OFF in combination, with frames-per-second (fps) and data from gr_osview recorded in each case. Since gr_osview can affect frame rate, the fps figures should be noted with gr_osview in a minimised-window state. Please note that, to properly show why HolliDance is useful as a benchmark, gr_osview must be used in order to reveal more detailed information about what is happening inside the CPU and graphics subsystems. This does mean it takes more time to gather the results, but this is the only way to obtain genuinely useful data.
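As a rough illustration of how many readings this implies, here is a minimal Python sketch that enumerates the combinations. It is not part of HolliDance; the labels are my own, and the pairing of views and window sizes with test numbers is taken from the descriptions of tests 1 to 4 on this page.

```python
from itertools import product

# Hypothetical labels for the four scenarios (view, window size).
TESTS = {
    1: ("default view", "small window"),
    2: ("default view", "large window"),
    3: ("reverse-angle view", "small window"),
    4: ("reverse-angle view", "large window"),
}
# The three program states that are switched ON and OFF in combination.
TOGGLES = ("textures", "lights", "background")


def measurement_matrix():
    """Return every (test, state) pair needing an fps and gr_osview reading."""
    runs = []
    on_off = product((True, False), repeat=len(TOGGLES))
    for test, states in product(sorted(TESTS), on_off):
        runs.append((test, dict(zip(TOGGLES, states))))
    return runs


if __name__ == "__main__":
    for test, state in measurement_matrix():
        view, window = TESTS[test]
        flags = ", ".join(
            f"{name} {'ON' if on else 'OFF'}" for name, on in state.items()
        )
        print(f"Test {test} ({view}, {window}): {flags}")
```

Four tests with eight state combinations each gives 32 readings per system, which is why gathering a full set of results takes some time.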
Here are the raw results reports, presented in order of reporting date (due to the nature of the tests, it isn't possible to list the results in terms of any overall winning performance metric). You may need to widen your browser window slightly.
TABLE 1. Frames Per Second

Num  System Name and             T2     T2     T4     T4  CPU Type, Clock      Num   Submission   IRIX
     Graphics Name/Type         LON   LOFF    LON   LOFF  Speed, L2-per-CPU    CPUs  Date         O.S.

15.  Indy R5000SC/150 XGE24    1.58   2.12   0.38   1.87  R5000SC 150MHz 512K  1     10/Feb/2001  6.2
14.  Onyx RE2 2RM5            13.00  24.00  14.50  23.50  R10000SC 196MHz 1MB  2     01/Feb/2001  6.5
13.  Onyx2 IR 4RM9/64MB       35.71  71.43  35.71  71.71  R10000SC 250MHz 4MB  4     05/May/2000  6.5.7
12.  Indigo2 MaxIMPACT/1MB    14.93  24.39  14.93  18.52  R10000SC 195MHz 1MB  1     04/May/2000  6.5.7
11.  Indigo2 Extreme           1.90   4.52   0.39   0.44  R4400SC 250MHz 2MB   1     11/Apr/2000  6.5.5m
10.  Indigo2 HighIMPACT        9.01  12.05   8.50  12.05  R4400SC 250MHz 2MB   1     24/Jan/1999  6.5
9.   Indigo GR2-XS24Z          0.29   0.88   0.04   0.05  R3000PC 33MHz        1     24/Aug/1998  5.2
8.   Indigo GR2-Elan           0.42   2.13   0.04   0.04  R3000PC 33MHz        1     12/Aug/1998  5.2
7.   Onyx RES 1RM4            15.40  21.00      ?      ?  R10000 194MHz 1MB    4     10/Aug/1998  ?
6.   Indigo GR2-Elan           0.92   2.60   0.45   1.20  R3000PC 33MHz        1     08/Aug/1998  5.3
5.   Indy 24bit XL             0.74   1.04   0.17   0.18  R4600PC 100MHz       1     15/Jul/1998  6.2
4.   Indy 24bit XL             0.87   1.18   0.21   0.22  R4600PC 133MHz       1     14/Jul/1998  6.2
3.   Indy 24bit XL             1.42   1.95   0.29   0.30  R4400SC 200MHz 1MB   1     30/Jun/1998  6.2
2.   O2 CRM (rev 2/C/B)        4.15   6.54   4.50   6.56  R5000SC 200MHz 1MB   1     29/Jun/1998  6.3
1.   Indigo2 GR3-Elan          0.68   2.68   0.22   0.29  R4400SC 250MHz 2MB   1     29/Jun/1998  6.2
Tests 2 and 4 involve large windows; results for tests 1 and 3, which involve small windows, are not shown in the above table.

T2 LON  = Test 2, Default View, Textures Off, Lights ON, Background On.
T2 LOFF = Test 2, Default View, Textures Off, Lights OFF, Background On.
T4 LON  = Test 4, Reverse-Angle View, Textures ON, Lights ON, Background On.
T4 LOFF = Test 4, Reverse-Angle View, Textures ON, Lights OFF, Background On.
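One simple way to read Table 1 is to ask how much each system gains in Test 2 when the lights are switched off. The Python sketch below does that for a handful of rows; the figures are copied from the table, while the function name and the choice of rows are mine.

```python
# Test 2 frame rates (lights on, lights off) copied from Table 1.
T2_FPS = {
    "Onyx2 IR 4RM9/64MB": (35.71, 71.43),
    "Indigo2 MaxIMPACT/1MB": (14.93, 24.39),
    "O2 CRM": (4.15, 6.54),
    "Indy 24bit XL (R4400SC 200MHz)": (1.42, 1.95),
    "Indigo2 GR3-Elan": (0.68, 2.68),
}


def lights_off_gain(lon_fps, loff_fps):
    """Return how many times faster a system runs once the lights are off."""
    return loff_fps / lon_fps


if __name__ == "__main__":
    # List the systems from smallest to largest gain.
    ranked = sorted(T2_FPS.items(), key=lambda item: lights_off_gain(*item[1]))
    for name, (lon, loff) in ranked:
        print(f"{name:32s} {lights_off_gain(lon, loff):.2f}x")
```

On these figures the Indigo2 Elan gains almost a factor of four from turning the lights off, whereas the Indy XL gains well under a factor of two.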
If you wish to submit a set of results, download the HolliDance archive [local source] and a copy of the results reporting form (use your mouse right button to select the 'Save Link As...' option). Fill out the form and then email it back to me. Note that, at the very least, the form should have all system information and frames-per-second rates filled in - I don't mind too much if the gr_osview information is left out initially and then filled in later after the fps results are added to the results table.
Some peculiarities should be immediately noticeable in Table 1, including:
In order to understand what is happening in these tests, one must:
Dealing with each of the earlier numbered points and referring to systems 1, 2 and 3 in Table 1:
Points 1 and 2:
Meanwhile, Indy XL does not suffer from these problems. All the geometry and lighting calculations are carried out by the main CPU. Thus, there are no FIFOs to overfill and far fewer context switches occur. The Indy's CPU runs flat out.
However, both systems have no hardware texture acceleration. In the case of Indy, the main CPU has this extra work to do on top of the geometry/lighting calculations, and so slows down. For Indigo2, the main CPU must also perform the texture calculations, but not the geometry/lighting calculations; this means more exchanging of data between the main CPU and gfx hardware compared to the non-textured scene, more context switches, and thus a significant slowdown. The greater exchange of data which occurs in Indigo2 Elan when textures are present can be seen by the fact that Indigo2 Elan runs Test 4 slower than Indy, but when the lights are turned off the Indigo2 Elan shows a greater speedup (though it still doesn't surpass Indy XL). For Indy XL, turning off the lights doesn't help much because the texture calculations are pretty complex anyway.
However, the degree to which texturing affects the calculations will be much lower if the window size is small. Comparing the results for Test 3 vs. Test 4 for Indy XL, the speed increase is very small when turning off the lights for a large window (0.29 to 0.30), but is much more significant when turning off the lights for a small window (1.63 to 2.16 = 33% faster). Indigo2 Elan shows similar behaviour, jumping from 0.22 to 0.29 for a large window, compared to jumping from 0.61 to 1.85 for a small window (203% faster). The higher increase for Indigo2 Elan shows its hardware Z buffer is more important to overall performance when the lights are turned off. O2's behaviour is different yet again because it has hardware texture acceleration.
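The percentages quoted above are plain relative gains in frame rate. A minimal Python sketch of the arithmetic, using only the figures from the paragraph above (the function name is my own):

```python
def percent_faster(before_fps, after_fps):
    """Return the percentage gain in frame rate from before_fps to after_fps."""
    return (after_fps - before_fps) / before_fps * 100.0


# (lights on, lights off) frame rates quoted in the paragraph above.
READINGS = {
    ("Indy XL", "large window"): (0.29, 0.30),
    ("Indy XL", "small window"): (1.63, 2.16),
    ("Indigo2 Elan", "large window"): (0.22, 0.29),
    ("Indigo2 Elan", "small window"): (0.61, 1.85),
}

if __name__ == "__main__":
    for (system, window), (lon, loff) in READINGS.items():
        gain = percent_faster(lon, loff)
        print(f"{system}, {window}: {gain:.0f}% faster with lights off")
```

The small-window cases come out at roughly 33% and 203%, matching the values given in the text.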
I've also been told that graphics systems like XZ, Elan and Extreme cannot deal with the lighting calculations in hardware for more than one light at once, and that if extra lights are present then context switching causes temporary data to be stored in main memory (or perhaps some form of cache memory for newer systems like IMPACT). Because of this, further slowdown is inevitable.
So who cares? Why does this matter?
These issues are important because, all put together, they amount to a startling conclusion for those considering second hand and/or older systems:
In this case, because the R5000 is much faster than the compute power offered by the XZ board, a scene with complex lighting would be rendered faster on an R5000 Indy XL (this is my theory, but I'm confident that results as they come in will prove me correct). Note however, that what I've described here, namely Indy XL being faster than Indy XZ, can easily not be true, for example in situations where:
I can imagine a similar example of this: 195MHz R10000 Indigo2 XL compared to 195MHz R10000 Extreme. I would now expect the former system to be faster where complex lighting is involved. Similarly, I predict that an O2 with a main CPU that is faster than the GEs of a hardware graphics board will beat a system using such a graphics board for complex lighting tasks. Here I'll make a specific prediction: when O2 can utilise a main CPU which offers an MFLOP rate that is greater than the GEs of IMPACT, I predict that such an O2 could outperform an Indigo2 R10K SolidIMPACT for situations involving complex geometry/lighting (though the higher pixel fill of IMPACT might just hold sway in the end). Such a day may not be that far off: the GE11 ASIC in IMPACT offers 480MFLOPS, whilst R10K/250 in O2 theoretically offers 500MFLOPS. A test comparing the two for HolliDance would be very interesting (if you the reader are using an Indigo2 SolidIMPACT, or an Octane/SI or SE, please submit some results!). When O2 can utilise the R12000 at, say, 300MHz, the fact that O2 won't incur FIFO overfills and context switches for complex lighting should give it an edge over SolidIMPACT - though again, IMPACT's higher pixel fill might make the difference. We shall see.
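The MFLOPS comparison can be sketched in a few lines of Python. Note the assumption: the 500MFLOPS figure for R10K/250 corresponds to a peak of two floating-point operations per clock, and the sketch assumes the same for a 300MHz R12000. These are theoretical peaks only and say nothing about sustained rates.

```python
GE11_MFLOPS = 480  # per-ASIC figure quoted in the text

# Assumption: the CPU completes two floating-point operations per clock at peak.
ASSUMED_FLOPS_PER_CLOCK = 2


def peak_mflops(clock_mhz, flops_per_clock=ASSUMED_FLOPS_PER_CLOCK):
    """Theoretical peak MFLOPS for a CPU running at clock_mhz."""
    return clock_mhz * flops_per_clock


if __name__ == "__main__":
    for label, mhz in (("R10K/250", 250), ("R12000 at 300MHz", 300)):
        cpu = peak_mflops(mhz)
        ratio = cpu / GE11_MFLOPS
        print(f"{label}: {cpu} MFLOPS peak, {ratio:.2f}x one GE11")
```

Under that assumption the R10K/250 only just edges past a single GE11, while a 300MHz part would have a more comfortable margin.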
Perhaps this is why SGI decided to offload geometry and lighting onto the main CPU for O2? ie. it was foreseen that, fairly rapidly, main CPU power would exceed the GE power of older good mid-range gfx systems like IMPACT. It was probably possible to produce geometry hardware that was newer and faster, but that might not have been as cheap as using the main CPU instead. Notice that the primitive-level benchmarks suggest O2 would be about 2.7X better than Indigo2 Elan (Lit GZ Tris/sec), but the benchmark results show that R5K/200 O2 can be 6 or 7 times faster than R4400/250 Indigo2 Elan for tasks like HolliDance (obviously more so when texturing is involved). It's all down to geometry and lighting.
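To see where the measured gap between these two systems departs from the primitive-level figure, the ratios can be worked out directly from Table 1. A minimal Python sketch (the helper name is my own; the numbers are those for systems 2 and 1 in the table):

```python
# Frame rates copied from Table 1 (systems 2 and 1).
O2_R5K_200 = {"T2 LON": 4.15, "T2 LOFF": 6.54, "T4 LON": 4.50, "T4 LOFF": 6.56}
INDIGO2_ELAN = {"T2 LON": 0.68, "T2 LOFF": 2.68, "T4 LON": 0.22, "T4 LOFF": 0.29}

PRIMITIVE_LEVEL_RATIO = 2.7  # Lit GZ Tris/sec ratio quoted in the text


def speed_ratios(fast, slow):
    """Return fast/slow frame-rate ratios for each test the two share."""
    return {test: fast[test] / slow[test] for test in fast if test in slow}


if __name__ == "__main__":
    for test, ratio in speed_ratios(O2_R5K_200, INDIGO2_ELAN).items():
        versus = ratio / PRIMITIVE_LEVEL_RATIO
        print(f"{test}: O2 is {ratio:.1f}x faster ({versus:.1f}x the primitive-level ratio)")
```

On these figures the ratio is about 6 for Test 2 with the lights on, about 2.4 with the lights off (close to the primitive-level figure), and above 20 for the textured Test 4.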
Conclusion and Summary
Most of these issues concern older SGI systems because such systems can sometimes expose a performance overlap between the main CPU and the power of particular hardware graphics configurations. Systems affected include Indigo, Indy, Indigo2, Crimson - any system that can use the older gfx options such as XS24, XL, XZ, Elan and Extreme (newer graphics options like IMPACT and those for Octane offer geometry performance that has not yet been exceeded by main CPU power; MaxIMPACT offers 960MFLOPS of compute power from the two GE11 ASICs). Since Indigo2 IMPACT can be faster than O2, and O2 can be faster than low-end Indigo2s and Indys, then O2 also comes into the equation.
It all means that one must judge carefully what is the best system if one is considering a 2nd-hand purchase. Until I'd performed this analysis, I'd always assumed that a system with a graphics board that accelerated geometry/lighting would outperform the same system using a graphics board that didn't. This can quite easily not be the case if the main CPU is a good one and the task is of a particular type (ie. complex lighting).
Just like any benchmark, the results presented here will only be of interest to those whose tasks are similar to the benchmark. In this case, I suggest such people will be those involved with VRML modeling, real-time 3D animation and possibly game creation (the latter may be less true because most people doing game development aim for higher power hardware that supports accelerated texturing - such systems usually offer much better geometry/lighting acceleration anyway).
If you are such a person and are currently in the middle of contemplating a system upgrade, consider the options carefully! Up until now, if someone had an R4K/150 Indigo XZ, I would typically recommend aiming for an Indigo2 Extreme, but a better choice might actually be an R5K Indy, Indigo2 XL with high-clocked R4400 or R10K, or O2 with a good main CPU.
In fact, this is possibly an area where O2 could be targeted to better effect. Traditional primitive-level benchmarks can easily lead one to believe Indigo2 Extreme may not be much slower than O2, but situations like HolliDance show that this assumption can easily be wrong by a factor of 3 or more.
This is yet another reason why one should be all the more aware of the nature of one's application and have at least a basic appreciation of how 3D graphics rendering works.
Incidentally, think back to the time when SGI released the R5000 for Indy. As part of the release, SGI renamed XL graphics configurations that came with R5000 from XL to XGE, and SGI never released performance figures for R5K Indy XZ - why? At the time, most people saw these moves as merely clumsy PR attempts to attract sales. But with hindsight the real reasons are obvious: an Indy cannot use any R5K present for geometry/lighting acceleration if there is an XZ graphics board present as well! Thus, figured SGI, in order to utilise R5K's higher performance for these tasks, an Indy must be using XL, so we'll give such systems a unique name (XGE) and never release performance figures for R5K XZ because there's no point (the R5K wouldn't be helping out with geometry/lighting - the performance wouldn't be that much better than an R4400/200 XZ). What SGI didn't do was make these factors clear enough to sales personnel and end users. Another little mystery solved.
The obvious question posed by all this is: can one stop a hardware graphics board from doing the geometry/lighting calculations? The honest answer at the moment is, I don't know. I'm trying to find out whether one can do this using GL calls (or whatever) to force geometry/lighting calculations onto the main CPU. If this can be done, applications like HolliDance could be speeded up by 2 or 3 times on some systems! If you have any information on this subject which may help clarify these issues, please feel free to contact me. Someone said that maybe offscreen rendering could be relevant, but I'm not sure yet.
What about non-SGI systems?
The concepts discussed here could easily apply to graphics systems on non-SGI machines. In the case of PCs, I can imagine a scenario where, for example, an entry-level GLINT GAMMA system with a K6-2 processor was outperformed by a dual-PII/400 system with a 3Dfx Voodoo2 (remember that the K6-2 is 4X faster than the PII/400 for single-precision floating point computation). Why should this be? Because, like the SGI examples given above, the 3Dfx system must do the geometry and lighting on the main CPU(s) whilst the GLINT system must (I suspect) do such calculations on the GLINT geometry engine - an entry GLINT system is highly unlikely to offer greater geometry power than two PII/400s (not sure though; I'll check) and so the 3Dfx should be faster. Hence, for someone considering VRML modeling on PCs, they'd probably be far better off getting a K6-2 system with a graphics card like 3Dfx that does not have geometry/lighting acceleration. That's a definite point against the assumption that high-end PC graphics cards must always be faster. Put another way, I wouldn't recommend a high-end PC graphics system with geometry acceleration unless the buyer can afford a configuration with the best possible geometry power in the first instance (eg. maxed-out GLINT system), but that'd probably be quite expensive.
There must come a point though where the needs of calculating geometry and lighting, and the requirements of other effects and factors such as texturing, pixel fill, antialiasing, etc. must be properly combined. That will require a whole new approach and I think we'll start seeing such new methods sometime in 1999.
David White, the author of HolliDance (HD), has agreed to implement some changes I recommended that would enhance the degree to which HD can be used as a benchmark. The main change is that the next release of HD will allow one to have between 0 and 8 lights, instead of just the fixed 0 or 3. Two lights are always directional, so one will be able to have between 1 and 6 spotlights of different colours (one of which is always white). This is great! It will allow people to run the tests for different numbers of lights and see how the performance drops off as lighting complexity increases. With this data put into a line graph, such graphs can then be combined to create an all-encompassing HD performance diagram for SGI systems: one will be able to see exactly when, where and how systems drop off in performance as the number of lights increases, at what point the performance levels of different systems overlap or exceed each other in counter-intuitive ways, and observe any performance 'sweet spots' that occur. For example, I expect to see a different degree of performance drop-off for Onyx2 IR when the number of lights changes from 4 to 5 compared to changing from 3 to 4 (because IR supports four hardware lights).
In addition, David is going to include a feature whereby the various frames-per-second numbers displayed during the previous user-defined number of seconds (X) will be shown as an updated average frames-per-second every X seconds. This will make it much easier to observe performance levels. David is also including other enhancements, and a new release of HolliDance should be available within a matter of a few weeks. Note that, when the new version comes out, I will reorganise the test suite and re-contact all those who've submitted results so far, asking them if they would mind submitting a new set of results for situations involving different numbers of lights (obviously, the existing results will be perfectly valid for a test involving 3 lights). I hope you, like me, look forward to being able to extract some real detailed useful information about how various SGI systems perform and behave for HolliDance, a program that is - in my opinion - very representative of the rapidly emerging field of VRML, and a useful indicator of performance for general real-time 3D animation, games, etc.
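To make the averaging idea concrete, here is a minimal Python sketch of one way such a feature could work. This is not HolliDance code and I don't know how David will implement it; the class and method names are hypothetical, and the sketch counts completed frames over each window rather than averaging the displayed figures.

```python
class FpsAverager:
    """Report an average frame rate once every `interval` seconds."""

    def __init__(self, interval, start_time=0.0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval
        self.window_start = start_time
        self.frames = 0

    def frame_done(self, now):
        """Record one finished frame at time `now` (in seconds).

        Returns the average fps for the window that has just ended, or None
        while the current window is still open.
        """
        self.frames += 1
        elapsed = now - self.window_start
        if elapsed < self.interval:
            return None
        average = self.frames / elapsed
        # Start a fresh window from the current frame's timestamp.
        self.window_start = now
        self.frames = 0
        return average


if __name__ == "__main__":
    # Simulated frame completion times; a real program would read a clock.
    averager = FpsAverager(interval=2.0)
    for stamp in (0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
        result = averager.frame_done(stamp)
        if result is not None:
            print(f"t={stamp:.1f}s: {result:.2f} fps over the last window")
```

In a real program the timestamps would come from a monotonic clock such as Python's time.monotonic(), with frame_done() called after each buffer swap.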
I'll post a message to the comp.sys.sgi hierarchy when David releases the next version of HolliDance.
My thanks to Chris Zach whose questions about upgrading an R3000 Indigo Elan prompted this entire investigation. Also to Dave Olson of SGI for answering some of my questions on the subject, and to Homme Bitter of WebGuide for offering me some webspace for a mirror just when I needed it. Most of all, many thanks to David White for creating such a cool and useful program.
Comments, suggestions, etc. are most welcome. Now, if you haven't done so already, send in your HolliDance test results! :)