SGI Performance Comparisons

Alias V11 Render Benchmark Results

Complex Scene Rendered Using raytracer/powertracer

Last Change: 03/Apr/2014

This test uses the command-line Alias renderer programs, 'raytracer' and 'powertracer' (raytracer uses just one CPU; powertracer uses multiple CPUs). The example scene looks like this:

Complex Scene

The source scene file is now available for download (1.4MB gzip file) (my thanks to John Harwood for giving permission) so feel free to run your own tests and send me the results! To run the tests, there must be a subdirectory called 'pix' in the same directory that contains the scene file, "DNA_Green_Purple_Test_Sm" (remember to gunzip the archive first). Thus, if the scene file is in /var/tmp, one would enter:

  cd /var/tmp
  mkdir pix
  raytracer DNA_Green_Purple_Test_Sm

or for systems with multiple CPUs:

  powertracer DNA_Green_Purple_Test_Sm

The normal output from each command gives the total render time. NB: to obtain consistent and sensible times, it is a good idea to shut down all unnecessary background processes (mediad, httpd, etc.) before commencing the test.

Here are the results:


              Num    -------- CPU --------      Time
  System      CPUs   Type     MHz    L2/L3    (h:mm:ss)   Run with...    NOTES

  Origin350   32     R16000   700     4MB      0:01:35    powertracer
  Tezro        4     R16000  1000    16MB      0:02:16    powertracer
  Tezro        4     R16000   700     8MB      0:02:49    powertracer    Node board came from an O3K CX-Brick, hence the 8MB L2.
  Origin300    8     R14000   500     2MB      0:03:05    powertracer
  Origin300    4     R14000   600     4MB      0:03:20    powertracer
  Origin300    4     R14000   500     2MB      0:04:18    powertracer
  Onyx2        4     R14000   500     8MB      0:04:30    powertracer
  Onyx2        4     R12000   400     8MB      0:04:54    powertracer
  Tezro        2     R16000   700     4MB      0:05:07    powertracer
  Origin300    2     R14000   600     4MB      0:05:39    powertracer
  Onyx        16     R10000   195     2MB      0:05:40    powertracer
  Tezro        1     R16000  1000    16MB      0:06:03    raytracer
  Fuel         1     R16000   900     8MB      0:06:27    raytracer
  Onyx        12     R10000   195     2MB      0:06:43    powertracer
  Onyx        20     R10000   195     1MB      0:06:57    powertracer
  Onyx        16     R10000   195     1MB      0:07:09    powertracer
  Octane       2     R14000   600     2MB      0:07:12    powertracer
  Fuel         1     R16000   800     4MB      0:07:28    raytracer
  Origin350    1     R16000   700     4MB      0:08:08    raytracer
  Onyx        12     R10000   195     1MB      0:08:09    powertracer
  Fuel         1     R16000   700     4MB      0:08:20    raytracer
  Onyx         8     R10000   195     2MB      0:08:41    powertracer
  Origin300    1     R14000   600     4MB      0:09:15    raytracer
  Octane       2     R12000   400     2MB      0:09:16    powertracer
  Fuel         1     R14000   600     4MB      0:09:27    raytracer
  Onyx         8     R10000   195     1MB      0:09:34    powertracer
  Octane       2     R12000   300     2MB      0:11:18    powertracer
  Onyx         4     R10000   195     2MB      0:11:30    powertracer
  Octane       1     R14000   600     2MB      0:11:34    raytracer
  Octane       2     R12000   350     1MB      0:12:32    powertracer    [hinv] (CPU mod, stage 2. Overclocked from 250 to 350)
  Onyx2        1     R12000   400     8MB      0:12:36    raytracer
  Octane       1     R14000   550     2MB      0:13:38    raytracer
  Fuel         1     R14000   500     2MB      0:13:49    raytracer
  Onyx         4     R10000   195     1MB      0:14:41    powertracer
  Octane       1     R12000   400     2MB      0:14:44    raytracer
  Onyx2        2     R10000   195     4MB      0:15:29    powertracer
  Octane       2     R12000   250     1MB      0:15:36    powertracer           (CPU mod, stage 1. Not yet overclocked)
  Octane       2     R10000   250     1MB      0:15:41    powertracer
  Octane       1     R12000   360     2MB      0:15:52    raytracer
  Octane       2     R10000   195     1MB      0:18:37    powertracer
  Octane       1     R12000   300     2MB      0:18:48    raytracer
  Octane       2     R10000   175     1MB      0:20:58    powertracer
  Octane       1     R10000   250     2MB      0:23:07    raytracer
  Onyx         4     R4400    250     4MB      0:24:00    powertracer    [hinv]
  Onyx2        1     R10000   195     4MB      0:25:57    raytracer
  Octane       1     R10000   250     1MB      0:26:13    raytracer
  O2           1     R12000   400     2MB      0:28:36    raytracer
  Octane       1     R10000   195     1MB      0:34:52    raytracer
  Octane       1     R10000   175     1MB      0:34:52    raytracer
  Onyx         1     R10000   195     2MB      0:35:27    raytracer
  O2           1     R7000    600   256K/1MB   0:38:49    raytracer      [hinv]
  Indigo2      1     R10000   195     1MB      0:42:36    raytracer
  O2           1     R12000   300     1MB      0:44:12    raytracer
  Onyx         1     R10000   195     1MB      0:45:09    raytracer
  O2           1     R12000   270     1MB      0:47:22    raytracer
  O2           1     R10000   250     1MB      0:47:39    raytracer
  O2           1     R7000    350     1MB      0:48:26    raytracer
  O2           1     R10000   225     1MB      0:53:21    raytracer
  O2           1     R10000   195     1MB      0:53:56    raytracer
  O2           1     R5200    300     1MB      0:59:25    raytracer
  O2           1     R10000   175     1MB      1:12:27    raytracer
  O2           1     R5000    200     1MB      1:12:49    raytracer
  O2           1     R10000   150     1MB      1:17:36    raytracer
  Indigo2      1     R4400    250     2MB      1:28:00    raytracer
  O2           1     R5000    180     512K     1:29:31    raytracer
  Indigo2      1     R4400    200     2MB      1:38:34    raytracer
  Indy         1     R5000    180     512K     1:48:43    raytracer
  Indy         1     R5000    150     512K     1:48:54    raytracer
  Indigo2      1     R4400    200     1MB      1:50:02    raytracer
  Indy         1     R4400    200     1MB      1:50:32    raytracer
  Indy         1     R4400    150     1MB      2:18:54    raytracer
  O2           1     R5000    180     -        2:20:14    raytracer
  Indy         1     R4600    133     512K     2:21:21    raytracer
  Indy         1     R5000    150     -        2:37:42    raytracer
  Indy         1     R4000    100     1MB      3:24:16    raytracer
  Indy         1     R4600    133     -        3:58:26    raytracer
  Indy         1     R4600    100     -        3:59:22    raytracer


Discussion

This is a rather complex scene file, with results in stark contrast to the Maya render test.

Notice that at lower clock speeds (eg. 300MHz) multiple CPUs scale quite nicely in Octane for this particular Alias scene, but at higher clocks (400 and 600) the dual-CPU Octane doesn't scale so well, suggesting memory bandwidth and/or speed may be becoming a bottleneck, ie. it's likely the render is doing a lot of main memory access.
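The scaling figures behind this observation can be checked with a few lines of Python. This is only an illustrative sketch: the times are copied from the table above, the helper names (to_seconds, dual_speedup) are made up for the example, and the single-CPU runs used raytracer while the dual-CPU runs used powertracer, so any powertracer overhead is included in the ratios.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string from the results table to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# (single-CPU raytracer time, dual-CPU powertracer time), copied from the table.
# All three pairs are Octanes with 2MB L2.
octane_times = {
    "R12000/300": ("0:18:48", "0:11:18"),
    "R12000/400": ("0:14:44", "0:09:16"),
    "R14000/600": ("0:11:34", "0:07:12"),
}


def dual_speedup(single: str, dual: str) -> float:
    """Speedup of the dual-CPU run over the single-CPU run (2.0 would be ideal)."""
    return to_seconds(single) / to_seconds(dual)


for cpu, (single, dual) in octane_times.items():
    s = dual_speedup(single, dual)
    print(f"{cpu}: speedup {s:.2f}x, efficiency {s / 2:.0%}")
```

A speedup of 2.0 would be perfect scaling for two CPUs; the further below 2.0 a result falls, the more time is being lost to something other than raw CPU work.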

Also, the table shows a Fuel at 600MHz completes the render in 18% less time than an Octane with the same speed CPU. Based on the Octane R10K/250 results for 1MB vs. 2MB L2, the data suggests that one third of the speedup is due to the Fuel's faster memory, while two thirds of the speedup is due to the Fuel's larger 4MB L2. This is confirmed by comparing Onyx R10K/195 with 1MB vs. 2MB L2. So, faster memory definitely helps, but for this particular test a somewhat larger L2 is twice as useful, which bodes well for Origin3K systems that have 8MB L2 per CPU. Indeed, despite having the same older O2K architecture as Octane, the quad-400 Onyx2 does reasonably well, most likely because of its much larger 8MB L2. Even so, the data confirms this test definitely benefits from faster RAM access; this is why, despite a much smaller L2, the quad-500MHz (2MB) Origin300 is faster than the quad-400 Onyx2.
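The arithmetic behind the one-third/two-thirds split can be sketched as follows. The times come from the table; the function names are invented for this example, and the key assumption (labelled in the code) is that going from 2MB to 4MB L2 at 600MHz saves about the same fraction of time as going from 1MB to 2MB did on the R10K/250 Octane. That is one plausible reading of the reasoning above, not a measured fact.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string from the results table to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def time_saved(slow: str, fast: str) -> float:
    """Fraction of the slower run's time that the faster run saves."""
    return 1 - to_seconds(fast) / to_seconds(slow)


# Times copied from the results table (single CPU, raytracer).
octane_600 = "0:11:34"      # Octane R14000/600, 2MB L2
fuel_600 = "0:09:27"        # Fuel   R14000/600, 4MB L2
octane_250_1mb = "0:26:13"  # Octane R10000/250, 1MB L2
octane_250_2mb = "0:23:07"  # Octane R10000/250, 2MB L2

total_gain = time_saved(octane_600, fuel_600)

# ASSUMPTION: doubling L2 on the Fuel is worth the same fraction as
# doubling L2 on the R10K/250 Octane.
l2_gain = time_saved(octane_250_1mb, octane_250_2mb)

l2_share = l2_gain / total_gain

print(f"Fuel vs. Octane at 600MHz: {total_gain:.1%} less time")
print(f"Octane R10K/250, 1MB -> 2MB: {l2_gain:.1%} less time")
print(f"Share attributable to L2: {l2_share:.0%}, remainder: {1 - l2_share:.0%}")
```

Note that "18%" here is a reduction in render time; expressed as a ratio of speeds (Octane time divided by Fuel time) the same pair of results gives a somewhat larger figure.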

Especially interesting is that a dual-600 Octane is not that much faster than a 700MHz Fuel, and so as expected a Fuel/900 is faster than a dual-600 Octane. A Fuel/800 comes close, but at 7 mins 28 secs it does not quite beat the dual-600 Octane (7 mins 12 secs). Likewise, a dual-600 Origin300 is much quicker than a dual-600 Octane (again, this is due to the larger L2 and faster RAM in the O300).

Meanwhile, a quad-600 O300 does offer a good speedup over a dual-600 O300, though the performance improvement is starting to tail off slightly; if the increase were linear, then the quad-600 O300 would give 2 mins 50 secs when in fact it gives 3 mins 20 secs, but that's still almost 70% faster for twice as many CPUs. powertracer can use any number of CPUs, but for rendering a single image there is definitely a degree of diminishing returns (see below for more on this re the Onyx rack results).
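For anyone wanting to repeat this kind of check with their own results, the ideal-vs-actual comparison for the Origin300 can be sketched like this (the helper names are hypothetical; the two times are taken from the table):

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string from the results table to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_mss(seconds: float) -> str:
    """Format a duration in seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


dual_600 = to_seconds("0:05:39")  # Origin300, 2 x R14000/600, powertracer
quad_600 = to_seconds("0:03:20")  # Origin300, 4 x R14000/600, powertracer

ideal_quad = dual_600 / 2           # time if doubling the CPUs halved the render
actual_speedup = dual_600 / quad_600

print(f"ideal quad time:  {to_mss(ideal_quad)}")
print(f"actual quad time: {to_mss(quad_600)}")
print(f"actual speedup from doubling CPUs: {actual_speedup:.2f}x")
```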

With respect to O2, just like other complex fp tests I have done, it is clear O2 is not a good solution for this sort of task. Even the most expensive R12K/400 O2 is barely half the speed of an Octane with the same CPU (which is much cheaper) and can't even beat an R10K/250 Octane. O2 has a much slower conventional CPU/RAM link compared to Octane, with higher latency as well; O2's strengths lie with tasks involving video and complex 3D, video as texture, MJPEG processing, real-time and volumetric imaging, etc.

The Onyx results are most intriguing. Having more CPUs does speed things up, but the gains quickly tail off beyond 8 CPUs and eventually max out with 20 CPUs (the result for 24 CPUs is slower, 7 mins 5 secs, so the overhead processing presumably outweighs the benefits). The results for the R10Ks with 1MB L2 imply that such a system would be better exploited by rendering frames using separate groups of 4 or 8 CPUs; indeed even using 4 CPUs is 3 minutes slower than a 4X linear increase over 1 CPU. Thus, overall, if one is rendering lots of frames, it's best to render on a system like this using raytracer with 1 frame per CPU (and to some extent this is true of all the multi-CPU systems), but if one is rendering a single frame (eg. a large advert poster) then obviously using multiple CPUs with powertracer is very useful indeed.
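To put rough numbers on the Onyx scaling, and on the "one frame per CPU" suggestion, here is a small Python sketch using the 1MB L2 results from the table. The function names are invented for the example, and the one-frame-per-CPU throughput rests on an optimistic assumption, flagged in the code, that separate raytracer jobs running side by side do not slow each other down (in practice memory contention would likely cost something, and that was not measured here).

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string from the results table to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Onyx R10000/195 with 1MB L2, times copied from the results table.
# The 1-CPU entry was run with raytracer, the rest with powertracer.
onyx_1mb = {
    1: "0:45:09",
    4: "0:14:41",
    8: "0:09:34",
    12: "0:08:09",
    16: "0:07:09",
    20: "0:06:57",
}


def speedup(cpus: int) -> float:
    """Speedup over the single-CPU run; equal to `cpus` if scaling were linear."""
    return to_seconds(onyx_1mb[1]) / to_seconds(onyx_1mb[cpus])


def frames_per_hour_powertracer(cpus: int) -> float:
    """Throughput when all CPUs work on one frame at a time."""
    return 3600 / to_seconds(onyx_1mb[cpus])


def frames_per_hour_one_per_cpu(cpus: int) -> float:
    """Throughput with one raytracer job per CPU.

    ASSUMPTION: the jobs run fully independently, with no slowdown
    from sharing memory or I/O.
    """
    return cpus * 3600 / to_seconds(onyx_1mb[1])


for cpus in sorted(onyx_1mb):
    print(
        f"{cpus:2d} CPUs: speedup {speedup(cpus):4.2f}x, "
        f"efficiency {speedup(cpus) / cpus:4.0%}, "
        f"powertracer {frames_per_hour_powertracer(cpus):4.1f} frames/h, "
        f"one-per-CPU {frames_per_hour_one_per_cpu(cpus):4.1f} frames/h"
    )
```

The gap between the last two columns widens as CPUs are added, which is the reason for preferring one frame per CPU when there are many frames to render.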

One final example of the difference L2 makes: R4600PC/133 Indy is painfully slow, but with an added 512K L2 the speedup is phenomenal - almost as fast as an R5000PC/180 O2. At one point it was expected SGI would release an R4600 at 250MHz or higher, which would have been quite good with a 1MB or 2MB L2. In the end though, SGI switched to the R5000 instead for low-end desktops, though it's strange that the R4600SC/133 was also available for Indigo2.


Background

Using powertracer may involve some overhead compared to raytracer, so here's a test to check this, running raytracer vs. powertracer for the same system/CPU:

           Num    ------- CPU -------      Time
  System   CPUs   Type    Speed   L2      (mm:ss)     Run with...

  Octane    1     R14000  550MHz  2MB      13:38      raytracer V11
  Octane    1     R14000  550MHz  2MB      14:31      powertracer V11

Thus, the powertracer overhead is about 6% (53 seconds on a 13:38 render), which should be taken into account when comparing raytracer vs. powertracer results, though it's not a huge issue most of the time. Other renderers may be more or less efficient.
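The overhead figure is simply the ratio of the two times in the small table above. A minimal Python sketch, with made-up helper names, in case anyone wants to repeat the check on their own system:

```python
def to_seconds(mmss: str) -> int:
    """Convert an mm:ss string to seconds."""
    m, s = (int(part) for part in mmss.split(":"))
    return m * 60 + s


raytracer_time = to_seconds("13:38")    # Octane R14000/550, raytracer V11
powertracer_time = to_seconds("14:31")  # same system, powertracer V11

extra_seconds = powertracer_time - raytracer_time
overhead = extra_seconds / raytracer_time

print(f"powertracer took {extra_seconds}s longer, an overhead of {overhead:.1%}")
```

This is a single measurement on one single-CPU system, so the overhead on other machines or with more CPUs may differ.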

One other factor which might affect the results is to what extent powertracer is even using multiple CPUs at each stage of the render. I tested this and found that, apart from a short period at the beginning and end of each test (just a few seconds), both CPUs are used just fine.

Lastly, some unanswered questions:

  1. Why is R12K/270 O2 not much better than R10K/250 O2?

  2. Why is R10K/195 Octane no better than R10K/175 Octane?

  3. Why is R5K/180SC Indy barely any faster than R5K/150SC Indy?

My guesses at the answers to these oddities: in all cases, most likely the higher-clocked CPUs have their L2 cache connected at a lower speed. L2 seems to matter a lot for this test, so perhaps the slower L2 counteracts the higher clock speed. Even so, if this is true, it's odd that the enhancements built into R12K don't seem to help much for the O2 test - maybe those enhancements are of no help because of the restrictions placed on R12K CPUs in O2 compared to how they are able to operate in Octane/Origin.


Feedback is most welcome! :)



Octanes not yet tested:

  Dual-R12K/360
  Dual-R12K/270
  Dual-R10K/225
  Single-R12K/270
  Single-R10K/225

Fuels not yet tested:

  R16K/800 (4MB)

O2s not yet tested:

  R7K/600

Indigo2s not yet tested:

  R10K/175
  R4K/175 (1MB)
  R4K/150 (1MB)
  R4K/100 (1MB)
  R4K/100
  R4600SC/133 (512K)
  R8000/75 (2MB)

Indys not yet tested:

  R4400SC/175 (1MB)