[Future Technology Research Index] [SGI Tech/Advice Index] [Nintendo64 Tech Info Index]

[WhatsNew] [P.I.] [Indigo] [Indy] [O2] [Indigo2] [Crimson] [Challenge] [Onyx] [Octane] [Origin] [Onyx2]

Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

(check my current auctions!)

Origin200 Single-CPU SPECfp95 Performance
Comparison Using Different R10000/R12000 CPUs

Last Change: 24/Apr/1999

SPEC's Introduction to SPEC95

(Note: the 2D bar graphs shown here for the various SPEC95 tests have been drawn to the same scale)
(the graphs are also to the same scale as those given on other single-CPU comparison pages)


This analysis examines how different R10000/R12000 CPUs perform in the same system, in this case Origin200 (I have separate pages dealing with how the same CPU performs in different systems).

Note that I do not have any SPEC95 data for R10000 180MHz Origin200QC (this is the later 180MHz version of Origin200/180 with a larger 2MB L2 cache that runs at a faster speed). Since some systems will be using this CPU, please contact me if you have any detailed SPEC95 data for Origin200QC/180 (final base and peak averages are of little use; it's the detailed results I'm looking for).

As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (i.e. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels); that'll give the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.

For this analysis, the 180MHz R10000 Origin200 data was obtained from www.specbench.org. SPEC has not yet posted 225MHz R10000 Origin200QC data, but in the meantime I have been supplied with some fp data by John McCalpin, a former Server System Architect for SGI. John told me:

"I ran the SPECfp95 tests and got 20.4 overall. I am not sure which run ended up getting submitted, but I believe that my run is a valid submission..... The runs were with 1 cpu of a 4-cpu Origin200 system, with 225MHz cpus and 2MB 225MHz L2 caches. These results are with IRIX 6.4 and the version 7.2.1 Fortran compiler."

When SPEC has O200QC/225 results available, I'll use those instead if they're significantly (>2%) different from the data given to me by John.

Given below is a comparison table of single-CPU R10000/R12000 SPECfp95 test results for Origin200, covering 180MHz and 225MHz R10000 (remember that the 225MHz version has a faster, larger L2 cache), and 270MHz R12000. Faster CPUs are leftmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

          R12000   R10000   R10000
          270MHz   225MHz   180MHz

tomcatv    33.3     28.0     22.0
swim       44.0     40.3     34.5
su2cor     14.6     13.0     8.47
hydro2d    16.7     11.7     7.99
mgrid      24.9     18.7     14.8
applu      14.9     12.5     11.0
turb3d     20.4     17.3     14.3
apsi       25.3     18.6     11.9
fpppp      42.2     34.0     28.3
wave5      35.6     29.1     20.8

          %Increase    %Increase    %Increase
FROM:      R10K/180     R10K/180     R10K/225
TO:        R10K/225     R12K/270     R12K/270

tomcatv     27.3%        51.4%        18.9%
swim        16.8%        27.5%         9.2%
su2cor      53.5%        72.4%        12.3%
hydro2d     46.4%       109.0%        42.7%
mgrid       26.4%        68.2%        33.2%
applu       13.6%        35.5%        19.2%
turb3d      21.0%        42.7%        17.9%
apsi        56.3%       112.6%        36.0%
fpppp       20.1%        49.1%        24.1%
wave5       40.0%        71.2%        22.3%
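The percentage table can be reproduced directly from the SPEC ratios in the first table. The following is a minimal Python sketch of that arithmetic; the names RESULTS, pct_increase and increase_rows are my own, chosen for illustration. Because it works from the rounded ratios as printed, a figure can differ from the table in the last digit (wave5 for 180MHz to 225MHz comes out at 39.9 rather than 40.0).

```python
# Per-test SPECfp95 ratios copied from the table above.
# Tuple order: (R12000/270MHz, R10000/225MHz, R10000/180MHz).
RESULTS = {
    "tomcatv": (33.3, 28.0, 22.0),
    "swim":    (44.0, 40.3, 34.5),
    "su2cor":  (14.6, 13.0, 8.47),
    "hydro2d": (16.7, 11.7, 7.99),
    "mgrid":   (24.9, 18.7, 14.8),
    "applu":   (14.9, 12.5, 11.0),
    "turb3d":  (20.4, 17.3, 14.3),
    "apsi":    (25.3, 18.6, 11.9),
    "fpppp":   (42.2, 34.0, 28.3),
    "wave5":   (35.6, 29.1, 20.8),
}


def pct_increase(old, new):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def increase_rows(results):
    """Map each test to (180->225, 180->270, 225->270) percentage increases."""
    rows = {}
    for test, (r12k_270, r10k_225, r10k_180) in results.items():
        rows[test] = (
            round(pct_increase(r10k_180, r10k_225), 1),
            round(pct_increase(r10k_180, r12k_270), 1),
            round(pct_increase(r10k_225, r12k_270), 1),
        )
    return rows


if __name__ == "__main__":
    for test, (a, b, c) in increase_rows(RESULTS).items():
        print(f"{test:<10}{a:>8.1f}%{b:>10.1f}%{c:>10.1f}%")
```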

      Origin200 SPECfp95 Comparison

[Left Isometric View] [Right Isometric View]


[Test Suite Description | 270MHz R12000 4MB L2/QC | 225MHz R10000 2MB L2/QC | 180MHz R10000 1MB L2]

Next, a separate comparison graph for each of the ten SPECfp95 tests:


[Comparison graphs: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5]


It's important to remember that the 225MHz CPU is using a 2MB 225MHz L2 cache, i.e. faster and larger than the 1MB cache used with the 180MHz CPU.

Note that the clock speed increase itself is 25% (180MHz to 225MHz). However, many individual tests show much larger increases than this, and even the final peak averages (which I normally never deal with because I don't think they're particularly useful) show an increase of roughly 31%. This is compelling evidence that the faster, larger L2 cache on the QC model is indeed doing its job. Some tests increase by over 50%, a very significant performance improvement.
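For anyone wanting to check the overall figures: SPEC95 composite results are geometric means of the individual test ratios. The sketch below is only an illustration, assuming the ratios in the first table are the ones that feed the average; the names geometric_mean and overall_gain_pct are my own. With those inputs the 225MHz mean should come out close to the 20.4 that John quoted, and the gain over 180MHz close to 31%.

```python
import math

# SPECfp95 ratios from the first table, in test order
# (tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5).
R10K_225 = [28.0, 40.3, 13.0, 11.7, 18.7, 12.5, 17.3, 18.6, 34.0, 29.1]
R10K_180 = [22.0, 34.5, 8.47, 7.99, 14.8, 11.0, 14.3, 11.9, 28.3, 20.8]


def geometric_mean(values):
    """Geometric mean, computed via logs to avoid a large running product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_gain_pct(old, new):
    """Percentage increase of the geometric mean of `new` over `old`."""
    return (geometric_mean(new) / geometric_mean(old) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"R10K/225 mean: {geometric_mean(R10K_225):.1f}")
    print(f"R10K/180 mean: {geometric_mean(R10K_180):.1f}")
    print(f"Overall gain:  {overall_gain_pct(R10K_180, R10K_225):.1f}%")
    print(f"Clock gain:    {(225 / 180 - 1) * 100:.1f}%")
```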

What is most interesting is that the improvements are quite different from those shown by upgrading Octane or Origin2000 from 195MHz to 250MHz (a clock speed increase of 28%). Compare:

             Octane      Origin2000    Origin200
           % Increase    % Increase    % Increase
           (195->250)    (195->250)    (180->225)

tomcatv       16.2%         28.6%        27.3%
swim          14.0%         21.4%        16.8%
su2cor        16.2%         35.7%        53.5%
hydro2d       14.3%         31.8%        46.4%
mgrid         16.4%         25.0%        26.4%
applu         17.9%         23.1%        13.6%
turb3d        22.5%         26.8%        21.0%
apsi          25.0%         35.3%        56.3%
fpppp         24.9%         27.7%        20.1%
wave5         22.3%         32.2%        40.0%

My guess is that Origin200 shows better performance improvements than Octane because the 225MHz CPU has a larger and faster L2 cache. As for Origin2000, even though it has a faster CPU (250MHz), its L2 cache runs at 2/3rds of this speed (roughly 166MHz), so it's possible that the 225MHz CPU ends up with better performance in some cases since its L2 cache runs at the full core speed of 225MHz.
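To put a number on the cache clock argument, here is a small Python sketch using only the clock speeds and ratios mentioned above. The function name l2_clock_mhz is hypothetical, and this compares clock rates only; it says nothing about cache size, latency or bus width.

```python
from fractions import Fraction


def l2_clock_mhz(core_mhz, l2_ratio):
    """L2 cache clock for a given core clock and L2:core clock ratio."""
    return core_mhz * l2_ratio


# Origin2000/250: L2 runs at 2/3 of the core clock.
ORIGIN2000_250_L2 = l2_clock_mhz(250, Fraction(2, 3))
# Origin200QC/225: L2 runs at the full core clock.
ORIGIN200QC_225_L2 = l2_clock_mhz(225, Fraction(1, 1))

if __name__ == "__main__":
    ratio = ORIGIN200QC_225_L2 / ORIGIN2000_250_L2
    print(f"Origin2000/250 L2 clock:  {float(ORIGIN2000_250_L2):.1f} MHz")
    print(f"Origin200QC/225 L2 clock: {float(ORIGIN200QC_225_L2):.1f} MHz")
    print(f"Origin200QC L2 clock advantage: {float(ratio - 1) * 100:.0f}%")
```

So although the Origin2000 core is clocked about 11% higher, its L2 cache is clocked well below that of Origin200QC/225.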

Further, any test which shows a good improvement for the Origin2000 upgrade correlates to a very good improvement for the Origin200 upgrade (examine su2cor, hydro2d, apsi and wave5). The larger the improvement for Origin2000, the more likely it is that Origin200 will show a better improvement. Again, I expect this is because Origin200's cache is running at a higher clock speed, despite Origin2000 having a larger L2.

An obvious exception is fpppp, but this is because fpppp uses a tiny data set which actually fits into the R10K's L1 data cache, so L2 cache issues are not important.
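The pattern described above can be checked against the three-system comparison table. This is a rough Python sketch with names of my own choosing (GAINS, ranked_by_origin2000, origin200_beats_origin2000); it simply ranks the tests by their Origin2000 gain and lists those where the Origin200 gain is larger.

```python
# Percentage gains copied from the comparison table above.
# Tuple order: (Octane 195->250, Origin2000 195->250, Origin200 180->225).
GAINS = {
    "tomcatv": (16.2, 28.6, 27.3),
    "swim":    (14.0, 21.4, 16.8),
    "su2cor":  (16.2, 35.7, 53.5),
    "hydro2d": (14.3, 31.8, 46.4),
    "mgrid":   (16.4, 25.0, 26.4),
    "applu":   (17.9, 23.1, 13.6),
    "turb3d":  (22.5, 26.8, 21.0),
    "apsi":    (25.0, 35.3, 56.3),
    "fpppp":   (24.9, 27.7, 20.1),
    "wave5":   (22.3, 32.2, 40.0),
}


def ranked_by_origin2000(gains):
    """Test names sorted from largest to smallest Origin2000 gain."""
    return sorted(gains, key=lambda test: gains[test][1], reverse=True)


def origin200_beats_origin2000(gains):
    """Tests where the Origin200 gain exceeds the Origin2000 gain."""
    return [test for test in gains if gains[test][2] > gains[test][1]]


if __name__ == "__main__":
    for test in ranked_by_origin2000(GAINS):
        _octane, o2000, o200 = GAINS[test]
        print(f"{test:<8} O2000 {o2000:5.1f}%  O200 {o200:5.1f}%  "
              f"diff {o200 - o2000:+6.1f}")
```

The four tests with the largest Origin2000 gains are su2cor, apsi, wave5 and hydro2d, and each shows a larger gain again on Origin200.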

One might think applu shows an odd result: the Origin200 improvement isn't as high as the other tests. However, Origin200/180 and Octane/195 actually give very similar SPEC ratios in the first instance (11.0 and 11.2 respectively), whilst Origin2000 had a slightly better result (11.7), so don't read too much into the lower percentage increase for Origin200. The actual SPEC ratios aren't that different for applu overall (O200QC/225 gets 12.5, Octane/250 gets 13.2, O2000/250 gets 14.4).

What is far more interesting is that, for six out of the ten tests, Origin200QC/225 is faster than Octane/250. This definitely shows that a larger, quicker L2 cache can benefit some tasks to a significant degree. Given this fact, one must look forward to seeing the release of future CPUs at higher clock speeds which have the L2 cache running at full core speed.

The lesson to be learned here is that an upgrade decision shouldn't be an automatic affair. R10K/225 may be a 25% increase in clock speed over R10K/180, but for Origin200 users it's entirely possible that they'd see a performance improvement of twice that or more (>50%), because of the larger, faster L2 cache. But this will not always be the case and will depend on the application in question. So, test first, decide later!
