Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

(check my current auctions!)

250MHz R10000 Performance Comparison
Between O2, Octane and Origin2000

Last Change: 11/Aug/1998

SPEC's Introduction to SPEC95
SPECfp95 Analysis
SPECint95 Analysis

(Note: the 2D bar graphs shown here for the various SPEC95 tests have been drawn to the same scale)
(the graphs are also to the same scale as those given on other R10000 comparison pages)

250MHz R10000 SPECfp95 Performance Comparison

The study given here is similar to the 195MHz R10000 discussion. Since the same concepts are relevant, I won't repeat the background details; please see the 195 page for the detailed observations, architectural discussions, elucidation of issues relating to cache access and memory latency, etc.

As before, there is a 3D Inventor model of the data available; screenshots of this are included below. You can download the 3D model (822 bytes gzipped) if you wish: load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels) - that'll give you the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.

All source data for this analysis came from www.specbench.org.

Given below is a comparison table of the various R10000/250 SPECfp95 test results. Faster systems are leftmost in this table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

Key:


      O2000  = Origin2000

System:     O2000    Octane    O2
L2:         4MB      1MB       1MB

tomcatv     34.6     29.4      10.2
swim        50.0     46.3      14.4
su2cor      15.6     11.2      5.40
hydro2d     16.6     11.4      3.26
mgrid       23.5     18.5      7.26
applu       14.4     13.2      6.49
turb3d      19.4     16.9      11.1
apsi        21.1     16.0      11.6
fpppp       37.8     37.1      37.2
wave5       33.7     27.4      12.8

SPECfp95 Comparison Table for MIPS R10000 250MHz

[Test Suite Description | O2000 | Octane | O2]

Next, a separate 2D comparison graph for each of the ten SPECfp95 tests:

[2D comparison graphs, one per test: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5]

Observations

These are easier to spot from the graphs, which is why I made them in the first place:

The following is made clearer if you also examine the graphs for R10000/195: the degree to which a larger L2 cache helps the CPU's performance becomes greater as the CPU's clock speed increases.
For example, for wave5, the 195 version in Origin2000 is 14% faster than the 195 version in Octane. But the 250 version in Origin2000 is 23% faster than the 250 version in Octane.
Think of it this way: take a balloon and draw two lines of different length on it. Now blow up the balloon. As it expands, both lines lengthen, but the gap between the ends of the two lines also grows.
Thus, as processors become faster, the advantage of a larger L2 becomes greater. This is obviously 'common sense', but it's reassuring to see the effect actually happening.
Many tests have gained in performance by a factor that is much greater than 250/195, ie. the faster L2 cache speed is also helping.
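
The Origin2000-over-Octane margins quoted above can be re-derived from the 250MHz table. What follows is only a minimal Python sketch, not part of the original analysis: the names SPECFP95_250, percent_faster and origin_over_octane are my own inventions for illustration, and the figures are simply copied from the SPECfp95 table on this page. The R10K/195 side of the comparison can't be recomputed here because those results live on the 195 page.

```python
# SPECfp95 results for R10000/250, copied from the comparison table above.
# Each entry is (Origin2000 with 4MB L2, Octane with 1MB L2).
SPECFP95_250 = {
    "tomcatv": (34.6, 29.4),
    "swim":    (50.0, 46.3),
    "su2cor":  (15.6, 11.2),
    "hydro2d": (16.6, 11.4),
    "mgrid":   (23.5, 18.5),
    "applu":   (14.4, 13.2),
    "turb3d":  (19.4, 16.9),
    "apsi":    (21.1, 16.0),
    "fpppp":   (37.8, 37.1),
    "wave5":   (33.7, 27.4),
}


def percent_faster(fast, slow):
    """Return how much faster `fast` is than `slow`, as a percentage."""
    return (fast / slow - 1.0) * 100.0


def origin_over_octane(results=SPECFP95_250):
    """Map each test name to Origin2000's percentage lead over Octane."""
    return {test: percent_faster(o2000, octane)
            for test, (o2000, octane) in results.items()}


if __name__ == "__main__":
    for test, lead in origin_over_octane().items():
        print(f"{test:8s} {lead:5.1f}%")
```

For wave5 this gives roughly 23%, which agrees with the figure quoted above, while fpppp comes out at under 2%, ie. it barely notices the larger L2.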

250MHz R10000 SPECint95 Performance Comparison

As usual, you can download a 3D performance graph (gzipped) if you wish: load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective), etc.

The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various R10000/250 SPECint95 test results. After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

Key:


      O2000  = Origin2000

System:     O2000    Octane    O2
L2:         4MB      1MB       1MB

go          14.9     14.1      13.9
m88ksim     14.2     14.1      14.5
gcc         13.5     12.5      10.7
compress    15.0     13.9      12.0
li          12.3     11.9      11.9
ijpeg       12.9     12.6      11.5
perl        16.7     16.4      15.7
vortex      19.5     13.8      9.74

SPECint95 Comparison Table for MIPS R10000 250MHz

[Test Suite Description | O2000 | Octane | O2]

Next, a separate 2D comparison graph for each of the eight SPECint95 tests:

[2D comparison graphs, one per test: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex]

As with R10K/195, the results show a different variance compared to the SPECfp95 results given above. The important observations are discussed on the 195 page. What are of more interest here with respect to R10K/250 are the O2 results and the data for vortex across the three systems.

Only vortex seems to really benefit from a larger L2, with a degree of improvement between Octane and Origin that is greater for R10K/250 compared to R10K/195. In other words, for vortex on R10K/195, Origin2000 is 29% faster than Octane; but for vortex on R10K/250, Origin2000 is 41% faster. Just as described above for the SPECfp95 tests, faster clocked CPUs increase the performance of different systems, but a higher clock also means larger differences between the higher performance levels. Other tests such as go, gcc and compress do benefit from a larger L2, but not as much as a larger L2 helps fp tests. The reasons for this are explained on the 195 page.
Since vortex accesses memory in a varied manner, it is not surprising to see Origin moving ahead in performance compared to Octane (smaller L2) and O2 (cache miss behaviour not so good).

Prior to the release of R10K/250 for O2, I'd said it would be interesting to see how R10K/250 O2 performed compared to Octane and Origin, given the good int results O2 shows for R10K/195. From the figures, it's clear that R10K/250 O2 does very well, even beating both Origin2000 and Octane for m88ksim (the actual figures are well within typical margins of error, given the nature of compiler optimisation). Naturally, as the CPU becomes faster overall, the better memory latency of the Origin design is beginning to show through, with Octane starting to edge ahead for gcc, compress, perl, etc. (remember that Octane uses the Origin architecture).

Here is a comparison table of the differences between Octane and O2, for R10K/195 and R10K/250 (I'm comparing O2 to Octane because both have the same L2 size). The figures denote how much faster Octane is than O2 for each test, as a percentage; a negative figure means O2 is faster:

          R10K/195        R10K/250
Test    %Difference     %Difference

go          3.64            1.44
m88ksim     1.80           -2.76
gcc         12.0            16.8
compress    6.60            15.8
li          1.81            0.00
ijpeg       8.02            9.57
perl        0.00            4.46
vortex      36.6            41.7
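
The R10K/250 column can be checked against the SPECint95 table given earlier. The following is a hedged Python sketch rather than the method actually used to build the table: I'm assuming each figure is (Octane / O2 - 1) * 100, and the names SPECINT95_250 and octane_over_o2 are hypothetical. The R10K/195 column can't be rebuilt the same way because the 195 results are not on this page.

```python
# SPECint95 results for R10000/250, copied from the table earlier on this page.
# Each entry is (Octane, O2); both systems have a 1MB L2.
SPECINT95_250 = {
    "go":       (14.1, 13.9),
    "m88ksim":  (14.1, 14.5),
    "gcc":      (12.5, 10.7),
    "compress": (13.9, 12.0),
    "li":       (11.9, 11.9),
    "ijpeg":    (12.6, 11.5),
    "perl":     (16.4, 15.7),
    "vortex":   (13.8, 9.74),
}


def octane_over_o2(results=SPECINT95_250):
    """Map each test to Octane's percentage lead over O2 (negative: O2 leads)."""
    # Assumed formula: (Octane / O2 - 1) * 100.
    return {test: (octane / o2 - 1.0) * 100.0
            for test, (octane, o2) in results.items()}


if __name__ == "__main__":
    for test, diff in octane_over_o2().items():
        print(f"{test:10s} {diff:6.2f}")
```

Under that assumption the computed values agree with the R10K/250 column to the three significant figures shown, including the negative result for m88ksim.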

For those tests which show a significant difference, one would expect a general increase in difference levels when moving from R10K/195 to R10K/250 (this clearly applies to gcc, compress, ijpeg and vortex). Other tests are well within margins of error. To be sure though, I need SPEC95 data for R10K/225, which isn't available yet for O2 or Octane (the CPU is, but not the test results).

All this analysing is fine and fair enough, but John's comments on the 195 page about the nature of these tests, namely that cache misses aren't occurring with most of the tests because the data sets are small, do pose a question: if only vortex is using a non-trivial data set, just how relevant is SPECint95 anyway? That's a difficult question to answer. As the reader, you'd have to ask, "How big is my data set? Does the CPU keep having to access main RAM, jumping across a wide memory space? Is memory latency important to my task?"

If your data set is small and cache misses don't happen much, then you wouldn't see much benefit from using Origin or Octane over O2. I can imagine the image processing of NTSC movie frames would come into this category (each frame would fit into a 1MB L2). Ironically, PAL frames would not fit into a 1MB L2 cache (1.26MB per frame compared to 0.90MB per frame for NTSC).

Possible tip: if you're running int processing jobs on Origins, Octanes and O2s, try swapping the jobs around. You might get better performance for some of the tests because they may benefit from Origin's larger L2, or the better memory latency and outstanding cache miss support of Origin/Octane, etc. Meanwhile, a task like m88ksim which doesn't seem to benefit from these extra features would run just as well if it was moved from an Origin/Octane to O2. Thus, one could increase the performance of some tasks without making the remaining tasks any slower than they originally were. An extreme example would be if one had m88ksim-type task running on an Origin (call it task X) and a vortex-type task running on an O2 (task Y) - swapping the tasks over would give the same performance for task X, but task Y would speed up by a significant margin.
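
The extreme example above can be put into rough numbers using the SPECint95 figures from this page. This is just a sketch under a loud assumption: that a real job's throughput scales the same way as the SPEC ratio of the test it resembles, which won't hold exactly for any real workload. The names SCORES and swap_speedups are hypothetical.

```python
# SPECint95 R10000/250 scores from this page, used as stand-ins for real jobs.
# Assumption: job throughput is proportional to the matching SPEC ratio.
SCORES = {
    "m88ksim": {"O2000": 14.2, "Octane": 14.1, "O2": 14.5},
    "vortex":  {"O2000": 19.5, "Octane": 13.8, "O2": 9.74},
}


def swap_speedups(task_x, host_x, task_y, host_y, scores=SCORES):
    """Return (speedup of task_x, speedup of task_y) if the two tasks trade hosts.

    A value above 1.0 means the task runs faster after the swap.
    """
    speedup_x = scores[task_x][host_y] / scores[task_x][host_x]
    speedup_y = scores[task_y][host_x] / scores[task_y][host_y]
    return speedup_x, speedup_y


if __name__ == "__main__":
    # Task X (m88ksim-type) starts on Origin2000, task Y (vortex-type) on O2.
    x, y = swap_speedups("m88ksim", "O2000", "vortex", "O2")
    print(f"task X speedup: {x:.2f}x, task Y speedup: {y:.2f}x")
```

With these figures task X is essentially unchanged (a ratio of about 1.02, well within the margin of error), while task Y comes out at about twice its original speed.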
