As before, there is a 3D Inventor model of the data available; screenshots of it are included below. You can download the 3D model (822 bytes gzipped) if you wish: load the file into SceneViewer or ivview and switch to Orthographic mode (i.e. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (using the Roty and Rotx thumbwheels); that gives you the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and display the object directly, although such browsers may not offer Orthographic viewing.
All source data for this analysis came from www.specbench.org.
Given below is a comparison table of the various R10000/250 SPECfp95 test results. Faster systems are leftmost in this table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.
System:     O2000   Octane   O2
L2 cache:   4MB     1MB      1MB
tomcatv     34.6    29.4     10.2
swim        50.0    46.3     14.4
su2cor      15.6    11.2     5.40
hydro2d     16.6    11.4     3.26
mgrid       23.5    18.5     7.26
applu       14.4    13.2     6.49
turb3d      19.4    16.9     11.1
apsi        21.1    16.0     11.6
fpppp       37.8    37.1     37.2
wave5       33.7    27.4     12.8
SPECfp95 Comparison Table for MIPS R10000 250MHz
Next, a separate 2D comparison graph for each of the ten SPECfp95 tests:
Some trends are easier to spot from the graphs, which is why I made them in the first place:
For example, for wave5, the R10K/195 Origin2000 is 14% faster than the R10K/195 Octane, but the R10K/250 Origin2000 is 23% faster than the R10K/250 Octane.
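The percentages here are simple ratios of SPEC scores. A minimal Python sketch, assuming "X% faster" means (score A / score B - 1) x 100 (the same convention that reproduces the %Difference table for SPECint95 further down), recomputes the R10K/250 wave5 figure from the SPECfp95 table. The R10K/195 figure can't be rechecked here because those scores live on the 195 page. The function name `percent_faster` is purely illustrative.

```python
def percent_faster(a: float, b: float) -> float:
    """Return how much faster score a is than score b, as a percentage."""
    return (a / b - 1.0) * 100.0


# wave5 ratios for R10000/250, taken from the SPECfp95 table
origin2000_wave5 = 33.7
octane_wave5 = 27.4

print(f"Origin2000 over Octane (wave5): {percent_faster(origin2000_wave5, octane_wave5):.1f}%")
```

This prints a value of about 23%, matching the text.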
Think of it this way: take a balloon and draw two lines of different lengths on it. Now blow up the balloon. As it expands, both lines get longer, but the difference between their lengths also grows.
Thus, as processors become faster, the advantage of a larger L2 becomes greater. This is obviously 'common sense', but it's reassuring to see the effect actually happening.
The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various R10000/250 SPECint95 test results. After the table and 3D graphs is a short-cut index to the original results pages for the various systems.
System:     O2000   Octane   O2
L2 cache:   4MB     1MB      1MB
go          14.9    14.1     13.9
m88ksim     14.2    14.1     14.5
gcc         13.5    12.5     10.7
compress    15.0    13.9     12.0
li          12.3    11.9     11.9
ijpeg       12.9    12.6     11.5
perl        16.7    16.4     15.7
vortex      19.5    13.8     9.74
SPECint95 Comparison Table for MIPS R10000 250MHz
Next, a separate 2D comparison graph for each of the eight SPECint95 tests:
As with R10K/195, the results show a different pattern of variation from the SPECfp95 results given above. The important observations are discussed on the 195 page. Of more interest here for R10K/250 are the O2 results and the vortex data for the three systems.
Here is a comparison table of the differences between Octane and O2 for R10K/195 and R10K/250 (I'm comparing O2 to Octane because both have a 1MB L2). The figures show how much faster Octane is than O2 for each test:
            R10K/195       R10K/250
Test        %Difference    %Difference
go          3.64           1.44
m88ksim     1.80           -2.76
gcc         12.0           16.8
compress    6.60           15.8
li          1.81           0.00
ijpeg       8.02           9.57
perl        0.00           4.46
vortex      36.6           41.7
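The R10K/250 column can be rederived from the SPECint95 table. This is a minimal Python sketch, assuming %Difference is (Octane / O2 - 1) x 100. The names `specint_250` and `percent_difference` are hypothetical. Its results agree with the column above to the precision shown.

```python
# SPECint95 ratios for R10000/250 as (Octane, O2); both systems have a 1MB L2
specint_250 = {
    "go": (14.1, 13.9),
    "m88ksim": (14.1, 14.5),
    "gcc": (12.5, 10.7),
    "compress": (13.9, 12.0),
    "li": (11.9, 11.9),
    "ijpeg": (12.6, 11.5),
    "perl": (16.4, 15.7),
    "vortex": (13.8, 9.74),
}


def percent_difference(octane: float, o2: float) -> float:
    """How much faster Octane is than O2, as a percentage (negative if slower)."""
    return (octane / o2 - 1.0) * 100.0


for test, (octane, o2) in specint_250.items():
    print(f"{test:10s} {percent_difference(octane, o2):6.2f}")
```

A negative value, as for m88ksim, means the O2 scored slightly higher than Octane on that test.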
For the tests which show a significant difference, one would expect the gap to widen when moving from R10K/195 to R10K/250, and this clearly applies to gcc, compress, ijpeg and vortex. The other tests are well within margins of error. To be sure, though, I need SPEC95 data for R10K/225, which isn't available yet for O2 or Octane (the CPU is available, but the test results are not).
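One way to make "significant" concrete is sketched below in Python. It assumes a hypothetical cut-off: a gap under 5% at R10K/195 counts as noise. That threshold is an illustration only, not a margin of error measured for these systems. With it, exactly the four tests named above qualify, and the gap grows for each of them. perl also rises, but it starts from 0.00, so the cut-off excludes it.

```python
# %Difference (Octane over O2) from the table: (R10K/195, R10K/250)
differences = {
    "go": (3.64, 1.44),
    "m88ksim": (1.80, -2.76),
    "gcc": (12.0, 16.8),
    "compress": (6.60, 15.8),
    "li": (1.81, 0.00),
    "ijpeg": (8.02, 9.57),
    "perl": (0.00, 4.46),
    "vortex": (36.6, 41.7),
}

# Hypothetical threshold: gaps below this at R10K/195 are treated as noise
SIGNIFICANT = 5.0


def growing_gaps(diffs, threshold):
    """Tests whose R10K/195 gap is significant and which widen at R10K/250."""
    return [
        test
        for test, (d195, d250) in diffs.items()
        if d195 >= threshold and d250 > d195
    ]


print(growing_gaps(differences, SIGNIFICANT))
```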
All this analysis is fine and fair enough, but John's comments on the 195 page about the nature of these tests do pose a question. He noted that cache misses aren't occurring in most of the tests because the data sets are small. If only vortex uses a non-trivial data set, just how relevant is SPECint95 anyway? That's a difficult question to answer. As a reader, you'd have to ask yourself: "How big is my data set? Does the CPU keep having to access main RAM, jumping across a wide memory space? Is memory latency important to my task?"
If your data set is small and cache misses are rare, then you wouldn't see much benefit from using Origin or Octane over O2. I imagine that image processing of NTSC video frames would fall into this category, since each frame would fit into a 1MB L2. Ironically, PAL frames would not fit into a 1MB L2 cache: a PAL frame takes 1.26MB, compared with 0.90MB for an NTSC frame.
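The frame-size argument is simple arithmetic: width x height x bytes per pixel, compared against the cache size. The Python sketch below assumes one plausible format: square-pixel frames (640x480 for NTSC, 768x576 for PAL) stored as 24-bit RGB, with 1MB taken as 1,048,576 bytes. These parameters are assumptions. Under them the sizes come out close to the figures quoted above, and the exact numbers shift with the frame format chosen. The check also ignores code and other data that would compete for the same cache.

```python
L2_CACHE_BYTES = 1 * 1024 * 1024  # 1MB secondary cache, as in Octane and O2

# Hypothetical frame formats: square-pixel frames, 24-bit RGB (3 bytes/pixel)
FRAME_FORMATS = {
    "NTSC": (640, 480),
    "PAL": (768, 576),
}
BYTES_PER_PIXEL = 3


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


for name, (w, h) in FRAME_FORMATS.items():
    size = frame_bytes(w, h, BYTES_PER_PIXEL)
    fits = size <= L2_CACHE_BYTES
    print(f"{name}: {size / L2_CACHE_BYTES:.2f} MB, fits in a 1MB L2: {fits}")
```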
Possible tip: if you're running integer processing jobs on Origins, Octanes and O2s, try swapping the jobs around. Some jobs may run faster because they benefit from Origin's larger L2, or from the better memory latency and outstanding cache-miss support of Origin/Octane, etc. Meanwhile, a task like m88ksim, which doesn't seem to benefit from these extra features, would run just as well if it were moved from an Origin/Octane to an O2. Thus, one could increase the performance of some tasks without making the remaining tasks any slower than they originally were. Take an extreme example: an m88ksim-type task runs on an Origin (call it task X) and a vortex-type task runs on an O2 (task Y). Swapping the tasks over would give the same performance for task X, but task Y would speed up by a significant margin.
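The swap can be expressed as a tiny assignment search. This Python sketch uses the R10000/250 SPECint95 ratios as a stand-in for the performance of two hypothetical jobs. It tries every one-task-per-machine placement and keeps the one with the highest total score. Summing ratios is a simplifying assumption; a real scheduler might instead require that no task slows down. The task labels, `SCORES` and `best_assignment` are all illustrative names.

```python
from itertools import permutations

# R10000/250 SPECint95 ratios used as proxies for two hypothetical jobs:
# task X behaves like m88ksim, task Y behaves like vortex.
SCORES = {
    "X (m88ksim-like)": {"Origin2000": 14.2, "O2": 14.5},
    "Y (vortex-like)": {"Origin2000": 19.5, "O2": 9.74},
}
MACHINES = ["Origin2000", "O2"]


def best_assignment(scores, machines):
    """Try every one-task-per-machine placement; keep the highest total score."""
    tasks = list(scores)
    best = None
    for order in permutations(machines):
        placement = dict(zip(tasks, order))
        total = sum(scores[t][m] for t, m in placement.items())
        if best is None or total > best[1]:
            best = (placement, total)
    return best


placement, total = best_assignment(SCORES, MACHINES)
for task, machine in placement.items():
    print(f"{task} -> {machine}")
```

The best placement is the swapped one. Task X barely changes (14.2 on Origin vs 14.5 on O2), while task Y roughly doubles (9.74 on O2 vs 19.5 on Origin).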