[Future Technology Research Index] [SGI Tech/Advice Index] [Nintendo64 Tech Info Index]


[WhatsNew] [P.I.] [Indigo] [Indy] [O2] [Indigo2] [Crimson] [Challenge] [Onyx] [Octane] [Origin] [Onyx2]

Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

(check my current auctions!)

Origin2000 Single-CPU SPEC95 Performance
Comparison Using Different R10000s

Last Change: 20/Aug/1998

SPEC's Introduction to SPEC95

SPECfp95 Analysis

SPECint95 Analysis


(Note: the 2D bar graphs shown here for the various SPEC95 tests are all drawn to the same scale, which is also the scale used on the other single-CPU comparison pages.)

Origin2000 Single-CPU SPECfp95 Performance
Comparison Using Different R10000s


Objectives

This analysis examines single-CPU performance only, focusing on how different R10000 CPUs perform in the same system, in this case Origin2000 (I have separate pages dealing with how the same CPU performs in different systems).

As with all these studies, a 3D Inventor model of the data is available (screenshots of it are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (using the Roty and Rotx thumbwheels); that gives the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and display the object directly, although such browsers may not offer Orthographic viewing.

All source data for this analysis came from www.specbench.org.

Given below is a comparison table of the available single-CPU SPECfp95 results for Origin2000, covering the 195MHz and 250MHz R10000 and the 300MHz R12000; for reference, the equivalent percentage increase is also given for each test, plus a final average percentage increase. Faster CPUs are rightmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

          R10000   R10000   R12000
          195MHz   250MHz   300MHz
          4MB L2   4MB L2   8MB L2

tomcatv    26.9     34.6     47.4
swim       41.2     50.0     71.3
su2cor     11.5     15.6     20.9
hydro2d    12.6     16.6     26.3
mgrid      18.8     23.5     37.2
applu      11.7     14.4     17.6
turb3d     15.3     19.4     26.7
apsi       15.6     21.1     30.3
fpppp      29.6     37.8     47.2
wave5      25.5     33.7     41.0


          % Increase     % Increase     % Increase
FROM:      R10K/195       R10K/195       R10K/250
TO:        R10K/250       R12K/300       R12K/300

tomcatv      28.6           76.2           37.0
swim         21.4           73.1           42.6
su2cor       35.7           81.7           34.0
hydro2d      31.7          108.7           58.4
mgrid        25.0           97.9           58.3
applu        23.1           50.4           22.2
turb3d       26.8           74.5           37.6
apsi         35.3           94.2           43.6
fpppp        27.7           59.5           24.9
wave5        32.2           60.8           21.7

Average      28.7           77.7           38.0
(NB: the clock speed increases are +28.2%, +53.8% and +20.0% respectively)

    Origin2000 SPECfp95 Comparison

[Figure: left and right isometric views of the 3D Inventor graph]

[Test Suite Description | 250MHz | 195MHz]


Next, a separate comparison graph for each of the ten SPECfp95 tests:

tomcatv:

tomcatv comparison graph

swim:

swim comparison graph

su2cor:

su2cor comparison graph

hydro2d:

hydro2d comparison graph

mgrid:

mgrid comparison graph

applu:

applu comparison graph

turb3d:

turb3d comparison graph

apsi:

apsi comparison graph

fpppp:

fpppp comparison graph

wave5:

wave5 comparison graph

Observations

Remember that the increase in clock speed from 195MHz to 250MHz is 28.2%. Performance rarely scales perfectly with clock speed, so an increase of about 25% would be a good result; the question for each test is whether it reaches that level or not.

Since the R10K/250 has its L2 cache running at two-thirds of core speed, one must also bear in mind that some tests may benefit from the faster L2 cache; however, this is difficult to judge from the small number of tests under discussion, so I will not cover this aspect in detail here. One could perhaps draw some conclusions by carefully comparing Origin200, Origin2000 and Octane figures, but any such statements could easily be misleading because SPECfp95 consists of only ten tests.

Anyway, the main points which arise from the above graphs are as follows:

Note: given that Origin2000 CPUs have so much L2 cache, one may be wasting resources by running a floating-point task on Origin2000 when that task does not need as much as 4MB L2. On my Octane single-CPU page, I suggest that if one has multiple systems available, such as Origin2000 and Octane, one should experiment to see which task is best suited to which system. For example, given two tasks A and B running on Octane and Origin2000 respectively, where A benefits from a large L2 cache and B does not, swapping them between the systems may significantly improve the performance of task A without harming the performance of task B.


I've talked a lot on my analysis pages about L2 cache issues, but there is one area I have not discussed, namely compiler optimisation. This isn't an area I am greatly experienced with, but having read chapter 9 of the Indigo2 technical report, "MIPSpro Compiler Technology", I found it very obvious that careful coding modifications can give significant performance improvements, in some cases far greater than any CPU upgrade would give. I also read a technical document on Cray's web site which detailed some typical coding modifications for vector systems; it showed how paying a little attention to hardware issues, such as the size and frequency of memory load requests, could often give enormous speed improvements simply by changing the code to take account of these hardware-level factors.

Upgrading a CPU may give a performance increase on the order of a few tens of percent, as is the case for R10K/250 vs. R10K/195, but careful code optimisation can easily give far greater increases. So, if you're thinking about an upgrade, don't spend a fortune before you have looked at optimising your code. Some careful thought and hard reading might cut computation times from several days to just a few hours. Obviously, combining code optimisation with a CPU upgrade gives the best improvement; what I'm suggesting is that one shouldn't spend money on upgrades until one has fully investigated optimisation issues.
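
A rough worked example shows why. Assume, purely for illustration, a job that takes 72 hours on an R10K/195, that it scales perfectly with clock speed on an R10K/250 (the best case for the upgrade), and that hypothetical code changes make it 10 times faster on either CPU. Run time is inversely proportional to speedup S, and independent speedups multiply:

```latex
\begin{aligned}
T_{\text{new}} &= \frac{T_{\text{old}}}{S} \\[4pt]
\text{CPU upgrade only } (S = 250/195 \approx 1.28):\quad
  T_{\text{new}} &\approx \frac{72\ \text{h}}{1.28} \approx 56\ \text{h} \\
\text{Code changes only } (S = 10):\quad
  T_{\text{new}} &= \frac{72\ \text{h}}{10} = 7.2\ \text{h} \\
\text{Both } (S \approx 1.28 \times 10 = 12.8):\quad
  T_{\text{new}} &\approx \frac{72\ \text{h}}{12.8} \approx 5.6\ \text{h}
\end{aligned}
```

The "both" line assumes the code speedup is unaffected by the CPU change, which will not always hold in practice.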

Note that although there are online documents about code optimisation for various systems and compilers, there is also a wealth of printed books on the subject. Consult your local library for some background reading; delving straight into an online guide that's specific to your system or task may make it hard to understand the general concepts involved. Besides, understanding the general principles will let you apply them to many systems and types of code, not just the one task you happen to be concerned with at the time.


Origin2000 Single-CPU SPECint95 Performance
Comparison Using Different R10000s

Just as for the SPECfp95 analysis given above, you can download a 3D performance graph (gzipped) if you wish: load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective), etc.

The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results and an equivalent percentage increase. After the table and 3D graphs is a short-cut index to the original results pages.

          R10000   R10000     % Increase
          250MHz   195MHz    (195 -> 250)

go         14.9     11.4         30.7%
m88ksim    14.2     11.3         25.7%
gcc        13.5     10.4         29.8%
compress   15.0     11.3         32.7%
li         12.3     9.57         28.5%
ijpeg      12.9     10.2         26.5%
perl       16.7     13.3         25.6%
vortex     19.5     14.4         35.4%

Average (NB: 250/195 = +28.2%):  29.4%

     Origin2000 SPECint95 Comparison

[Figure: left and right isometric views of the 3D Inventor graph]

[Test Suite Description | 250MHz | 195MHz]


Next, a separate comparison graph for each of the eight SPECint95 tests:

go:

go comparison graph

m88ksim:

m88ksim comparison graph

gcc:

gcc comparison graph

compress:

compress comparison graph

li:

li comparison graph

ijpeg:

ijpeg comparison graph

perl:

perl comparison graph

vortex:

vortex comparison graph


Observations

Although the above results are good, it is still wise to have proper tests done before making an upgrade decision. Because the percentage improvements and individual SPEC ratios vary so little from test to test, it could be hard to decide which test is most like one's own task.

Note that JPEG compression (the ijpeg test) may be hardware-accelerated on some systems, depending on which video board options are fitted.

Finally, when dealing with high-end systems like Origin2000, it is highly advisable to explore all possible avenues of compiler and code optimisation before contemplating an upgrade. Sometimes, careful changes to code design can give rise to large performance increases, especially by tuning one's code to match specific hardware parameters. Please see the last three paragraphs of the SPECfp95 discussion above for further comments on this subject.

