
MIPS/MFLOPS and CPU Performance

MIPS (the company name):

It is unfortunate that the term MIPS is used as a processor benchmark as well as a shorthand form of a company name, so first I'd better make the distinction clear. The company responsible for the CPU designs in the N64 is MTI, an abbreviation for MIPS Technologies Inc., but many people (including myself) refer to the company as just MIPS.


MIPS (the benchmark):

The processor benchmark called MIPS has nothing to do with the company name. In the context of CPU performance measurement, MIPS stands for 'Million Instructions Per Second' and is probably the most useless benchmark ever invented. The rest of this page concerns MIPS as a benchmark, not the company (also discussed here are the MFLOPS and SPEC benchmarks, plus a comment on memory bandwidth).

The MIPS rating of a CPU refers to how many low-level machine code instructions a processor can execute in one second. Unfortunately, using this number as a way of measuring processor performance is completely pointless because no two chips use exactly the same kind of instructions, execution method, etc. For example: on one chip, a single instruction may do many things when executed (CISC = Complex Instruction Set Computing), whereas on another chip a single instruction may do very little but is dealt with more efficiently (RISC = Reduced Instruction Set Computing). Also, different instructions on the same chip often do vastly different amounts of work (eg. a simple arithmetic instruction might take just 1 clock cycle to complete, whereas doing something like floating point division or a square root operation might take 20 to 50 clock cycles).
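To make that concrete, here is a toy illustration in C (all the numbers are invented for the example): two imaginary CPUs finish exactly the same task in exactly the same time, yet their MIPS ratings differ by a factor of three simply because one needs three simple instructions where the other needs one complex instruction.

    /* Toy illustration (invented numbers): same task, same elapsed
       time, but the RISC design executes three simple instructions for
       every one complex CISC instruction - so its MIPS rating is three
       times higher for identical real performance. */
    #include <stdio.h>

    int main(void)
    {
        double seconds  = 1.0;      /* same elapsed time for both     */
        double cisc_ins = 100e6;    /* complex ops, fewer of them     */
        double risc_ins = 300e6;    /* simple ops, more of them       */

        printf("CISC: %4.0f MIPS\n", cisc_ins / seconds / 1e6);
        printf("RISC: %4.0f MIPS\n", risc_ins / seconds / 1e6);
        /* prints 100 vs 300 MIPS, yet both did the same work */
        return 0;
    }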

People who design processors, and people like me who are interested in how they work, almost never use a processor's 'MIPS' rating when discussing performance because it's effectively meaningless (like many people, I did at one time think that a CPU's MIPS rating was all-important; ironically, it was an employee at MIPS Technologies Inc. who corrected my faulty beliefs when I asked him about the performance of the R8000).

MIPS numbers are often very high because of how processors work, but in fact the number tells one absolutely nothing about what the processor can actually do or how it works (ie. a processor with a lower MIPS rating may actually be a better chip because its instructions do more work per clock cycle). There are dozens of different processor and system benchmarks, such as SPEC, Linpack, MFLOPS, STREAM, Viewperf, etc. One should always use the test that is most relevant to one's area of interest and the system concerned. With games consoles, however, this is a bit of a problem because no one has yet made a 'games console' benchmark test - people have to use existing benchmarks which were never designed for the job.

An example: imagine a 32bit processor running at 400MHz. It might be rated at 400MIPS. Now consider a 64bit processor running at 200MHz. It might be rated at 200MIPS (assume a simple design in each case). But suppose my task involves 64bit floating point (fp) processing (eg. computational fluid dynamics, or audio processing, etc.): the 32bit processor would take many more clock cycles to complete a single 64bit fp multiply since its registers are only 32bits wide, so it would take at least twice as long to carry out such an operation. Thus, for 64bit operations, the 32bit processor would be much slower than the 64bit processor, despite its higher MIPS rating. Now think of it the other way round: suppose one's task only involved 32bit operations. Unless the 64bit registers in the 64bit CPU could be treated as two 32bit registers, the 32bit CPU would be much faster. It all depends on the processing requirements.
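To make the register-width point concrete, here is a minimal sketch in C (using integer addition rather than fp multiply for simplicity; the function name is my own invention): a 64bit add done with nothing but 32bit registers needs several instructions where a 64bit CPU needs just one.

    #include <stdint.h>

    /* Hypothetical sketch: a 64bit add performed using only 32bit
       registers.  Three operations (two adds plus a carry test) where
       a 64bit CPU would use a single instruction. */
    void add64_on_32bit(uint32_t a_hi, uint32_t a_lo,
                        uint32_t b_hi, uint32_t b_lo,
                        uint32_t *r_hi, uint32_t *r_lo)
    {
        uint32_t lo    = a_lo + b_lo;
        uint32_t carry = (lo < a_lo);   /* unsigned wraparound => carry */
        *r_lo = lo;
        *r_hi = a_hi + b_hi + carry;
    }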

The situation in real life is far more complicated though, because real CPUs rarely do one thing at a time and in just one clock cycle. Simple arithmetic operations may take 1 cycle, an integer multiply might take 2 cycles, a fp multiply might take 5 clock cycles, a complex square root operation in a CISC design might take 20 cycles, and so on. Worse, some CPUs are designed to do more than one of the same kind of operation at once, ie. they have more than one of a particular kind of processing unit. CPUs such as SGI's R10000 series (or later equivalents), the HP PA8000 series, the old Alpha 21x64 series, etc. often have 2 or more integer processing units, multiple fp processing units and at least one load/store unit. Sometimes they may have special units too, for example to accelerate square root calculations.

But it doesn't stop there! Today there are technologies such as MMX (from Intel), which allows a 64bit integer register to be treated as multiple 32bit, 16bit or 8bit integer registers, and MDMX (from MIPS Technologies Inc.), which does the same but is more powerful: it also allows the same register splitting to be done with fp registers and includes a 192bit accumulator register (although at present SGI hasn't implemented MDMX in any of their available CPUs). These ideas enable many more calculations to be performed in the same amount of time compared to older designs. An example: Gouraud shading involves 32bit fp operations; using a 64bit fp register as two 'separate' 32bit fp registers will (at best) double the processing ability of the CPU.
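The register-splitting idea can be demonstrated even in plain C, scaled down to a 32bit word holding two 16bit lanes (a sketch of the principle only - real MMX/MDMX hardware handles the lane boundaries automatically):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Pack two 16bit values into each 32bit word... */
        uint32_t a = (100u << 16) | 200u;   /* lanes: 100 and 200 */
        uint32_t b = (  7u << 16) |   5u;   /* lanes:   7 and   5 */

        /* ...then one 32bit add produces both 16bit results at once.
           This only works while neither lane carries into the other;
           MMX/MDMX hardware prevents such carries automatically. */
        uint32_t sum = a + b;

        printf("lanes: %u and %u\n",
               (unsigned)(sum >> 16), (unsigned)(sum & 0xFFFFu));
        /* prints: lanes: 107 and 205 */
        return 0;
    }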

So that's the MIPS benchmark dealt with, ie. it's useless, so ignore it. Since I mentioned fp calculations, that leads nicely onto the MFLOPS benchmark.


MFLOPS

People often use MFLOPS to mean different things, but a general definition would be the number of full word-size fp multiply operations that can be performed per second (the M stands for 'Million'). Obviously, fp add or subtract operations take less time, and slowest of all is fp divide. Older CPUs take many clock cycles to complete one FLOP and so, even at high clock speeds, their FLOP rate can be low. An example is the 486DX4/100, which is rated at about 6MFLOPS. Compare this to the 200MHz R4400, which is rated at about 35MFLOPS. For older processors, clock speed is clearly no indication of MFLOP rate.
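Working backwards from those ratings (a rough back-of-envelope calculation using the figures just quoted) shows just how different the per-FLOP cost in clock cycles can be:

    486DX4/100: 100 million cycles/sec / 6 million FLOPs/sec  = ~17 cycles per FLOP
    R4400/200 : 200 million cycles/sec / 35 million FLOPs/sec = ~6 cycles per FLOP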

Newer designs don't make things any clearer - if anything the situation is more complex, since it is often the reverse: CPUs like the R10000 can do two fp operations each clock cycle, giving a rating of 400MFLOPS at 200MHz. The R8000 is even more confusing since it has two fp execution units, each capable of doing two fp ops/clock, giving it a rating of 360MFLOPS at just 90MHz! (that's ten times faster than an Intel P90).
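Put another way, the quoted peaks are simply execution units multiplied by ops-per-cycle multiplied by clock speed:

    R10000: 2 fp ops/cycle x 200 million cycles/sec          = 400 MFLOPS peak
    R8000 : 2 units x 2 fp ops/cycle x 90 million cycles/sec = 360 MFLOPS peak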

Again, the nature of the task is important. A 64bit CPU that can do 400MFLOPS may be fine, but if one's work only needs 32bit processing then much of the CPU's capability is being wasted. CPUs like the R5000 address this problem, aiming at markets that do not need 64bit fp processing. Future designs like MDMX will solve the wastage problem, but they will also make measuring CPU performance even harder. Perhaps 'CPU capability' would be a better metric, but no one has devised such a test yet. There is just a wide variety of benchmarks, and one must use the most appropriate test as a basis for decision making.

All this talk of MFLOPS is fine, but it misses one very important point: memory bandwidth. A fast CPU may sound impressive, and PR people will always talk in terms of theoretical peak performance, etc., but in reality a CPU's best possible performance depends totally on the rate at which it can access data from the various kinds of memory (L1/L2 cache and main RAM). A fast CPU in a system with low memory bandwidth will not perform anywhere near its theoretical peak (eg. the 500MHz Alpha). I have studied the effect of this on the 195MHz R10000 and the results are very interesting. If you want to know more about the whole issue of memory bandwidth, then see the STREAM Memory Bandwidth site.
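For a flavour of how such measurements work, here is a minimal sketch in the spirit of STREAM's 'copy' test (the array size, the coarse timer, and the lack of precautions against compiler optimisation are all simplifications of mine; use the real STREAM code for serious results):

    #include <stdio.h>
    #include <time.h>

    #define N (4 * 1024 * 1024)        /* 32MB per array of doubles */

    static double a[N], b[N];

    int main(void)
    {
        for (long i = 0; i < N; i++)
            a[i] = 1.0;

        clock_t t0 = clock();
        for (long i = 0; i < N; i++)   /* copy = one read + one write */
            b[i] = a[i];
        clock_t t1 = clock();

        double secs   = (double)(t1 - t0) / CLOCKS_PER_SEC;
        double mbytes = 2.0 * N * sizeof(double) / 1e6;  /* bytes moved */
        printf("b[0] = %f, approx %.1f MB/sec\n", b[0], mbytes / secs);
        return 0;
    }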

What is important here with regard to the N64 is that SGI have given it a very high memory bandwidth indeed (500MB/sec peak, ie. almost 4 times faster than PCI). The N64's memory design uses Rambus technology, which is also used in SGI's IMPACT graphics technology.


SPEC

I've always found it funny (or annoying, depending on my mood at the time :) that gamers try to use the SPEC benchmark when arguing about the performance of the main CPUs in games consoles. What gamers should understand is that SPEC's main CPU test suite (currently SPEC2000) was never designed for games consoles; SPEC is not a graphics test, nor a test designed to measure the preprocessing tasks often needed for 3D graphics, such as heavy use of MADD (multiply-add) operations, although some individual tests in the suite may be similar.
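For reference, the MADD pattern in question is the one at the heart of vertex transformation: a 4x4 matrix times a vector is nothing but chained multiply-adds (a generic sketch of mine, not code from any particular console):

    /* Transform a 4-element vertex by a 4x4 matrix: sixteen
       multiply-add operations.  A CPU with a fused MADD instruction
       does each one in a single operation; 3D geometry work is
       dominated by exactly this pattern. */
    void transform(const float m[4][4], const float v[4], float out[4])
    {
        for (int i = 0; i < 4; i++) {
            float acc = 0.0f;
            for (int j = 0; j < 4; j++)
                acc += m[i][j] * v[j];   /* one MADD per term */
            out[i] = acc;
        }
    }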

SPEC (92, 95 or 2000) is an integer/floating point test package consisting of a number of separate integer and fp tests which are supposed to be run on systems with at least 32MB (SPEC92) or 64MB (SPEC95) of RAM (perhaps more for SPEC2000). Thus, it's fine to discuss the theoretical SPEC performance of a CPU like the R4300i, but in the context of a games console it's completely meaningless. This is why my N64 tech info site never uses the theoretical SPEC numbers for the R4300i in its discussions: they are useless when talking about the N64's real capabilities.

