The ICE ASIC in O2 has a 128bit core which, at the moment, allows it to do sixteen 8bit integer operations or eight 16bit integer operations per clock (ICE differs from the RCP here in that it does integer operations only, whereas the RCP does floating point operations too). A vector processor, however, is a very different concept. If the RCP really is a vector design, then it is an extremely powerful ASIC indeed, far beyond what most gamers will initially be able to appreciate, since they are unlikely to know what a vector processor is and will probably judge it by clock speed alone.
SGI's PR is vague about the RCP's exact capabilities: it says 'over half a billion operations per second', not a specific figure. When I asked about this, the only answer I could get was that it is meant to be vague. I suspect this is because the RCP is programmable in some way (ICE is programmable, and I've been told that the two design teams for ICE and RCP shared some talent), and/or because it has abilities that the games being written just now are not using to the full, or even at all.
An ordinary processor uses registers to perform operations and calculations. A typical machine code instruction might be:
ADD D1, D2, D3
which might add the contents of D1 and D2 and store the result in D3. Registers might be 8bit (eg. the 6502 used in the BBC B), 16bit (eg. Intel 80286), 32bit (eg. MIPS R3000, as used in the PSX), 64bit (eg. MIPS R5000), etc. A processor's peak rating is its clock rate multiplied by the number of such operations it can complete per clock: for example, a 100MHz CPU that could do two such adds per clock would be rated at 200MOPS.
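The scalar model and the MOPS arithmetic above can be sketched in a few lines of Python. This is a software illustration only: the helper names `add` and `peak_ops_per_sec` are hypothetical, and it assumes an add simply wraps around when the result overflows the register width.

```python
def add(regs: dict[str, int], src1: str, src2: str, dst: str, width: int) -> None:
    """Model of a scalar 'ADD src1, src2, dst': dst = src1 + src2.

    Assumption: the result wraps to the register width on overflow.
    """
    mask = (1 << width) - 1
    regs[dst] = (regs[src1] + regs[src2]) & mask


def peak_ops_per_sec(clock_hz: float, ops_per_clock: int) -> float:
    """Peak rating = clock rate x operations completed per clock."""
    return clock_hz * ops_per_clock


regs = {"D1": 200, "D2": 100, "D3": 0}
add(regs, "D1", "D2", "D3", width=8)     # 300 does not fit in 8 bits, wraps to 44
print(regs["D3"])                        # 44
print(peak_ops_per_sec(100e6, 2) / 1e6)  # 200.0 (MOPS)
```

The same register-width wrap is why an 8bit CPU needs several instructions to handle larger numbers, whereas a 64bit CPU handles them in one.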
But a vector processor is very different. Operations are carried out on banks of registers rather than on single registers. A bank consists of a particular number of 'parallel' registers, all of the same bit width, and is called a vector. From the information available, the RCP uses vectors of eight registers. Some supercomputers of old (eg. Cray systems) used vectors as large as 72 registers, but that's obviously somewhat OTT for a games console. A vector operation might look like this:
ADDV V1, V2, V3
If a vector had, say, 32 registers, then this single instruction would add the 32 registers in vector 1 to the 32 registers in vector 2 and store the results in the 32 registers of vector 3. Such a computational model allows for very high processing power, but the disadvantage is that it's inflexible: every register in the bank receives the same operation. No accurate information is available on this, but my guess is that the RCP also allows individual registers to be accessed, and each 64bit register to be treated as, say, two 32bit registers or four 16bit registers.
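Here is a rough software model of the two ideas above: one `ADDV` acting on a whole bank of registers, and one 64bit register viewed as narrower sub-registers. It is only an analogy and says nothing about how the RCP actually does it. The names `addv` and `split_register` are hypothetical, the 32-register length is the example figure from the text, and wrap-on-overflow is an assumption.

```python
# Hypothetical software model of the vector ideas in the text; not RCP microcode.
VECTOR_LENGTH = 32  # the "say, 32 registers" example
REGISTER_BITS = 64  # assumed width of each register in the bank


def addv(v1: list[int], v2: list[int], bits: int = REGISTER_BITS) -> list[int]:
    """Model of 'ADDV V1, V2, V3': every register pair is added by one instruction."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of registers")
    mask = (1 << bits) - 1  # assumption: each result wraps to the register width
    return [(a + b) & mask for a, b in zip(v1, v2)]


def split_register(value: int, lane_bits: int, bits: int = REGISTER_BITS) -> list[int]:
    """View one register as several narrower registers, least significant first."""
    lane_mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & lane_mask for i in range(bits // lane_bits)]


V1 = list(range(VECTOR_LENGTH))  # 0, 1, ..., 31
V2 = [10] * VECTOR_LENGTH
V3 = addv(V1, V2)                # one "instruction", 32 additions
print(V3[0], V3[31])             # 10 41

reg = 0x0004_0003_0002_0001
print([hex(x) for x in split_register(reg, 32)])  # ['0x20001', '0x40003']
print(split_register(reg, 16))                    # [1, 2, 3, 4]
```

The inflexibility mentioned above shows up directly: `addv` can only apply one operation to every register pair at once.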
The other possibility is that the RCP core is very like ICE's core, namely a 128bit path which can be split for operations of different bit widths. If this were true, then SGI's PR figure of 'over half a billion operations/sec' would have to assume 16bit operations (8 per clock at 62.5MHz = 500M/sec). That doesn't sound right to me, since most graphics functions are 32bit (with 64bit involved too when texturing is used, which in N64 games is most of the time).
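A split 128bit path can be illustrated with the well-known SWAR (SIMD within a register) trick: add every lane of two packed words at once while stopping carries from leaking into the neighbouring lane. This is a software analogy for the idea, not a description of ICE or RCP internals; `packed_add`, `pack`, `unpack` and `lane_masks` are hypothetical helpers.

```python
# Hypothetical SWAR (SIMD within a register) sketch of a 128bit path split into lanes.
PATH_BITS = 128


def lane_masks(lane_bits: int, path_bits: int = PATH_BITS) -> tuple[int, int]:
    """Return (high, low): the top bit of every lane, and every other bit."""
    high = 0
    for i in range(path_bits // lane_bits):
        high |= 1 << (i * lane_bits + lane_bits - 1)
    full = (1 << path_bits) - 1
    return high, full ^ high


def packed_add(x: int, y: int, lane_bits: int, path_bits: int = PATH_BITS) -> int:
    """Add every lane of x to the matching lane of y; carries never cross lanes."""
    high, low = lane_masks(lane_bits, path_bits)
    # Add the low bits of each lane (any carry lands in that lane's own top bit),
    # then fix up each top bit with XOR so the carry out of a lane is discarded.
    return ((x & low) + (y & low)) ^ ((x ^ y) & high)


def pack(values: list[int], lane_bits: int) -> int:
    word = 0
    for i, v in enumerate(values):
        word |= (v & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return word


def unpack(word: int, lane_bits: int, path_bits: int = PATH_BITS) -> list[int]:
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(path_bits // lane_bits)]


a = pack([0xFFFF, 2, 3, 4, 5, 6, 7, 8], 16)
b = pack([10] * 8, 16)
print(unpack(packed_add(a, b, 16), 16))  # [9, 12, 13, 14, 15, 16, 17, 18]

# Lanes per clock x clock rate: eight 16bit lanes at 62.5MHz
print(PATH_BITS // 16 * 62.5e6)          # 500000000.0
```

Lane 0 overflows and wraps to 9 without disturbing lane 1. Calling the same code with `lane_bits=8` gives sixteen 8bit lanes, matching the ICE figures quoted at the start.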
So what does this all mean? Well, the difference is stark:
Peak Performance (Operations/Sec)

Design Type            64bit    32bit    16bit    8bit
128bit path            125M     250M     500M     1000M
Vector                 500M     ?        ?        ?
Vector + Extensions    500M     1000M    2000M    4000M
As you can see, the design type does make rather a big difference in abilities. So which is it? Frankly, right now, I don't know, but I'm going to try and find out.
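The figures in the table follow from simple arithmetic at the RCP's 62.5MHz clock. The sketch below reproduces them under the same assumptions: a 128bit path delivers 128/width operations per clock; a vector design has eight 64bit registers, each doing one operation per clock; and "extensions" means each 64bit register can be split into narrower lanes. The helper names are hypothetical, and the "?" entries mean "unknown", not "unsupported".

```python
# Reproduces the peak-performance table from the stated assumptions; hypothetical helpers.
CLOCK_HZ = 62.5e6  # RCP clock rate quoted in the text
WIDTHS = (64, 32, 16, 8)


def path_128bit(width: int) -> float:
    """128bit path: one op per width-sized slice of the path per clock."""
    return CLOCK_HZ * (128 // width)


def vector(width: int, extensions: bool, registers: int = 8, register_bits: int = 64):
    """Vector of eight 64bit registers, one op per register per clock.

    Without extensions only full-width ops are known (narrower widths return None);
    with extensions each register splits into register_bits // width lanes.
    """
    if width != register_bits and not extensions:
        return None
    return CLOCK_HZ * registers * (register_bits // width)


def fmt(ops) -> str:
    return "?" if ops is None else f"{ops / 1e6:.0f}M"


rows = {
    "128bit path": [path_128bit(w) for w in WIDTHS],
    "Vector": [vector(w, extensions=False) for w in WIDTHS],
    "Vector + Extensions": [vector(w, extensions=True) for w in WIDTHS],
}
print(f"{'Design Type':<22}" + "".join(f"{w}bit".ljust(8) for w in WIDTHS))
for name, values in rows.items():
    print(f"{name:<22}" + "".join(fmt(v).ljust(8) for v in values))
```

Changing `CLOCK_HZ` or the register count shows how sensitive these peak figures are to the assumed design.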
Historically, vector processors were very popular in large supercomputers (and still are in some markets). Over the years, however, general-purpose RISC processors have caught up with and overtaken vector processors in overall price, performance and flexibility. Today, processors such as the R10000, Alpha 21264 and PA8000 series are used more and more often in large supercomputers, though vector designs are still used in DSPs and similar devices.