C-Ray FP/CPU Benchmark Test Results

mapesdhs
Posts: 2516
Joined: Mon Nov 10, 2003 4:17 pm
Location: Edinburgh, Scotland

Re: C-Ray FP/CPU Benchmark Test Results

Post by mapesdhs » Mon May 05, 2008 3:26 am

Dr. Dave writes:
> Absolutely. But even the SGI white-papers discussed the differences in latency and throughput between the two,
> basically stating that "it depends on what you want to do" with regards to performance for particular tasks. Neither
> architecture was better across the board, but they both did their jobs well.

Exactly. O2's design is so unique, many benchmarks just don't convey how good it is as they're designed for
'normal' architectures. STREAM results on O2 are terrible, yet the system is really good for dealing with volumetric
data - when O2 first came out, a doctor told me O2 had cut times to prepare reports from medical scans from
an hour to 5 minutes because it was sooo much better for handling the scan data than any other system (he'd been
using a prerelease system for several months).

SGI made two mistakes with the O2 PR: not boasting enough about how good it was for those tasks that nicely
matched the O2 design, and not making it clear enough what O2 is not so good for, ie. reps sold O2s to customers
to solve tasks for which it is ill-suited. SGI also allowed reps to sell O2 as a replacement for Indigo2, which was
crazy. Even ILM got burned with this one, some HighIMPACT Indigo2s were replaced with O2s. Sheesh...


> smaller apps do start up quickly on the R12k. ...

I did start collecting results, but only for original base 6.5, which is no good for later systems. Need to redo them
for an OS variant that can apply to all models (6.5.22) and some nastier apps (Firefox & OpenOffice). My initial
tests used some as-supplied apps that run up too fast on newer systems (CosmoWorlds, Netscape).


> ... I also swear that the performance of the R12k drops off faster on a
> 'busy' box, but that's pretty subjective. I had been playing around a while back with a modded R12k/350/1M

Hmm, hard to know if that might not be mod-related. Note btw that normal R12K/300+ has 2MB L2.

> And very true... there's no comparison between a 200 and a 400. The real test, indeed, is how the 400 compares
> to the 600. ...

It varies. The R12K is better for C-ray, but that's a tiny dataset that doesn't rely on main RAM. For compiling, the
R12K is better with MIPS Pro, while the R7K is better with GCC. For the other tests, I can only extrapolate, but my
guess is the R7K would be faster for most GIMP tests, while Blender rendering would be a very close call, perhaps
slightly edged-out by the R12K.

> My sort of feeling right now is that the 600 would be a bit more responsive, but for raw processing the 400 would

One possibility is that since the whole GUI is using integer operations, the R7K's higher clock makes a difference
there.


> ... be faster once it was 'spun up' so to speak. A good test would be application start times under various conditions
> - unfortunately I don't have a 600 to try.

I'm determined to get one this year. Anyone got a spare? I'll swap for a PCI cage or something!


> ... It's just that the O2 has a fairly small footprint for an SGI, and getting it up to usable speed (read video playback,
> Firefox, etc.) is a good thing. ...

I agree, though whenever I mention O2's size/weight/power/noise advantages, I get moaned at because it's nowhere
near Fuel, etc. for raw speed. :D But some people do prefer O2 for non-speed reasons.


> Very true as well. Times are interesting these days, but there sure is a nice charm to these older MIPS processors. ...

Given MIPS doesn't have SSE/SSE2/etc., they do surprisingly well for the Blender tests, and hold their own for
C-Ray, certainly in terms of work done per clock tick anyway. Just a pity they were never clocked-up, never
went multi-core and never came out with MIPS V and MDMX. Yeah, Alien/Beast would have been nice. Still,
I'd love to get my hands on a 16-CPU R16K/1GHz O3K CPU brick. 8)


> ... The original comment was sort of directed at the fact that the R12k O2 was generating great numbers on the
> benchmarks, and just an observation as to why, although in retrospect it wasn't immediately as clear as it might
> have been, especially given the fact that the O2 architecture is not as responsive as Octane, Fuel, etc.

For C-Ray, it's entirely because the data set fits in L1/L2 cache, so the R12K's better internal fp capabilities
shine through.


> it's all good, however. If I get a chance I'll try to run the benchmark on an overclocked Octane to see how the
> results look. ...

What spec is your overclocked Octane?


> ... I did a quick pass to see if there also was a way to run the benchmark under Win2000, on my
> highly overclocked Penryn dual-core, but it didn't look trivial.

I was going to run it on my AMD 6000+, but I don't have Linux on it, so didn't see the point.

Ian.

Dr. Dave
Posts: 2311
Joined: Fri Feb 13, 2004 10:37 pm
Location: Ottawa, Canada >burp<

Re: C-Ray FP/CPU Benchmark Test Results

Post by Dr. Dave » Mon May 05, 2008 4:21 am

> What spec is your overclocked Octane?

It's been a while, but I believe it's a 400 overclocked to 481.25 MHz, on a SysAD of 87.5 MHz, with cache clocked at 240.625 MHz (cache divisor of 2, not 1.5).

For comparison's sake, a 400 is usually either on a SysAD of 100 or 114.285 MHz, with the cache clocked at 267 MHz (cache divisor 1.5) - though I've not seen any performance issues with the lower SysAD in the benchmarking I've done. It probably doesn't start being an issue until you've got a fast dual-processor setup. 485-ish is about as fast as you can go on a native 400, and even then it's luck of the draw. The 300s will generally do 350, but they start to run quite a bit hotter, so it's not wise to go any faster than that - the 360s and 400s do much better because they're running on a smaller die geometry.
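
The clock figures quoted above hang together arithmetically. The following is only a rough sketch of that relationship: it assumes the core clock is the SysAD clock times a multiplier and the L2 cache clock is the core clock divided by the cache divisor. The multipliers (5.5, 4 and 3.5) are not stated in the post; they are inferred here from the quoted core and SysAD speeds.

```python
# Sketch: core clock = SysAD x multiplier, L2 cache clock = core / cache divisor.
# The multipliers below are ASSUMED (inferred from the quoted core and SysAD
# figures); only the SysAD speeds and cache divisors come from the post.

def clocks(sysad_mhz, multiplier, cache_divisor):
    """Return (core MHz, L2 cache MHz) for a given SysAD configuration."""
    core = sysad_mhz * multiplier
    return core, core / cache_divisor

configs = {
    "overclocked 400, SysAD 87.5": (87.5, 5.5, 2.0),
    "stock 400, SysAD 100": (100.0, 4.0, 1.5),
    "stock 400, SysAD 114.285": (114.285, 3.5, 1.5),
}

for name, (sysad, mult, div) in configs.items():
    core, cache = clocks(sysad, mult, div)
    print(f"{name}: core {core:.3f} MHz, L2 {cache:.3f} MHz")
```

With these assumed multipliers the overclocked setup comes out at 481.25 MHz core and 240.625 MHz cache, matching the figures quoted, and the stock SysAD-100 setup gives a cache clock of roughly 267 MHz.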

Also of note: of the single/dual 400s I've seen, the singles tend to have a SysAD of 114.285, while the duals have a SysAD of 100. There must be a reason - maybe it has to do with timing and throughput on the motherboard ASIC - though everything I've tried within reason seems to work OK regardless.
:O3000: <> :O3000: :O2000: :Tezro: :Fuel: x2+ :Octane2: :Octane: x3 :1600SW: x2 :O2: x2+ :Indigo2IMP: :Indigo2: x2 :Indigo: x3 :Indy: x2+

Once you step up to the big iron, you learn all about physics, electrical standards, and first aid - usually all in the same day

Dr. Dave
Posts: 2311
Joined: Fri Feb 13, 2004 10:37 pm
Location: Ottawa, Canada >burp<

Re: C-Ray FP/CPU Benchmark Test Results

Post by Dr. Dave » Mon May 05, 2008 4:30 am

> I agree, though whenever I mention O2's size/weight/power/noise advantages, I get moaned at because it's nowhere
> near Fuel, etc. for raw speed. But some people do prefer O2 for non-speed reasons.


The biggest bit of fun is having pretty much unlimited texture memory, and being able to natively do 'video-on-a-texture', plus hardware colorspace conversion support. This is way more fun than simple overlays. Just a lot of apps don't take advantage of that effectively - and since all of the open-source software comes from the PC-realm, they're usually designed for 'overlays' rather than video-on-texture. Much less of an issue now, as PC graphics have gotten pretty good these days, but even just a few short years ago...

theinonen
Posts: 380
Joined: Wed Feb 21, 2007 11:32 am
Location: Finland

Re: C-Ray FP/CPU Benchmark Test Results

Post by theinonen » Mon May 05, 2008 12:30 pm

Here are results for Alphaserver 4100 5/400 (4x400 MHz EV56, 4 MB L3 cache / processor).
Operating system: Fedora Core
Compiler used: gcc-4.1

cat scene | ./c-ray-mt -t 32 > foo.ppm --> Rendering took: 3 seconds (3080 milliseconds).

cat sphfract | ./c-ray-mt -t 32 > foo.ppm --> Rendering took: 69 seconds (69983 milliseconds).

cat sphfract | ./c-ray-mt -t 128 -s 1024x768 -r 8 > foo.ppm --> Rendering took: 909 seconds (909746 milliseconds).

cat scene | ./c-ray-mt -t 128 -s 7500x3500 > foo.ppm --> Rendering took: 181 seconds (181711 milliseconds).

Not bad for something that is 12 years old.

mapesdhs
Posts: 2516
Joined: Mon Nov 10, 2003 4:17 pm
Location: Edinburgh, Scotland

Re: C-Ray FP/CPU Benchmark Test Results

Post by mapesdhs » Mon May 05, 2008 12:53 pm

theinonen writes:
> Here are results for Alphaserver 4100 5/400 (4x400 MHz EV56, 4 MB L3 cache / processor).

Thanks!! What are the results for running with just 1 thread though? ie. only one CPU?


> Not bad for something that is 12 years old.

Check the quad-R10K/195 Onyx. ;)


I was hoping to try out my POWER Challenge this week (24 x R10K/195), but I'd forgotten the system
doesn't have an eBus board atm. Need to sort that first...

Ian.

Dr. Dave
Posts: 2311
Joined: Fri Feb 13, 2004 10:37 pm
Location: Ottawa, Canada >burp<

Re: C-Ray FP/CPU Benchmark Test Results

Post by Dr. Dave » Mon May 05, 2008 7:42 pm

Dr. Dave wrote: a 400 overclocked to 481.25 MHz, on a SysAD of 87.5 MHz, with cache clocked at 240.625 MHz (cache divisor of 2, not 1.5).

Here's the hinv, note the speed is *not* reported correctly:

Code:

Zaphod 6% hinv -vv
1 500 MHZ IP30 Processor
Heart ASIC: Revision F
CPU: MIPS R12000 Processor Chip Revision: 3.5
FPU: MIPS R12010 Floating Point Chip Revision: 0.0
Main memory size: 1280 Mbytes
Xbow ASIC: Revision 1.3
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 2 Mbytes
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
  Disk drive: unit 1 on SCSI controller 0 (unit 1)
  Disk drive: unit 2 on SCSI controller 0 (unit 2)
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
  CDROM: unit 6 on SCSI controller 1
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3 parallel port: plp1
Graphics board: ESI with texture option
Integral Fast Ethernet: ef0, version 1, pci 2
Gigabit Ethernet: eg0, PCI slot 1, firmware version 0.0.0
Iris Audio Processor: version RAD revision 12.0, number 1
  PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 2
  PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
  PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
  PCI Adapter ID (vendor 0x10a9, device 0x0005) PCI slot 3
  PCI Adapter ID (vendor 0x10a9, device 0x0009) PCI slot 1
Personal Video: unit 1, revision 1.0
Zaphod 7%

And here's the results (sans the really big test)

Code:

Zaphod 3% cat scene | ./c-ray-f > foo.ppm
Rendering took: 2 seconds (2155 milliseconds)

Zaphod 4% cat sphfract | ./c-ray-f > foo.ppm
Rendering took: 58 seconds (58173 milliseconds)

Zaphod 5% cat sphfract | ./c-ray-f -s 1024x768 -r 8 > foo.ppm
Rendering took: 767 seconds (767399 milliseconds)

Note that this was done with the downloaded binary, as it's an Octane R12k, so I didn't figure that there was much advantage to recompiling it.

One other fun fact: The CD-ROM drive is actually a Pioneer slot-load SCSI DVD drive, a DVD-304S I seem to remember. Works fine. Have not tried audio-over-SCSI, but I was able to do the OS upgrade from it no problems.

Dr. Dave
Posts: 2311
Joined: Fri Feb 13, 2004 10:37 pm
Location: Ottawa, Canada >burp<

Re: C-Ray FP/CPU Benchmark Test Results

Post by Dr. Dave » Mon May 05, 2008 8:34 pm

I saw the single-threaded Windows binary, so I ran that on my dual-core Penryn @ 4.1 GHz (6MB cache) on Win2k. Note that Task Manager reported exactly 50% load for all tests, so it's likely scalable to two cores by dividing the times by 2. The binary is probably not optimised for this processor either, but here are the results:

Code:

P:\c-ray-11>x86\c-ray-f -i scene -o foo2.ppm
Rendering took: 0 seconds (469 milliseconds)

P:\c-ray-11>x86\c-ray-f -i sphfract -o foo2.ppm
Rendering took: 14 seconds (14219 milliseconds)

P:\c-ray-11>x86\c-ray-f -s 1024x768 -r 8 -i sphfract -o foo2.ppm
Rendering took: 186 seconds (186562 milliseconds)

P:\c-ray-11>x86\c-ray-f -s 7500x3500 -i scene -o foo2.pmm
Rendering took: 26 seconds (26984 milliseconds)

Dividing by two gives an 800x600 'scene' time of about 235 milliseconds as a point of reference for 2 cores, assuming the render scales perfectly across both.
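
For a rough sense of work done per clock tick, the single-core Penryn times above can be set against the overclocked Octane times posted earlier in the thread. This is only a back-of-envelope sketch: it takes the clock speeds as the posters stated them (481.25 MHz and 4.1 GHz), treats time multiplied by clock as a crude cycle count, and ignores that the two binaries came from different compilers and that the x86 build was not tuned for this CPU.

```python
# Sketch: compare elapsed time x clock speed (a crude "cycles spent" figure).
# Times are the single-threaded results posted in this thread; the clock
# speeds are as stated by the poster, not measured here.

OCTANE_MHZ = 481.25   # overclocked R12K Octane
PENRYN_MHZ = 4100.0   # overclocked Penryn, one core in use

# (test, Octane ms, Penryn ms)
results = [
    ("scene 800x600", 2155, 469),
    ("sphfract 800x600", 58173, 14219),
    ("sphfract 1024x768 8X", 767399, 186562),
]

def megacycles(ms, mhz):
    """Elapsed seconds times MHz gives millions of clock cycles."""
    return ms / 1000.0 * mhz

ratios = {}
for name, octane_ms, penryn_ms in results:
    ratios[name] = megacycles(penryn_ms, PENRYN_MHZ) / megacycles(octane_ms, OCTANE_MHZ)
    print(f"{name}: Penryn used {ratios[name]:.2f}x the cycles of the Octane")
```

On these numbers the Penryn finishes each test about four times sooner but spends roughly twice as many clock cycles doing so, which is one way of reading the "work done per clock tick" remark made earlier in the thread.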

mapesdhs
Posts: 2516
Joined: Mon Nov 10, 2003 4:17 pm
Location: Edinburgh, Scotland

Re: C-Ray FP/CPU Benchmark Test Results

Post by mapesdhs » Tue May 06, 2008 2:24 am

Dr. Dave writes:
> Here's the hinv, note the speed is *not* reported correctly:

Thanks! Does the 'system' command in Command Monitor report the correct speed? Just curious.


> And here's the results (sans the really big test)

You've actually already run the longest test (sphfract at 1024x768 with 8X). So what's the result
for the scene file at high-res?

The results nestle the system right in between an R12K/400 and dual-R10K/250, though it beat
the dual-250 for the other tests.


> Note that this was done with the downloaded binary, as it's an Octane R12k, so I didn't figure
> that there was much advantage to recompiling it.

Yes, probably only a minor difference. On some systems, I find c-ray-mt to be faster for the
single-threaded test, it varies.


> I saw the single-threaded Windows binary, so I ran that on my dual-core Penryn @ 4.1 GHz
> (6MB cache) ...

Sweet! I assume you mean it's a Core2Duo, yes? What was the original normal speed? ie. model number?


> ... on Win2k, ...

Yikes, XP would be a lot better you know. I didn't like XP much on 1st release (all the damn eye candy), but
the speed improvements over Win2K for multi-threaded & dual-core tasks are considerable (as much as 30%,
at least for 3D apps with exactly the same hardware anyway).


> ... note that task manager reported exactly 50% load for all tests ...

Heh, that's pretty weird. Really ought to be 100% for this test as it's not accessing main RAM. Mind you, who
knows how accurate Task Manager is.


> ... so it's likely scalable to two cores by dividing the times by 2. ...

I won't include any guesstimate results, it's too misleading. Got a spare disk you could stick Linux on and
recompile? I might try this with my 6000+ AMD.


> The binary is probably not optimised for this processor either, but here are the results:

Yes, it was compiled by the C-Ray author for his dual-core PentiumD 3GHz.

Thanks!!

Ian.

theinonen
Posts: 380
Joined: Wed Feb 21, 2007 11:32 am
Location: Finland

Re: C-Ray FP/CPU Benchmark Test Results

Post by theinonen » Tue May 06, 2008 8:17 am

mapesdhs wrote:
theinonen writes:
> Here are results for Alphaserver 4100 5/400 (4x400 MHz EV56, 4 MB L3 cache / processor).

Thanks!! What are the results for running with just 1 thread though? ie. only one CPU?

Ian.


Ok, here are results for only 1 thread.

Test 1: 'scene' at 800x600 --> 12 seconds (12084 milliseconds).
Test 2: 'sphfract' at 800x600 --> 281 seconds (281397 milliseconds).
Test 3: 'sphfract' at 1024x768 with 8X oversampling --> Rendering took: 3616 seconds (3616032 milliseconds).
Test 4: 'scene' at 7500x3500 --> Rendering took: 723 seconds (723813 milliseconds).


On Alphastation 500/500 (500 MHz EV56, 8 MB L3 cache, Debian 4.0, gcc-4.1).

Results are:

Test 1: 'scene' at 800x600 --> Rendering took: 10 seconds (10941 milliseconds)
Test 2: 'sphfract' at 800x600 --> Rendering took: 242 seconds (242768 milliseconds)
Test 3: 'sphfract' at 1024x768 with 8X oversampling --> Rendering took: 3187 seconds (3187036 milliseconds).
Test 4: 'scene' at 7500x3500 --> Rendering took: 667 seconds (667997 milliseconds)
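
Taking the AlphaServer 4100 single-thread times together with the four-CPU times posted earlier in the thread, the parallel speedup can be worked out directly. A small sketch of that arithmetic follows; the only assumption is that the multi-threaded runs (made with -t 32 or -t 128) kept all four CPUs busy.

```python
# Sketch: speedup = single-thread time / multi-CPU time,
# efficiency = speedup / number of CPUs.
# All times (ms) are the AlphaServer 4100 results posted in this thread.

N_CPUS = 4

single_ms = {
    "scene 800x600": 12084,
    "sphfract 800x600": 281397,
    "sphfract 1024x768 8X": 3616032,
    "scene 7500x3500": 723813,
}
quad_ms = {
    "scene 800x600": 3080,
    "sphfract 800x600": 69983,
    "sphfract 1024x768 8X": 909746,
    "scene 7500x3500": 181711,
}

speedups = {test: single_ms[test] / quad_ms[test] for test in single_ms}

for test, s in speedups.items():
    print(f"{test}: speedup {s:.2f}x, efficiency {s / N_CPUS:.0%}")
```

Every test lands within a few percent of 4x, so C-Ray scales almost linearly across the four EV56 CPUs; the modest absolute times are a per-CPU matter rather than a threading one.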

mapesdhs
Posts: 2516
Joined: Mon Nov 10, 2003 4:17 pm
Location: Edinburgh, Scotland

Re: C-Ray FP/CPU Benchmark Test Results

Post by mapesdhs » Tue May 06, 2008 9:33 am

theinonen writes:
> Ok, here are results for only 1 thread.
> <etc>
> On Alphastation 500/500 (500 MHz EV56, 8 MB L3 cache, Debian 4.0, gcc-4.1).
> <etc>

Thanks again! Hmm, not as fast as I would have expected. A compiler issue perhaps?

Ian.

theinonen
Posts: 380
Joined: Wed Feb 21, 2007 11:32 am
Location: Finland

Re: C-Ray FP/CPU Benchmark Test Results

Post by theinonen » Tue May 06, 2008 1:04 pm


> Thanks again! Hmm, not as fast as I would have expected. A compiler issue perhaps?
>
> Ian.



...Perhaps.

I tried that Blender benchmark with the AS4100 to see if the results are at all similar, and it seemed to do a lot better on that. On Blender 2.45 using 8 threads, it finished with a time of 06:36.54.

About 45 seconds faster than a dual-300MHz Octane2 in your tests.
In c-ray benchmarks that "same" octane2 beats that AS4100 very easily.

I was surprised to see that using 8 threads was faster than 4 threads. I would have thought that using the same number of threads as processors in the system would be optimal.

mapesdhs
Posts: 2516
Joined: Mon Nov 10, 2003 4:17 pm
Location: Edinburgh, Scotland

Re: C-Ray FP/CPU Benchmark Test Results

Post by mapesdhs » Tue May 06, 2008 3:45 pm

theinonen writes:
> I tried that Blender benchmark with the AS4100 to see if the results are at all similar, and it seemed to do a lot
> better on that. On Blender 2.45 using 8 threads, it finished with a time of 06:36.54.

One other possibility: the 21164 has a small L1 - maybe this hurts for a test like C-ray that normally resides
almost entirely within L1/L2?


> I was surprised to see that using 8 threads was faster than 4 threads. I would have thought that using the
> same number of threads as processors in the system would be optimal.

I think it's partly because Blender's threading isn't that good (IMO) and partly because, for 4 threads,
they would never finish at exactly the same time, so towards the end only 3 threads will be active, then
2, then 1, ie. the parallelism drops off badly, and if the final thread happens to be a more complex part
of the scene...
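
The tail-off described above can be illustrated with a toy model. This is not Blender's actual scheduler, just a sketch under loudly stated assumptions: the image is cut into equal-height bands, each band is one unit of work, and a free CPU simply picks up the next band in order. The per-row costs are invented for illustration, with the bottom quarter of the image three times as expensive as the rest.

```python
import heapq

def makespan(chunk_costs, n_cpus):
    """Finish time when each CPU, as it becomes free, takes the next chunk in order."""
    free_at = [0.0] * n_cpus      # time at which each CPU next becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for cost in chunk_costs:
        start = heapq.heappop(free_at)   # earliest idle CPU takes the chunk
        end = start + cost
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

def split(row_costs, n_chunks):
    """Sum per-row costs into n_chunks contiguous bands of equal height."""
    size = len(row_costs) // n_chunks
    return [sum(row_costs[i * size:(i + 1) * size]) for i in range(n_chunks)]

# Hypothetical scene: 32 rows, the last 8 rows cost 3x as much to render.
rows = [1.0] * 24 + [3.0] * 8

four_bands = makespan(split(rows, 4), n_cpus=4)
eight_bands = makespan(split(rows, 8), n_cpus=4)
ideal = sum(rows) / 4

print(f"4 bands on 4 CPUs: {four_bands}")
print(f"8 bands on 4 CPUs: {eight_bands}")
print(f"perfectly balanced: {ideal}")
```

In this made-up case, four bands leave three CPUs idle while the last one grinds through the expensive band (finish time 24.0), whereas eight bands finish at 16.0 against an ideal of 12.0. Finer-grained work units narrow the gap, which fits the observation that 8 threads beat 4 on a 4-CPU box.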

I posted about this on the Blender forum, but maybe changing how Blender uses multiple CPUs/cores
would be too complex a shift, though I think my suggestion would help.

Ian.

SAQ
Posts: 5871
Joined: Wed Jul 19, 2006 8:37 am
Location: Renton, WA

Re: C-Ray FP/CPU Benchmark Test Results

Post by SAQ » Tue May 06, 2008 8:48 pm

mapesdhs wrote:
theinonen writes:
> Ok, here are results for only 1 thread.
> <etc>
> On Alphastation 500/500 (500 MHz EV56, 8 MB L3 cache, Debian 4.0, gcc-4.1).
> <etc>

Thanks again! Hmm, not as fast as I would have expected. A compiler issue perhaps?

Ian.


http://h30097.www3.hp.com/linux/compaq_c/index.html

At the time they were quoting speed improvements over GCC of 15%-200% depending on code mix.

It's free.
"Brakes??? What Brakes???"

"I am O SH-- the Great and Powerful"

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O3x0: :ChallengeL: :O2000R: (single-CM)

mapesdhs
Posts: 2516
Joined: Mon Nov 10, 2003 4:17 pm
Location: Edinburgh, Scotland

Re: C-Ray FP/CPU Benchmark Test Results

Post by mapesdhs » Wed May 07, 2008 2:59 am

SAQ writes:
> At the time they were quoting speed improvements over GCC of 15%-200% depending on code mix.

That sounds logical, otherwise atm the table has an Alpha/400 being beaten by an R5000PC/180 O2. :|


> It's free.

If only SGI would do the same with MIPS Pro... *sigh*

Ian.

Dr. Dave
Posts: 2311
Joined: Fri Feb 13, 2004 10:37 pm
Location: Ottawa, Canada >burp<

Re: C-Ray FP/CPU Benchmark Test Results

Post by Dr. Dave » Fri May 09, 2008 9:09 pm

Last result for the Octane:

Code:

Zaphod 7% cat scene | c-ray-f -s 7500x3500 -i scene -o foo2.pmm
Rendering took: 130 seconds (130582 milliseconds)

