Efficiency - PowerPC G4 vs. G5

johnnym
Donor
Posts: 229
Joined: Sun Sep 04, 2016 9:53 pm

Re: Efficiency - PowerPC G4 vs. G5

Post by johnnym » Sun Apr 01, 2018 5:05 am

These are the results for my first Power Macintosh G5 (type 11,2) with a single dual-core PPC970MP running at 2.0 GHz:

Code:

root@powermac-g5:~# uname -a
Linux powermac-g5 4.15.0-2-powerpc64 #1 SMP Debian 4.15.11-1 (2018-03-20) ppc64 GNU/Linux

root@powermac-g5:~# lscpu
Architecture:        ppc64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           2
NUMA node(s):        1
Model:               1.1 (pvr 0044 0101)
Model name:          PPC970MP, altivec supported
CPU max MHz:         2000.0000
CPU min MHz:         1000.0000
L1d cache:           32K
L1i cache:           64K
L2 cache:            1024K
NUMA node0 CPU(s):   0,1

root@powermac-g5:~# cat /proc/cpuinfo
processor   : 0
cpu      : PPC970MP, altivec supported
clock      : 2000.000000MHz
revision   : 1.1 (pvr 0044 0101)

processor   : 1
cpu      : PPC970MP, altivec supported
clock      : 2000.000000MHz
revision   : 1.1 (pvr 0044 0101)

timebase   : 33333333
platform   : PowerMac
model      : PowerMac11,2
machine      : PowerMac11,2
motherboard   : PowerMac11,2 MacRISC4 Power Macintosh
detected as   : 337 (PowerMac G5 Dual Core)
pmac flags   : 00000000
L2 cache   : 1024K unified
pmac-generation   : NewWorld

root@powermac-g5:~# time openssl speed -elapsed
[...]
OpenSSL 1.1.0h  27 Mar 2018
built on: reproducible build, date unspecified
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DVPAES_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/lib/ssl\"" -DENGINESDIR="\"/usr/lib/powerpc64-linux-gnu/engines-1.1\""
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00
mdc2                 0.00         0.00         0.00         0.00         0.00         0.00
md4              12278.14k    42460.86k   113456.04k   195655.34k   248116.57k   252968.96k
md5              23063.62k    59533.29k   113250.82k   146231.64k   159831.38k   160869.03k
hmac(md5)         7935.54k    26623.02k    71371.43k   122953.05k   155803.65k   158826.50k
sha1             26640.19k    75849.47k   151569.58k   203967.83k   226751.83k   228769.79k
rmd160            7515.63k    20340.29k    40697.86k    54239.23k    59866.92k    60506.11k
rc4             184421.49k   215374.12k   225875.37k   228349.95k   229348.69k   229064.70k
des cbc          32292.19k    33522.50k    33840.64k    33918.98k    33939.46k    33914.88k
des ede3         12303.78k    12474.07k    12537.09k    12552.87k    12555.61k    12555.61k
idea cbc             0.00         0.00         0.00         0.00         0.00         0.00
seed cbc         35598.94k    37584.45k    38336.94k    38533.80k    38316.71k    38453.25k
rc2 cbc          15804.70k    16177.69k    16272.21k    16290.13k    16302.08k    16302.08k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00
blowfish cbc     54261.26k    57810.62k    58918.91k    59201.88k    59285.50k    59255.47k
cast cbc         48982.18k    52801.47k    53954.22k    54255.27k    54337.54k    54345.73k
aes-128 cbc      46859.48k    50375.62k    52073.22k    52647.25k    52830.21k    52789.25k
aes-192 cbc      40370.53k    42431.49k    43780.27k    44310.53k    44370.60k    44411.56k
aes-256 cbc      35191.59k    37038.61k    37932.71k    38140.25k    38185.64k    38289.41k
camellia-128 cbc    48385.40k    55005.35k    58642.09k    59697.49k    58952.36k    59550.38k
camellia-192 cbc    42731.82k    45184.32k    46435.58k    46753.79k    46063.62k    46415.87k
camellia-256 cbc    42501.50k    45220.67k    46445.82k    46724.10k    45981.70k    46432.26k
sha256           16594.78k    41033.24k    74578.86k    95311.87k   103667.03k   104322.39k
sha512           14079.55k    58126.34k    92774.23k   136303.27k   157985.45k   160246.44k
whirlpool         6569.01k    14305.39k    24023.89k    29045.76k    30924.80k    31178.75k
aes-128 ige      48581.36k    50770.01k    51948.03k    52238.34k    51817.13k    51980.97k
aes-192 ige      41745.20k    42952.81k    43759.79k    43981.48k    43616.94k    43750.74k
aes-256 ige      36285.96k    37240.68k    37667.58k    37713.58k    37612.20k    37688.66k
ghash            88885.26k    95958.59k    99713.54k   100591.27k   100854.44k   100810.75k
[...]
real   18m56.531s
user   18m54.353s
sys   0m1.476s

root@powermac-g5:~# time 7z b -mmt1

7-Zip 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs BE)

BE
CPU Freq:   989   988   990   990   990   990   990   990

RAM size:    8910 MB,  # CPU hardware threads:   2
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1085   100   1057   1056  |      18729   100   1599   1599
23:       1043   100   1064   1063  |      18348   100   1588   1588
24:        978   100   1052   1052  |      17860   100   1568   1568
25:        946   100   1080   1080  |      17199   100   1532   1531
----------------------------------  | ------------------------------
Avr:             100   1063   1063  |              100   1572   1572
Tot:             100   1318   1317

real   1m15.075s
user   1m13.834s
sys   0m1.069s

root@powermac-g5:~# time 7z b

7-Zip 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs BE)

BE
CPU Freq:   989   990   991   990   990   990   990   990

RAM size:    8910 MB,  # CPU hardware threads:   2
RAM usage:    441 MB,  # Benchmark threads:      2

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2137   191   1088   2079  |      37263   200   1593   3182
23:       2033   195   1062   2072  |      36570   200   1584   3166
24:       1936   198   1054   2082  |      35562   200   1562   3122
25:       1881   198   1085   2148  |      34163   200   1522   3041
----------------------------------  | ------------------------------
Avr:             195   1072   2095  |              200   1565   3127
Tot:             198   1319   2611

real   0m43.701s
user   1m21.739s
sys   0m1.351s

root@powermac-g5:~/git-projects/STREAM# time ./stream_c.exe
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28012 microseconds.
   (= 28012 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3286.7     0.048758     0.048681     0.048894
Scale:           3275.7     0.049067     0.048845     0.050449
Add:             3354.9     0.071603     0.071537     0.071702
Triad:           3354.1     0.071637     0.071554     0.071974
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

real   0m2.694s
user   0m5.014s
sys   0m0.224s

root@powermac-g5:~/git-projects/STREAM# export OMP_NUM_THREADS=1
root@powermac-g5:~/git-projects/STREAM# time ./stream_c.exe
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 38841 microseconds.
   (= 38841 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2554.8     0.062669     0.062628     0.062719
Scale:           2509.4     0.063841     0.063759     0.063970
Add:             2955.7     0.081229     0.081198     0.081276
Triad:           2945.5     0.081529     0.081481     0.081575
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

real   0m3.256s
user   0m3.080s
sys   0m0.166s

Compression speeds on the type 11,2 are similar to or better than those on the type 7,3, even though it is clocked lower. Decompression is slower on the type 11,2, though. The memory bandwidth of the type 11,2 seems to be about 1.5 times that of the type 7,3 Power Macintosh G5 (DDR2 versus DDR1, and maybe an improved memory controller). The variance of the STREAM results is also much smaller on the type 11,2; see the box-and-whisker plot of 500 consecutive STREAM runs on both cores of the type 11,2:
powermac-g5-11-2-2-ghz-plot.jpg
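
For anyone who wants to reproduce this kind of summary, here is a minimal Python sketch of how repeated STREAM reports could be reduced to the numbers a box-and-whisker plot draws. It is not the script used for the plot above. The names parse_stream, five_number_summary and summarize_runs are made up for illustration, and the only data included are the two result tables quoted in this post, not the 500 runs behind the plot.

```python
import re
import statistics

# Matches STREAM result rows such as "Copy:   3286.7   0.048758 ...".
ROW = re.compile(r"^(Copy|Scale|Add|Triad):\s+([0-9.]+)", re.MULTILINE)


def parse_stream(output):
    """Return {kernel: best rate in MB/s} from one STREAM report."""
    return {name: float(rate) for name, rate in ROW.findall(output)}


def five_number_summary(values):
    """Minimum, quartiles and maximum (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def summarize_runs(outputs):
    """Collect each kernel's rate over many reports and summarize it."""
    per_kernel = {}
    for text in outputs:
        for name, rate in parse_stream(text).items():
            per_kernel.setdefault(name, []).append(rate)
    return {name: five_number_summary(rates)
            for name, rates in per_kernel.items()}


# Result rows copied from the two STREAM runs in this post.
TWO_THREADS = """\
Copy:            3286.7     0.048758     0.048681     0.048894
Scale:           3275.7     0.049067     0.048845     0.050449
Add:             3354.9     0.071603     0.071537     0.071702
Triad:           3354.1     0.071637     0.071554     0.071974
"""
ONE_THREAD = """\
Copy:            2554.8     0.062669     0.062628     0.062719
Scale:           2509.4     0.063841     0.063759     0.063970
Add:             2955.7     0.081229     0.081198     0.081276
Triad:           2945.5     0.081529     0.081481     0.081575
"""

if __name__ == "__main__":
    two = parse_stream(TWO_THREADS)
    one = parse_stream(ONE_THREAD)
    for name in two:
        # Ratio of two-thread to one-thread best rate for each kernel.
        print(f"{name}: {two[name] / one[name]:.2f}x with two threads")
```

To summarize a real series, the captured output of each run would be passed as a list of strings to summarize_runs, which returns one (min, Q1, median, Q3, max) tuple per kernel.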

Shiunbird
Donor
Posts: 497
Joined: Fri May 06, 2016 1:43 pm
Location: Czech Republic

Re: Efficiency - PowerPC G4 vs. G5

Post by Shiunbird » Tue Apr 03, 2018 12:45 am

Neat!

I'm going to work on this once I'm back from abroad.