Strange code execution speed.

artherd
Posts: 108
Joined: Fri Sep 03, 2004 11:45 pm
Location: SF Bay Area, CA

by artherd » Mon Nov 22, 2004 5:59 pm

This code runs at about 1/100th of the speed I think it should, on an Onyx2 running IRIX 6.5.26 (with the MIPSpro 7.3 compilers).

Any ideas why? Very, very strange. (Note: I am not a programmer, sorry :D)

Code:

colour 5# vi NUC_LoopTest.cpp
#include "stdio.h"
#include "stdlib.h"
#include "string.h"

int pixel[640][512];
float gain[640][512];
float offset[640][512];
int N=0;

int i,j,k;

int main(int argc, char* argv[])
{
        N=1000;
        for (j=0;j<640; j++) {
                for (k=0;k<512;k++)  {
                        pixel[j][k] = 312;
                        gain[j][k] = (float)2.345;
                        offset[j][k] = (float)25.456;
                }
        }
        printf(" %d structures initialized\n",N);
        printf(" %d loops started\n",N);
        for (i=0; i<N; i++)  {
                printf(" %d loops completed\n",i);
                for (j=0;j<640; j++) {
                        for (k=0;k<512;k++)  {
                                pixel[j][k] = (int)(gain[j][k]*(float)pixel[j][k] + offset[j][k]);
                        }
                }
        }
        printf("loops done");
        return (0);
}
My first Indy is still my favourite SGI.
CDglobal Networks: http://www.cdglobal.net/

dexter1
Moderator
Posts: 2062
Joined: Thu Feb 20, 2003 6:57 am
Location: Voorburg, The Netherlands

by dexter1 » Tue Nov 23, 2004 12:06 am

I have an idea. I noticed that after bringing N down to 10 and stripping out all the printfs (extremely bad for loop optimisation!), it really flies:

Code:

irene /tmp> cc -n32 -Ofast=ip28 -r10000 -mips4 -o loop loop.c
irene /tmp> time loop
pixel 0,0  195fe0 pixel xmax,ymax 195fe0
0.127u 0.061s 0:00.27 66.6% 0+0k 0+0io 0pf+0w

I added those pixel printfs at the end because otherwise the compiler decides that nothing gets done with the result and might optimise the whole loop away. heheh
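As a hedged illustration of the point about unused results: one way to keep the loop's work observable is to fold the whole array into a single number and print that at the end. This is only a sketch, not dexter1's actual change; `fill_pixels` and `checksum` are hypothetical helper names, and the array dimensions are taken from the original post.

```c
#define ROWS 640
#define COLS 512

static int pixel[ROWS][COLS];

/* Hypothetical helper: set every pixel to the same starting value. */
static void fill_pixels(int value)
{
    int j, k;

    for (j = 0; j < ROWS; j++)
        for (k = 0; k < COLS; k++)
            pixel[j][k] = value;
}

/* Hypothetical helper: add up every element so that the final result
 * depends on all of the stores made by the timed loop. Unsigned
 * arithmetic is used so the sum wraps instead of overflowing. */
static unsigned long checksum(void)
{
    unsigned long sum = 0;
    int j, k;

    for (j = 0; j < ROWS; j++)
        for (k = 0; k < COLS; k++)
            sum += (unsigned long)(unsigned int)pixel[j][k];
    return sum;
}
```

Printing the value after the timed loop, for example with `printf("checksum %lu\n", checksum());`, means the program's output depends on every element rather than on two sampled pixels.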

Good, now increase N in steps of 2 and watch what happens:

Code:

N=10
pixel 0,0  195fe0 pixel xmax,ymax 195fe0
0.127u 0.061s 0:00.27 66.6% 0+0k 0+0io 0pf+0w

N=12
pixel 0,0  8b894f pixel xmax,ymax 8b894f
0.143u 0.061s 0:00.29 68.9% 0+0k 0+0io 0pf+0w

N=14
pixel 0,0  2ff50b4 pixel xmax,ymax 2ff50b4
0.160u 0.064s 0:00.29 75.8% 0+0k 0+0io 0pf+0w

N=16
pixel 0,0  107b7cc0 pixel xmax,ymax 107b7cc0
0.177u 0.064s 0:00.37 62.1% 0+0k 0+0io 0pf+0w

N=18
pixel 0,0  5aa31180 pixel xmax,ymax 5aa31180
0.193u 0.065s 0:00.45 55.5% 0+0k 0+0io 0pf+0w

N=20
pixel 0,0  7fffffff pixel xmax,ymax 7fffffff
1.189u 3.528s 0:06.67 70.4% 0+0k 0+0io 0pf+0w


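Kablam! Integer overflow. That will cost you dearly: at N=20 the pixel values saturate at 7fffffff and the elapsed time jumps from 0.45 to 6.67 seconds. Now if you change the pixel array to float instead of int and remove the casts, it performs much better: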

Code:

N=20
pixel 0,0  8367358464.000000 pixel xmax,ymax 8367358464.000000
0.083u 0.063s 0:00.20 70.0% 0+0k 0+0io 0pf+0w
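Why the integer version falls over between N=18 and N=20 can be checked with a few lines of C. With a gain of 2.345 the value more than doubles on every pass, so it runs out of int range quickly. The sketch below assumes a 32-bit int; `first_out_of_range_iteration` is a hypothetical name, and the function tests the float result before converting it, since converting an out-of-range float to int is undefined in C.

```c
/* Hypothetical helper: repeat p = (int)(gain * p + offset) and return the
 * 1-based iteration at which the float result no longer fits in a 32-bit
 * int, or 0 if it stays in range for max_iter iterations. */
static int first_out_of_range_iteration(int start, float gain, float offset,
                                        int max_iter)
{
    const float limit = 2147483648.0f;  /* 2^31, assuming a 32-bit int */
    int p = start;
    int i;

    for (i = 1; i <= max_iter; i++) {
        float v = gain * (float)p + offset;

        /* Check the range first; written this way it also rejects NaN. */
        if (!(v >= -limit && v < limit))
            return i;
        p = (int)v;  /* safe here: v is known to be in range */
    }
    return 0;
}
```

For the values in this thread (start 312, gain 2.345, offset 25.456) the function returns 19, which is consistent with the N=18 run still being in range and the N=20 run saturating.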

You still have to do something about your results going to infinity and apply some sort of clamping to the pixel array, because you only want a restricted range of colors to display, right?
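A minimal sketch of such clamping, assuming the pixels are meant to stay in a 16-bit range of 0 to 65535 (that range is a guess, as are the names `PIXEL_MIN`, `PIXEL_MAX` and `correct_pixel`): do the arithmetic in float, clamp, and only then convert to int.

```c
/* Assumed pixel range; adjust to whatever the real data uses. */
#define PIXEL_MIN 0
#define PIXEL_MAX 65535

/* Hypothetical helper: apply gain and offset in float, then clamp the
 * result before converting, so the float-to-int conversion never sees
 * an out-of-range value. */
static int correct_pixel(int raw, float gain, float offset)
{
    float v = gain * (float)raw + offset;

    if (!(v >= (float)PIXEL_MIN))   /* below range; also catches NaN */
        return PIXEL_MIN;
    if (v > (float)PIXEL_MAX)       /* above range */
        return PIXEL_MAX;
    return (int)v;                  /* in range; truncates toward zero */
}
```

With the values from the original post, `correct_pixel(312, 2.345f, 25.456f)` gives 757, and any result that would exceed the assumed range comes back as 65535 instead of overflowing.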

:P This advice of course doesn't come cheap :P I'll expect that Onyx2 on my doorstep tomorrow, okay :P

artherd
Posts: 108
Joined: Fri Sep 03, 2004 11:45 pm
Location: SF Bay Area, CA

by artherd » Tue Nov 23, 2004 11:34 pm

Dexter: She's in the mail, hope your mailbox can hold 400lbs! ;P

Here's what we're running now; it runs about as expected, actually a bit better than expected :D A single R10k-250 posts a 104 sec execution time, which equals the P4 2.4 GHz system (running Windows, though). I didn't expect to do that well, even on multiply-add.

Here are our compile options; hopefully they're not optimizing out all the math :)

Code:

cc NUC_LoopTest.cpp -o NUC_LoopTest-OLD -n32 -Ofast=ip27 -r10000 -mips4


Code:

#include "stdio.h"
#include "stdlib.h"
#include "string.h"

int pixel[640][512];
int pixelout[640][512];
float gain[640][512];
float offset[640][512];
long N=0;

long i,j,k;

int main(int argc, char* argv[])
{
        N=10000;
        for (j=0;j<640; j++) {
                for (k=0;k<512;k++)  {
                        pixel[j][k] = 312;
                        gain[j][k] = (float)2.345;
                        offset[j][k] = (float)25.456;
                }
        }
        printf(" %ld structures initialized\n",N);
        printf(" %ld loops started\n",N);
        for (i=0; i<N; i++)  {
//              printf(" %d loops completed\n",i);
                for (j=0;j<640; j++) {
                        for (k=0;k<512;k++)  {
                                pixelout[j][k] = (int)(gain[j][k]*(float)pixel[j][k] + offset[j][k]);
                        }
                }
        }
        printf("loops done\n");
        printf("pixelout %ld\n",pixelout[1][1]*i);

        return (0);
}
// EOF


time ./NUC_LoopTest returns:

Code:

104.083u 0.122s 1:44.91 99.3% 0+0k 0+0io 0pf+0w

artherd
Posts: 108
Joined: Fri Sep 03, 2004 11:45 pm
Location: SF Bay Area, CA

by artherd » Tue Nov 23, 2004 11:37 pm

Hrm, something is wrong with the system time. Looks like we're still optimizing out the loop, maybe?

SkyBound
Posts: 136
Joined: Tue Jan 13, 2004 10:57 am
Location: Enschede, The Netherlands

by SkyBound » Wed Nov 24, 2004 6:37 am

artherd wrote: Hrm, something is wrong with the system time. Looks like we're still optimizing out the loop, maybe?


Just a few tricks:

1. Use register long i,j,k;
2. Use two temporary floats for (float)2.345 and (float)25.456 instead (preferably declared as consts) and refer to those variables.
3. Count from MAX down to 0 instead (by default, CPUs can test a register against 0 in one clock cycle).


Erik
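The three tricks above might look roughly like this when applied to the posted program. Treat it as a sketch under assumptions: `init_arrays`, `correct_once`, `GAIN_INIT` and `OFFSET_INIT` are made-up names, `register` is only a hint that a compiler is free to ignore, and an optimiser at -Ofast may already do the equivalent, so any gain has to be measured.

```c
#define ROWS 640
#define COLS 512

static int   pixel[ROWS][COLS];
static int   pixelout[ROWS][COLS];
static float gain[ROWS][COLS];
static float offset[ROWS][COLS];

/* Trick 2: named constants instead of repeated (float) casts of literals. */
static const float GAIN_INIT   = 2.345f;
static const float OFFSET_INIT = 25.456f;

/* Hypothetical helper: fill the arrays with the values from the post. */
static void init_arrays(void)
{
    register long j, k;                 /* trick 1: register hint */

    for (j = ROWS; j-- > 0; ) {         /* trick 3: count down to 0 */
        for (k = COLS; k-- > 0; ) {
            pixel[j][k]  = 312;
            gain[j][k]   = GAIN_INIT;
            offset[j][k] = OFFSET_INIT;
        }
    }
}

/* Hypothetical helper: one pass of the gain/offset correction. */
static void correct_once(void)
{
    register long j, k;

    for (j = ROWS; j-- > 0; )
        for (k = COLS; k-- > 0; )
            pixelout[j][k] = (int)(gain[j][k] * (float)pixel[j][k]
                                   + offset[j][k]);
}
```

The loop form `for (j = ROWS; j-- > 0; )` visits indices ROWS-1 down to 0, so every element is still touched exactly once.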

