using FPU for greater performance?

IRIX/Nekoware development, porting and related topics.
bigendian
Posts: 35
Joined: Wed Jul 30, 2003 10:06 pm
Location: Winter Park, FL

Post by bigendian » Sun Feb 29, 2004 9:56 am

When I first started coding, it was on a Mac Quadra 650. At that point in time, if you wanted your code to run quickly on other machines you had to avoid using the FPU like the plague because not everybody HAD an FPU. A few friends of mine that did stuff with DOS on x86 had the same problems. Now, I look at the massive integer performance of x86 and its relatively anemic FP by comparison, and I assume that the same ideas still hold true in that world.

My question comes from the fact that the R10k/R12k isn't that great on INT performance but has MASSIVE FP performance. How difficult would it be to modify mplayer/mencoder to use the FPU instead of the INT math unit? Is it just that there are so many bitwise operations?

daniel
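To make the INT-vs-FP trade-off concrete, here is a minimal C sketch of one operation, applying a fractional gain to 16-bit samples, written both in the integer fixed-point style that FPU-less machines forced on programmers and as plain float code of the kind an R10k/R12k FPU handles well. The Q16.16 format and the function names are illustrative assumptions, not anything taken from MPlayer.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 fixed point: a gain g is stored as round(g * 65536) in an int32_t. */
typedef int32_t q16_t;

/* Setup-time conversion of a gain in [0, 1] to Q16.16. */
static q16_t q16_from_double(double g)
{
    assert(g >= 0.0 && g <= 1.0);
    return (q16_t)(g * 65536.0 + 0.5);
}

/* Integer-unit version: widen, multiply, then drop the 16 fraction bits.
   Assumes >> on a negative int64_t is an arithmetic shift, which C leaves
   implementation-defined but common compilers provide. */
static int16_t scale_fixed(int16_t sample, q16_t gain)
{
    int64_t product = (int64_t)sample * gain;

    assert(gain >= 0 && gain <= 65536);
    return (int16_t)(product >> 16);
}

/* FPU version of the same operation: no widening or shifting needed. */
static int16_t scale_float(int16_t sample, float gain)
{
    assert(gain >= 0.0f && gain <= 1.0f);
    return (int16_t)(sample * gain);
}
```

The fixed-point path needs a widening multiply and a shift where the float path needs a single multiply. The two can disagree by one unit for negative inputs, because the shift rounds toward minus infinity while the float-to-int cast truncates toward zero, which is one reason converting a codec from one style to the other is not a purely mechanical job.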

ShadeOfBlue
Moderator
Posts: 799
Joined: Tue Nov 25, 2003 12:09 pm
Location: Europe

Post by ShadeOfBlue » Sun Feb 29, 2004 10:34 am

Some codecs are written in a way that you must use bitwise operations to encode/decode them...
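As an illustration of why some decoding work has to stay on the integer unit, here is a minimal MSB-first bit reader of the kind entropy decoders are built on: variable-length codes are pulled out a few bits at a time and used as lookup keys. The `bitreader` struct and its functions are hypothetical names for this sketch, not libavcodec's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *buf;
    size_t size;    /* buffer length in bytes */
    size_t bitpos;  /* index of the next unread bit */
} bitreader;

static void br_init(bitreader *br, const uint8_t *buf, size_t size)
{
    br->buf = buf;
    br->size = size;
    br->bitpos = 0;
}

/* Read n bits (1..24), most significant bit first. */
static uint32_t br_read(bitreader *br, unsigned n)
{
    uint32_t value = 0;

    assert(n >= 1 && n <= 24);
    assert(br->bitpos + n <= br->size * 8);
    while (n--) {
        size_t byte = br->bitpos >> 3;                    /* byte holding the next bit */
        unsigned shift = 7 - (unsigned)(br->bitpos & 7);  /* MSB first within the byte */
        value = (value << 1) | ((br->buf[byte] >> shift) & 1u);
        br->bitpos++;
    }
    return value;
}
```

Every step is a shift, a mask or an OR on individual bits, so there is no floating-point equivalent that would let this part of a decoder run on the FPU.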

Since SGI's CPUs (at least the newer ones [R4K and up]) have 32 integer and 32 floating-point registers, it would be better to modify the code to use those.
Crappy x86 processors have much less registers, so people don't use them [the registers, not processors (sadly)], they rather recode the stuff in assembler or just leave the code as it is, which is bad...

I've done some modifications to the MPlayer 1.0-pre3 code. I've tried to optimize it for SGIs.
First, I converted commonly used variables to register variables. This was quite easy, though it took a long time: just put "register" before the variable's type, e.g. "register int i;". Make sure the variable's address is never taken (the compiler complains if it is), and note that global variables can't be declared register.
This gave a visible speed increase on my R10k-175 O2, but it's still too slow to play most DivX movies (even with -hardframedrop and friends).
If someone with MipsPRO compilers is interested in the "fixed" code, I can send it to him/her to compile with proper optimizations.
I only have gcc, which sucks at optimizing stuff for MIPS...
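A minimal sketch of the `register` change described above, showing the two restrictions mentioned: the address of a register variable can't be taken, and file-scope (global) variables can't be declared `register`. The function is a made-up example. Note that `register` is only a hint in C; optimizing compilers generally do their own register allocation and may ignore it, so how much it helps depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n bytes using register-qualified locals (a hypothetical example). */
static long sum_bytes(const unsigned char *p, size_t n)
{
    register size_t i;
    register long total = 0;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += p[i];

    /* Writing &i or &total here would not compile: C forbids taking the
       address of a register variable. */
    return total;
}

/* Not allowed at file scope, so this stays commented out:
   register long global_total; */
```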

Using the ICE (Image Compression Engine) on the O2 (if only for colorspace conversions) would also improve speed.

But perhaps all these things are not worth the time, since we have IRIXDivX.
When it gets support for more codecs it will replace mplayer on SGIs but until then mplayer has support for more codecs...

Recoding mplayer's core to use FP instead of INT functions would be very time-consuming and hard, and the person doing it would need a lot of knowledge of how these things work... (though some codecs already use FP stuff)

As for the FPU, all modern computers have an FPU, so you definitely shouldn't avoid using it ;)

P.S.: Hope this post made sense...

jdboyd
Posts: 562
Joined: Thu Aug 21, 2003 11:47 am
Location: Southern PA

Post by jdboyd » Mon Mar 01, 2004 11:52 am

ShadeOfBlue wrote:If someone with MipsPRO compilers is interested in the "fixed" code, I can send it to him/her to compile with proper optimizations.


I have an account on a machine with MipsPRO. I don't have a lot of time though, so don't expect quick results.

Nor am I very familiar with how to best compile things for maximum performance.

How much of your optimizations were in Mplayer, versus ffmpeg?

Finally, if you are doing this sort of work, have you tried getting SGI Developer Plus membership? My understanding is that gets you a compiler license.

ShadeOfBlue wrote:But perhaps all these things are not worth the time, since we have IRIXDivX.


Some of us may prefer using libre software.

Plus, if I'm not mistaken, IRIXDiVX doesn't do DeCSS, so one still can't actually watch DVDs with it (but then, most people don't have DVD drives on our SGIs anyway).

schleusel
Posts: 495
Joined: Mon Oct 20, 2003 6:49 am
Location: NRW, Germany

Post by schleusel » Mon Mar 01, 2004 12:44 pm

ShadeOfBlue wrote:I've done some modifications to the MPlayer 1.0-pre3 code. I've tried to optimize it for SGIs.
First, I converted commonly used variables to register variables. This was quite easy, though it took a long time: just put "register" before the variable's type, e.g. "register int i;". Make sure the variable's address is never taken (the compiler complains if it is), and note that global variables can't be declared register.
This gave a visible speed increase on my R10k-175 O2, but it's still too slow to play most DivX movies (even with -hardframedrop and friends).
If someone with MipsPRO compilers is interested in the "fixed" code, I can send it to him/her to compile with proper optimizations.
I only have gcc, which sucks at optimizing stuff for MIPS...

Great! I'd love to test those changes. I already did a MipsPro build of 1.0pre3 (see this thread: http://forums.nekochan.net/viewtopic.php?t=1374), which turned out to be quite a huge GCCism cleaning orgy *sigh*. Anyway, I was planning to play a bit with it again in the following weeks (and hopefully release a 1.0pre4 package then), as I've got some time now. So your patch would be most welcome ;-)
ShadeOfBlue wrote:Using the ICE (Image Compression Engine) on the O2 (if only for colorspace conversions) would also improve speed.

Getting the YUV to RGB conversion off the CPU would lead to the biggest speed improvement, I guess. I think this could be implemented in hardware using the OpenGL imaging extensions (glColorMatrix), but I'm lacking the skills to do this myself.
The gl and gl2 plugins have a new maintainer now, who is quite a nice guy. He already helped me sort out certain problems with the gl2 plugin in the past. There was recently a short thread on the mailing list about the idea of GPU YUV conversion for the gl plugins, so if anybody in here feels the urge to implement this... ;-)
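The YUV to RGB step is essentially a 3x3 matrix applied to every pixel after subtracting fixed offsets, which is why a hardware color matrix is a natural fit. (In the OpenGL 1.2 imaging subset the color matrix is selected with `glMatrixMode(GL_COLOR)`; whether a particular SGI graphics board accelerates it is not something this sketch can tell you.) Below is the same math done on the CPU in float. The BT.601 studio-range coefficients are an assumption about the source material; full-range video uses different constants.

```c
#include <assert.h>
#include <stddef.h>

/* BT.601 studio-range YUV -> RGB (Y in 16..235, U/V centered on 128).
   Rows are R, G, B; columns multiply (Y - 16), (U - 128), (V - 128). */
static const float yuv2rgb[3][3] = {
    { 1.164f,  0.000f,  1.596f },
    { 1.164f, -0.391f, -0.813f },
    { 1.164f,  2.018f,  0.000f }
};

/* Round to nearest and clamp to the 0..255 range of an 8-bit channel. */
static unsigned char clamp_u8(float x)
{
    if (x <= 0.0f)
        return 0;
    if (x >= 255.0f)
        return 255;
    return (unsigned char)(x + 0.5f);
}

/* Convert one pixel: remove the offsets, apply the matrix, clamp. */
static void yuv_to_rgb(unsigned char y, unsigned char u, unsigned char v,
                       unsigned char rgb[3])
{
    float in[3];
    int row;

    assert(rgb != NULL);
    in[0] = (float)y - 16.0f;
    in[1] = (float)u - 128.0f;
    in[2] = (float)v - 128.0f;
    for (row = 0; row < 3; row++) {
        float acc = yuv2rgb[row][0] * in[0]
                  + yuv2rgb[row][1] * in[1]
                  + yuv2rgb[row][2] * in[2];
        rgb[row] = clamp_u8(acc);
    }
}
```

On a GPU path the offsets would presumably have to be applied separately (for example with pixel-transfer bias) before the matrix; this sketch simply does everything on the CPU.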

ShadeOfBlue wrote:But perhaps all these things are not worth the time, since we have IRIXDivX.
When it gets support for more codecs it will replace mplayer on SGIs but until then mplayer has support for more codecs...

Yes, Brandon does a great job on IRIXdivx, but it's always nice to have a choice. And with IRIXdivx there seem to be big differences in what is done in hardware on the different platforms. On an MGRAS (EMXI) based Octane, mplayer was actually way faster for most stuff I tested, using hardware scaling (gl2) of course. For people without TRAM, IRIXdivx might be the better performer on MGRAS too. I recently switched to Odyssey and was really impressed by the performance jump of IRIXdivx. It seems to do a lot more in hardware there.

so long,
Timo

ShadeOfBlue
Moderator
Posts: 799
Joined: Tue Nov 25, 2003 12:09 pm
Location: Europe

Post by ShadeOfBlue » Mon Mar 01, 2004 1:50 pm

schleusel wrote:Great! I'd love to test those changes. I already did a MipsPro build of 1.0pre3 (see this thread: http://forums.nekochan.net/viewtopic.php?t=1374), which turned out to be quite a huge GCCism cleaning orgy *sigh*.


I can imagine that... Full of C++ comments in C code, etc...
I really admire people like you that take the time to correct such things and release binaries for people that don't have the compilers.

schleusel wrote:Anyway, I was planning to play a bit with it again in the following weeks (and hopefully release a 1.0pre4 package then), as I've got some time now. So your patch would be most welcome ;-)


OK, I'll make a patch soon, probably tomorrow or by the end of this week. :)

schleusel wrote:Getting the YUV to RGB conversion off the CPU would lead to the biggest speed improvement, I guess. I think this could be implemented in hardware using the OpenGL imaging extensions (glColorMatrix), but I'm lacking the skills to do this myself.
The gl and gl2 plugins have a new maintainer now, who is quite a nice guy. He already helped me sort out certain problems with the gl2 plugin in the past. There was recently a short thread on the mailing list about the idea of GPU YUV conversion for the gl plugins, so if anybody in here feels the urge to implement this... ;-)


I also lack the skills to do this... Haven't done much in OpenGL so far, though if I have more time, I'll look into this.

schleusel wrote:Yes, Brandon does a great job on IRIXdivx, but it's always nice to have a choice. And with IRIXdivx there seem to be big differences in what is done in hardware on the different platforms. On an MGRAS (EMXI) based Octane, mplayer was actually way faster for most stuff I tested, using hardware scaling (gl2) of course. For people without TRAM, IRIXdivx might be the better performer on MGRAS too. I recently switched to Odyssey and was really impressed by the performance jump of IRIXdivx. It seems to do a lot more in hardware there.


I see. Well, I'm sure that IRIXdivx will get faster, it's still under development, after all.



jdboyd wrote:How much of your optimizations were in Mplayer, versus ffmpeg?

Most of them were in ffmpeg (libavcodec) and mp3lib. Optimizations in mp3lib were the most visible, they made the picture move more smoothly (also a bit less lag on audio/video sync), though the ones in ffmpeg were also quite visible.
However, a lot of things still remain to be optimized... I didn't have the time to check every file for things that could be optimized (I intend to do that when I have some more time).

jdboyd wrote:Finally, if you are doing this sort of work, have you tried getting SGI Developer Plus membership? My understanding is that gets you a compiler license.

These things are only a hobby to me, I don't make any money from them.
I currently have a Developer Online membership, which I hope to upgrade to Developer Plus after I make more useful apps to show to them (since they ask you about what projects you have so that they can see if you're worth the Plus membership ;) )
I don't know if it's even possible for someone like me to get a Plus membership...

bigendian
Posts: 35
Joined: Wed Jul 30, 2003 10:06 pm
Location: Winter Park, FL

Post by bigendian » Mon Mar 01, 2004 3:31 pm

I have MIPSpro compilers here, and I've got a TRAM module on the way for my SI head. I'd love to get some decent performance out of both IRIXDivx and Mplayer.

I'd like to test MPlayer on my Octane once you guys get your patches together.

daniel

Scott Tarr
Posts: 120
Joined: Sat Jul 26, 2003 1:08 pm
Location: Detroit Metro area

Post by Scott Tarr » Tue Mar 02, 2004 7:45 pm

ShadeOfBlue states:
Since SGI's CPUs (at least the newer ones [R4K and up]) have 32 integer and 32 floating-point registers it would be better to modify the code to use those.
Crappy x86 processors have much less registers, so people don't use them [the registers, not processors (sadly)], they rather recode the stuff in assembler or just leave the code as it is, which is bad...

Uh, x86 has had 32-bit registers since the 386. :violent1: :P

squeen
Moderator
Posts: 2933
Joined: Fri May 09, 2003 6:10 am
Location: Maryland, USA

Post by squeen » Wed Mar 03, 2004 3:39 am

Scott Tarr wrote:Uh, x86 has had 32-bit registers since the 386.


I think he meant fewer registers -- MIPS has quite a few. I found out the hard way that on x86 machines function call arguments are passed on the stack, whereas MIPS passes the first several in its registers (the first four integer arguments under the o32 ABI, the first eight under n32/n64).

ShadeOfBlue
Moderator
Posts: 799
Joined: Tue Nov 25, 2003 12:09 pm
Location: Europe

Post by ShadeOfBlue » Wed Mar 03, 2004 7:00 am

Scott Tarr wrote:ShadeOfBlue states:
Since SGI's CPUs (at least the newer ones [R4K and up]) have 32 integer and 32 floating-point registers it would be better to modify the code to use those.
Crappy x86 processors have much less registers, so people don't use them [the registers, not processors (sadly)], they rather recode the stuff in assembler or just leave the code as it is, which is bad...

Uh, x86 has had 32-bit registers since the 386. :violent1: :P


Ummm... I never said anything about bits, what I said was "[...] x86 processors have much less registers [...]", not bits. :P

Scott Tarr
Posts: 120
Joined: Sat Jul 26, 2003 1:08 pm
Location: Detroit Metro area

Post by Scott Tarr » Wed Mar 03, 2004 8:39 pm

Yeah, after sleeping last night, I woke up to find the clarity of vision a bit better.

Blame it on having to remove one of the techs last week and my having to jump into a tech support role for a few days. :shock:

After a while, everything just looks wrong. :(

bigendian
Posts: 35
Joined: Wed Jul 30, 2003 10:06 pm
Location: Winter Park, FL

TRAM is here, also APO license

Post by bigendian » Thu Mar 04, 2004 3:18 pm

I forgot to mention that I have an APO license on my octane so we can see how much of an improvement could be made by auto-parallelization of the MPlayer code.

daniel
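Roughly speaking, an auto-parallelizer can only split a loop across CPUs when it can prove that the iterations are independent. A minimal C illustration of the two cases follows; the functions are made-up examples and say nothing specific about how APO treats MPlayer's code. Bitstream parsing and prediction from previously decoded data tend to look more like the second loop, which may limit what automatic parallelization can do for a video player.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each one reads src[i] and writes dst[i] only, so
   a parallelizer can split the range across CPUs, provided it can prove that
   dst and src don't overlap (C99 restrict is one way to promise that). */
static void add_bias(short *dst, const short *src, size_t n, short bias)
{
    size_t i;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (i = 0; i < n; i++)
        dst[i] = (short)(src[i] + bias);
}

/* Loop-carried dependence: iteration i needs the value written by iteration
   i - 1, so the iterations cannot simply run in parallel. */
static void prefix_sum(long *a, size_t n)
{
    size_t i;

    assert(n == 0 || a != NULL);
    for (i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```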

ShadeOfBlue
Moderator
Posts: 799
Joined: Tue Nov 25, 2003 12:09 pm
Location: Europe

Re: TRAM is here, also APO license

Post by ShadeOfBlue » Sat Mar 06, 2004 10:37 am

bigendian wrote:I forgot to mention that I have an APO license on my octane so we can see how much of an improvement could be made by auto-parallelization of the MPlayer code.


Nice! If you want the patch, send me a PM with your e-mail and I'll send it to you.
Though, you'll have to fix the GCCisms that may be present in it (I haven't done any fixing, but hopefully that won't be necessary).
I've already sent it to schleusel. I hope you two won't have any problems applying it to the source and compiling it...

schleusel
Posts: 495
Joined: Mon Oct 20, 2003 6:49 am
Location: NRW, Germany

Re: TRAM is here, also APO license

Post by schleusel » Sat Mar 06, 2004 2:57 pm

bigendian wrote:I forgot to mention that I have an APO license on my octane so we can see how much of an improvement could be made by auto-parallelization of the MPlayer code.


I'll include a patch of my mipspro changes when I update the package, so you can play with it. Sounds like an interesting project to me (although I have my doubts that APO can be very efficient on this bloody thing) :-)

ShadeOfBlue wrote:I've already sent it to schleusel. I hope you two won't have any problems applying it to the source and compiling it...

Thanks again :-) I started to play a bit with it yesterday. I did four builds: two gcc builds from the untouched 1.0pre3 source (one with and one without your patch) and two mipspro builds, also with and without your patch. I tested with divx and mpeg2 files and the gl2 and x11 plugins.

For divx the difference was small but measurable (around 3% faster overall on my r12k-400 Octane; I can supply the exact results if needed), with the difference coming solely from the codec (display and audio CPU% didn't change). For mpeg2 decoded through ffmpeg12 the margin was comparable.
It might be worth going through libmpeg2 too, as this is the default decoder for mpeg1/2 and is a lot faster than ffmpeg12.

so long,
Timo

ShadeOfBlue
Moderator
Posts: 799
Joined: Tue Nov 25, 2003 12:09 pm
Location: Europe

Re: TRAM is here, also APO license

Post by ShadeOfBlue » Sun Mar 07, 2004 2:28 am

schleusel wrote:For divx the difference was small but measurable (around 3% faster overall on my r12k-400 Octane; I can supply the exact results if needed), with the difference coming solely from the codec (display and audio CPU% didn't change). For mpeg2 decoded through ffmpeg12 the margin was comparable.
It might be worth going through libmpeg2 too, as this is the default decoder for mpeg1/2 and is a lot faster than ffmpeg12.


I see. I'll try to do that.
I hope that the MipsPro compile will be faster :)
I'll also try to find a way to do colorspace conversions in hardware, so that should help, too...

EDIT: I've found the colorspace conversion functions (finally...). They are in the postproc/ directory. I've optimized them with register variables (this was somewhat noticeable, but still not enough) so I'll look into OpenGL documentation when I come home.
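For reference, a colorspace conversion loop of this sort might look like the minimal sketch below: one row of planar 4:2:0 YUV converted to packed RGB24 with 8-bit fixed-point BT.601 coefficients and `register` locals. It is not the actual postproc/ code; the function name, the studio-range coefficients, and the RGB24 output layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an int to the 0..255 range of an 8-bit channel. */
static unsigned char clip8(int x)
{
    return (unsigned char)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

/* Convert one row of planar 4:2:0 YUV to packed RGB24 using BT.601
   studio-range coefficients scaled by 256 (298 ~ 1.164 * 256, etc.).
   u and v are the chroma row for this luma row (shared by two luma rows in
   4:2:0) and hold one sample per two luma pixels.
   Assumes >> on negative ints is an arithmetic shift, as on common compilers. */
static void yuv420_row_to_rgb24(const unsigned char *y, const unsigned char *u,
                                const unsigned char *v, unsigned char *rgb,
                                size_t width)
{
    register size_t x;

    assert(y != NULL && u != NULL && v != NULL && rgb != NULL);
    for (x = 0; x < width; x++) {
        register int c = y[x] - 16;
        register int d = u[x / 2] - 128;
        register int e = v[x / 2] - 128;

        /* +128 rounds to nearest before dropping the 8 fraction bits */
        rgb[3 * x + 0] = clip8((298 * c + 409 * e + 128) >> 8);
        rgb[3 * x + 1] = clip8((298 * c - 100 * d - 208 * e + 128) >> 8);
        rgb[3 * x + 2] = clip8((298 * c + 516 * d + 128) >> 8);
    }
}
```

Since the R10k/R12k FPU is strong, a float version of this inner loop may be worth benchmarking against the integer one, but nothing in this thread has measured that.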

