optimize with -O3 and -mtune

#1
I've compiled on Ubuntu 12.04 with -O3 -mtune=amdfam10 and made it a RELEASE build by editing the makefiles. I was all over the place at first, but now I'm starting over from the beginning and want to find out the best way to do this.

I'm not much of a programmer, so I'm not completely sure of all the changes I need to make to build the most optimized version.

The most important file is probably makefile.inc. Adding -march=amdfam10 to the already existing CPUOPTIMIZATIONS should be OK?
After all it says
"##### Definitions shared by all makefiles #####
include makefile.inc"

Given the "shared by all" comment above, should any further changes be needed?

#2
There is no need to add -mtune or -march here. Since GCC 4.3 there is -march=native, which detects the CPU type and builds accordingly. This is used by default when building Xonotic locally. There are a few select cases where -march=native does not yield the best result, for example on an Athlon XP, where athlon-4 is weirdly almost always faster, but for a Phenom you will be fine.
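
If you want to see what -march=native actually resolves to on your machine, GCC can print its target settings with "-Q --help=target". Here is a minimal Python sketch of that check, assuming gcc is on the PATH. The helper names parse_march and native_march are made up for illustration:

```python
import re
import subprocess


def parse_march(help_target_output):
    """Pull the -march value out of `gcc -Q --help=target` output."""
    for line in help_target_output.splitlines():
        match = re.match(r"\s*-march=\s+(\S+)", line)
        if match:
            return match.group(1)
    return None


def native_march(compiler="gcc"):
    """Ask the compiler what -march=native expands to on this machine."""
    try:
        result = subprocess.run(
            [compiler, "-march=native", "-Q", "--help=target"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None  # compiler not installed
    if result.returncode != 0:
        return None  # e.g. a target where -march=native is not accepted
    return parse_march(result.stdout)


if __name__ == "__main__":
    print("-march=native resolves to:", native_march())
```

Running the same gcc command by hand and reading the -march line gives the same answer; the script only saves the scrolling.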
I'm at least a reasonably tolerable person to be around - Narcopic

#3
Thanks. I must have missed reading about the native option.

I did not find any "native" in the makefiles, but I guess it does not need to be there? It goes native unless you tell it to do something different? I did not use the search function though.

#4
The git tool ./all uses native for release builds. I don't know if you can easily compile the game from the sources in the package the same way actual releases are built. But don't take my word for it.

#5
Hmm... I actually thought I had made a release build before, but it seems that I have not.
I made an optimized debug build though :)
I'm back to running builds from unedited makefiles now.

I can rather easily go into Darkplaces and do a "make clean" followed by "make release", with any extra options in the release section of makefile.inc. After all, this is where most of the magic is done. And in NetRadiant's makefile.conf there is a "debug" at the top that can be replaced with "release". Maybe this is all that is needed?

With ./all, I have figured out so far that it seems to be "./all release *something*" that is supposed to be used, but I cannot figure out what "something" should be.

Anyway, I'm mostly after to see what I can gain by using the different optimization variables available. But I doubt much will change, since the bottleneck is the GPU anyway. It is still interesting to find out how, though. (I have a fanless AMD 6670 that can be used together with the built-in GPU when supported... too bad Linux does not seem to support dual graphics...)
What I really need is something purely CPU-dependent to play with. But I have not found anything "heavier" than qbittorrent. Everything else I tried has dependencies from hell, and when all of that is built I usually end up with build errors anyway.

#6
(06-24-2012, 06:02 AM)swejuggalo Wrote: Anyway, I'm mostly after to see what I can gain by using the different optimization variables available.

Sometime around 2006 I made a shellscript to compile loads of versions of Darkplaces with different CFLAGS and then run Nexuiz benchmarks. I don't have the script anymore but I know it was on the old Nexuiz forums with esteel saying that it was 'too Gentoo'. I still have the benchmark results for different options on an Athlon XP:

Build              Min  Average     Max
i386-O2           2.56    49.81  159.69
i386-O3           2.56    49.79  142.31
i486-O2           2.57    50.29  164.12
i486-O3           2.58    50.03   160.8
i586-O2           2.57     49.9  150.04
i586-O3           2.57    49.65  145.35
pentium-O2        2.57    49.85  149.21
pentium-O3        2.56     49.7  147.91
pentium-mmx-O2    2.57    50.17  150.63
pentium-mmx-O3    2.57    49.83   143.2
Stock             2.58    50.37     140
i686-O2           2.56    50.94  163.03
i686-O2-mmx-sse   2.58    50.94  153.73
i686-O3           2.57    51.14  159.23
i686-O3-mmx-sse   2.57    50.85  160.21
pentiumpro-O2     2.57    50.66  146.46
pentiumpro-O3     2.57    50.75   151.1
pentium2-O2       2.57    50.83  149.32
pentium2-O3       2.58    50.73  155.18
pentium3-O2       2.57    50.56  153.68
pentium3-O3       2.58    50.87  165.51
c3-O2             2.57    49.63     153
c3-O3             2.58    49.61  154.56
c3-2-O2           2.57    50.56  151.17
c3-2-O3           2.57     51.2  157.75
k6-O2             2.55    49.55   175.5
k6-O3             2.57    49.53  148.13
k6-2-O2           2.57    49.49  237.42
k6-2-O3           2.57    49.35  158.13
k6-3-O2           2.57    49.21  153.23
k6-3-O3           2.57    49.24  151.77
athlon-O2         2.57    50.83  166.25
athlon-O3         2.57    50.18  146.91
athlon-tbird-O2   2.57    50.81  145.75
athlon-tbird-O3   2.56    50.83  155.57
athlon-xp-O2      2.57    50.71  153.92
athlon-xp-O3      2.56    50.56   152.3
athlon-4-O2       2.57    51.12  157.38
athlon-4-O3       2.57    50.76  140.17
athlon-mp-O2      2.57    50.59  156.52
athlon-mp-O3      2.58    50.93  166.17
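
The original script is gone, but the idea behind the table can be sketched: build one binary per -march/-O combination, then benchmark each one. Below is a rough Python sketch that only generates the build labels and CFLAGS. The function name cflags_matrix and the short -march list are illustrative assumptions, not a reconstruction of the old script:

```python
from itertools import product


def cflags_matrix(march_values, opt_levels=("-O2", "-O3")):
    """Return (label, cflags) pairs, one per -march / -O combination."""
    matrix = []
    for march, opt in product(march_values, opt_levels):
        label = f"{march}{opt}"           # e.g. "athlon-xp-O2", as in the table
        cflags = f"{opt} -march={march}"
        matrix.append((label, cflags))
    return matrix


if __name__ == "__main__":
    for label, cflags in cflags_matrix(["i686", "athlon-xp", "athlon-4"]):
        # A real run would rebuild the engine with these CFLAGS and then
        # start a benchmark; this sketch only prints the plan.
        print(f"{label:16} {cflags}")
```

Keeping the label format identical to the table rows makes it easy to line up new results against the old ones.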

#7
There certainly was not much difference, especially if the CPU was the bottleneck in that benchmark. Raising the max does not give us much joy if the average shows close to no improvement.
So I guess I can expect a similar or even smaller difference in the Xonotic case.
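
To put a number on how small the spread is, the Average column from the table in post #6 can be compared against the Stock build. A small Python sketch using a handful of rows copied from that table (the choice of rows is arbitrary, apart from including the lowest and highest averages):

```python
# (build, average) pairs copied from the results table in post #6.
averages = {
    "Stock": 50.37,
    "k6-3-O2": 49.21,       # lowest average in the table
    "athlon-xp-O2": 50.71,
    "athlon-4-O2": 51.12,
    "c3-2-O3": 51.20,       # highest average in the table
}

stock = averages["Stock"]
for build, avg in averages.items():
    change = (avg - stock) / stock * 100
    print(f"{build:14} {avg:6.2f}  {change:+.1f}% vs Stock")

# Distance between the best and worst average, relative to Stock.
spread = (max(averages.values()) - min(averages.values())) / stock * 100
print(f"best-to-worst spread: {spread:.1f}% of the Stock average")
```

The whole range from worst to best average comes out at roughly 4% of the Stock figure, which fits the impression that the flags made little difference.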
I guess the available CPU extensions each have different areas where they are most useful as well. Intel is marketing SSE4 as an "HD boost", for example...

Thanks for the test results.

#8
(06-24-2012, 09:28 AM)edh Wrote:
(06-24-2012, 06:02 AM)swejuggalo Wrote: Anyway, I'm mostly after to see what I can gain by using the different optimization variables available.

Sometime around 2006 I made a shellscript to compile loads of versions of Darkplaces with different CFLAGS and then run Nexuiz benchmarks. I don't have the script anymore but I know it was on the old Nexuiz forums with esteel saying that it was 'too Gentoo'. I still have the benchmark results for different options on an Athlon XP:

<bla>

http://archive.alienTRAP.org/forum/viewt...f=12&t=850

I didn't know you were Ed on the AT forum! Welp.

#9
Yes, that was me. This forum doesn't allow 2 letter names...

Thanks for the link to my old post. I knew it was archived somewhere but couldn't find it. If someone wants to port this script to work with Xonotic's the-big-benchmark.sh, then we could have some week-long benchmarking fun!

#10
If you ever want a name change, I can provide.

#11
I run Funtoo Linux, a fork of Gentoo, and I've also been experimenting with optimizing Xonotic. My desktop system is old. I have an AMD Athlon XP 1.8 GHz processor with an ATI Radeon 9200 graphics card. I get around 5 FPS even with the graphics settings as low as they will go. Fortunately my laptop is newer and faster, but it isn't really built for gaming. So I've been hoping I could tweak it enough to get it running reasonably well on this desktop system.

In the end, you really can't beat decent hardware, but you might be able to squeeze out a tiny bit of extra performance in some areas. I use the open source Radeon driver, and compiling it with LTO (link-time optimization) does seem to help. Xonotic still reported about 5 FPS, but subjectively the game's graphics seemed somewhat smoother.

The other thing that seemed to help was compiling Xonotic itself using PGO (profile-guided optimization). This involves compiling Xonotic once with something like "-fprofile-generate=/var/tmp/pgo" in your CFLAGS. Then you run the game for a while, and it will dump data about the game's code execution to /var/tmp/pgo. Running the game for a longer period of time may or may not yield better data; I really don't know. Finally, recompile Xonotic with "-fprofile-use=/var/tmp/pgo" in your CFLAGS. This tells GCC to use the previously dumped data to help it better optimize the binary. Even on my slow system, this did seem to help. I still mostly got 5 FPS, but it sometimes got up to 7 or 8 FPS after using PGO.
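
The two PGO phases differ only in one flag, so the CFLAGS for each phase can be derived from a common base. A minimal Python sketch of that bookkeeping; the helper name pgo_cflags is hypothetical, and how the resulting CFLAGS get passed to the Xonotic build is left open here:

```python
def pgo_cflags(base_cflags, phase, profile_dir="/var/tmp/pgo"):
    """Return CFLAGS for one phase of a GCC profile-guided build.

    phase is "generate" for the instrumented first build, or "use" for
    the final build that reads the collected profile data.
    """
    if phase == "generate":
        return f"{base_cflags} -fprofile-generate={profile_dir}"
    if phase == "use":
        return f"{base_cflags} -fprofile-use={profile_dir}"
    raise ValueError(f"unknown PGO phase: {phase!r}")


if __name__ == "__main__":
    base = "-O2 -march=athlon-xp -pipe"
    print("1. build with:  ", pgo_cflags(base, "generate"))
    print("2. play the game for a while so profile data is written")
    print("3. rebuild with:", pgo_cflags(base, "use"))
```

Both builds should use the same base flags and the same profile directory, since the second build has to find the data the first one wrote.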

LTO could potentially provide a significant speedup to Xonotic itself, but when I tried this, Xonotic wouldn't even run. I simply got a message saying "illegal instruction". Since LTO in GCC is still under development, it's not entirely reliable, so how well it works varies from program to program. I've had no problem using it with my ATI drivers, for example, but I can't use it on the open source Intel drivers (on my laptop) because the LTO build requires a huge amount of RAM; the 3 GB of RAM in my laptop isn't sufficient. Hopefully LTO will become more stable in the future. Once it improves, LTO will probably make a noticeable difference in Xonotic's speed.

As for general GCC flags, this is what I use for general purposes on my system.

"-O2 -march=athlon-xp -pipe -fgraphite-identity -floop-block -floop-interchange -floop-strip-mine -fbranch-target-load-optimize2 -fgcse-after-reload -fgcse-las -fgcse-sm -finline-functions -fpredictive-commoning -freorder-blocks-and-partition -ftree-vectorize"

After looking over the GCC optimization docs, I felt like this was a good set of flags because it enables several types of optimization (some of which are enabled at -O3 but not -O2) but avoids the more controversial optimizations which often result in a larger binary without necessarily improving execution speed. Be aware that, if you try using PGO, you may have to refrain from using -freorder-blocks-and-partition. This particular flag caused an internal compiler error on my system (it's a known issue http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53743 ).
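
To check which optimizations your own GCC version turns on at -O3 but not at -O2, you can compare the output of "gcc -Q --help=optimizers" at both levels. A minimal Python sketch of that comparison, assuming gcc is on the PATH; the function names are made up for illustration:

```python
import subprocess


def parse_optimizers(output):
    """Map each flag in `gcc -Q --help=optimizers` output to its state."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("-f"):
            states[parts[0]] = parts[1]   # e.g. "[enabled]" or "[disabled]"
    return states


def changed_flags(low, high):
    """Return flags whose state differs between two parsed outputs."""
    return {flag: (low.get(flag), state)
            for flag, state in high.items()
            if low.get(flag) != state}


def query(level, compiler="gcc"):
    """Run the compiler and parse its optimizer report for one -O level."""
    result = subprocess.run([compiler, "-Q", "--help=optimizers", level],
                            capture_output=True, text=True, check=True)
    return parse_optimizers(result.stdout)


if __name__ == "__main__":
    try:
        diff = changed_flags(query("-O2"), query("-O3"))
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("gcc not available; nothing to compare")
    else:
        for flag, (at_o2, at_o3) in sorted(diff.items()):
            print(f"{flag:36} -O2: {at_o2}  -O3: {at_o3}")
```

The list depends on the GCC version, so it is worth rerunning after a compiler upgrade before carrying a hand-picked flag set forward.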

A final note: I tried compiling Xonotic with the optimization flags used by default in makefile.inc, which are "-fno-math-errno -ffinite-math-only -fno-rounding-math -fno-signaling-nans -fno-trapping-math". I actually got worse performance with these flags: the graphics were slower and choppier. Besides, while the Xonotic programmers may know enough about Xonotic's internals to be confident that these flags will not have any negative consequences, the fact remains that some of those options can result in incorrect calculations, which have the potential to cause problems at runtime.

All of the optimization flags discussed are documented at http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html I hope these suggestions prove useful.

#12
Thanks for your writeup. I'll try to submit this to divVerent!

#13
What about the default flags plus -march=athlon-xp?

As for the default flags: the extra -f flags are safe for DarkPlaces. Originally it was written for -ffast-math, even, but we then noticed we do not want to have SOME parts of that, e.g. associative-math.
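
The post above does not spell out what went wrong with associative-math, but the general hazard is easy to show: floating-point addition is not associative, so a compiler that is allowed to regroup operations can change results. A small Python illustration (Python floats are IEEE 754 doubles, the same format as a C double on common platforms):

```python
# Floating-point addition is not associative: regrouping changes the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # evaluated in the order written
right = a + (b + c)   # a regrouping that -fassociative-math would permit

print(left == right)  # False
print(left, right)
```

The difference is tiny here, but code that compares floats for equality or depends on identical results across machines can break when the grouping changes.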

#14
I had tacked those default flags onto "-O2 -march=athlon-xp -pipe".