Speed performance of a Qt program: Windows vs Linux

c++ windows linux performance qt

10,083

Another possibility: on Linux the Qt libraries may already be loaded (this can happen, e.g., if you use KDE), whereas on Windows they must be loaded first, which slows down the computation time. To check how much time library loading costs your application, you could write a dummy test in pure C++ code.

Author: Seub

Updated on June 09, 2022

Comments

Seub almost 2 years
I've already posted this question here, but since it may not be that Qt-specific, I thought I'd try my luck here as well. I hope it's not inappropriate to do that (just tell me if it is).

I’ve developed a small scientific program that performs some mathematical computations. I’ve tried to optimize it so that it’s as fast as possible. Now I’m almost done deploying it for Windows, Mac and Linux users. But I have not been able to test it on many different computers yet.

Here’s what troubles me: To deploy for Windows, I’ve used a laptop which has both Windows 7 and Ubuntu 12.04 installed on it (dual boot). I compared the speed of the app running on these two systems, and I was shocked to observe that it’s at least twice as slow on Windows! I wouldn’t have been surprised if there were a small difference, but how can one account for such a difference?

Here are a few details:
- The computations I make the program do are just brute-force, simple-minded mathematical calculations: basically, it computes products and cosines in a loop that is called a billion times. The computation is also multi-threaded: I launch something like 6 QThreads.
- The laptop has two cores @ 1.73 GHz. At first I thought that Windows was probably not using one of the cores, but then I looked at the processor activity: according to the small graph, both cores are running at 100%.
- Then I thought the C++ compiler for Windows didn’t use the optimization options (things like -O1, -O2) that the C++ compiler for Linux automatically used (in release builds), but apparently it does.
I’m bothered that the app is so much slower (2 to 4 times) on Windows; it’s really weird. That said, I haven’t tried it on other Windows computers yet. Still, do you have any idea what could explain the difference?

Additional info: some data…

Even though Windows seems to be using the two cores, I’m thinking this might have something to do with threads management, here’s why:

Sample Computation n°1 (this one launches 2 QThreads):
- PC1-windows: 7.33s
- PC1-linux: 3.72s
- PC2-linux: 1.36s
Sample Computation n°2 (this one launches 3 QThreads):
- PC1-windows: 6.84s
- PC1-linux: 3.24s
- PC2-linux: 1.06s
Sample Computation n°3 (this one launches 6 QThreads):
- PC1-windows: 8.35s
- PC1-linux: 2.62s
- PC2-linux: 0.47s
where:
- PC1-windows = my 2-core laptop (@ 1.73 GHz) with Windows 7
- PC1-linux = my 2-core laptop (@ 1.73 GHz) with Ubuntu 12.04
- PC2-linux = my 8-core laptop (@ 2.20 GHz) with Ubuntu 12.04
(Of course, it's not shocking that PC2 is faster. What's incredible to me is the difference between PC1-windows and PC1-linux.)

Note: I've also tried running the program on a recent computer (4 or 8 cores @ ~3 GHz, I don't remember exactly) under Mac OS; the speed was comparable to PC2-linux (or slightly faster).

EDIT: I'll answer here a few questions I was asked in the comments.
- I just installed the Qt SDK on Windows, so I guess I have the latest version of everything (including MinGW?). The compiler is MinGW. The Qt version is 4.8.1.
- I use no optimization flags because I noticed that they are automatically used when I build in release mode (with Qt Creator). It seems to me that if I write something like QMAKE_CXXFLAGS += -O1, this only has an effect in debug builds.
- Lifetime of threads, etc.: this is pretty simple. When the user clicks the "Compute" button, 2 to 6 threads are launched simultaneously (depending on what is being computed), and they terminate when the computation ends. Nothing too fancy. Every thread just does brute-force computations, except one, which performs a (not so) small computation every 30 ms, basically checking whether the error is small enough.
EDIT: latest developments and partial answers

Here are some new developments that provide answers about all this:
- I wanted to determine whether the difference in speed really had something to do with threads or not. So I modified the program so that the computation uses only 1 thread; that way we are pretty much comparing the performance of "pure C++ code". It turned out that Windows was now only slightly slower than Linux (something like 15%). So I guess that a small (but not insignificant) part of the difference is intrinsic to the system, but the largest part is due to thread management.
- As someone suggested in the comments (Luca Carlon, thanks for that), I tried building the application with the Microsoft Visual Studio compiler (MSVC) instead of MinGW. And, surprise, the computation (with all the threads and everything) was now "only" 20% to 50% slower than on Linux! I think I'm going to go ahead and be content with that. Oddly, though, I noticed that the "pure C++" computation (with only one thread) was now even slower than with MinGW, which must account for the remaining difference. So as far as I can tell, MinGW is slightly better than MSVC, except that it handles threads like a moron.
So I’m thinking that either there’s something I can do to make MinGW (ideally I’d rather use it than MSVC) handle threads better, or it just can’t. I would be amazed if it couldn’t: how could that not be well known and documented? Then again, I should be careful about drawing conclusions too quickly; I’ve only compared things on one computer (for the moment).
Seub over 11 years

Thanks for your suggestion; see the latest edit to my initial post for the latest developments and partial answers.
Seub over 11 years

Thank you for your interest and for sharing your knowledge. I believe that in my situation, though, the explanation lies mostly in thread management: please see my latest edit (in the question) for the latest developments and partial answers.
Steve-o over 11 years

Adaptive mutexes, i.e. mutexes wrapped by a spin lock, would be even better in many cases. They are available on both Windows and Linux.