OpenMP threads executing on the same cpu core
Solution 1
After some experimentation I found that the problem was that I was starting my program from inside the Eclipse IDE, which apparently set the CPU affinity so that only one core could be used. I thought I had seen the same problem when starting it outside the IDE, but a repeated test showed that the program works fine when started from the terminal instead of from inside the IDE.
Solution 2
I compiled your program using g++ 4.6 on Linux
g++ --std=c++0x -fopenmp test.cc -o test
The output was, unsurprisingly:
Thread 2 on cpu 2
Thread 3 on cpu 1
910270973
Thread 1 on cpu 3
910270973
Thread 0 on cpu 0
910270973910270973
The fact that 4 threads are started (assuming you have not set the number of threads in some way, e.g. using OMP_NUM_THREADS) should imply that the program is able to see 4 usable CPUs. I cannot guess why it is not using them, but I suspect a problem in your hardware/software setup, in some environment variable, or in the compiler options.
Grizzly
Updated on June 22, 2022

Comments
Grizzly almost 2 years
I'm currently parallelizing a program using OpenMP on a 4-core Phenom II. However, I noticed that my parallelization does nothing for the performance. Naturally I assumed I had missed something (false sharing, serialization through locks, ...), but I was unable to find anything like that. Furthermore, judging from the CPU utilization, it seemed like the program was executed on only one core. From what I found,
sched_getcpu()
should give me the ID of the core that the thread executing the call is currently scheduled on. So I wrote the following test program:

#include <iostream>
#include <sstream>
#include <random>
#include <omp.h>
#include <sched.h>  // sched_getcpu(); g++ defines _GNU_SOURCE by default

int main() {
    #pragma omp parallel
    {
        std::default_random_engine rand;  // one engine per thread, default seed
        int num = 0;
        #pragma omp for
        for (std::size_t i = 0; i < 1000000000; ++i)
            num += rand();
        int cpu = sched_getcpu();  // CPU this thread is running on right now
        // Build the whole line first so output from different threads does not interleave.
        std::ostringstream os;
        os << "Thread " << omp_get_thread_num() << " on cpu " << cpu << " num " << num << '\n';
        std::cout << os.str() << std::flush;
    }
}
On my machine this gives the following output (the random numbers will vary, of course):
Thread 2 on cpu 0 num 127392776
Thread 0 on cpu 0 num 1980891664
Thread 3 on cpu 0 num 431821313
Thread 1 on cpu 0 num -1976497224
From this I assume that all threads execute on the same core (the one with ID 0). To be more certain I also tried the approach from this answer. The results were the same. Additionally, using
#pragma omp parallel num_threads(1)
didn't make the execution slower (slightly faster, in fact), lending credibility to the theory that all threads use the same CPU. However, the fact that the CPU is always displayed as 0 makes me somewhat suspicious. I also checked GOMP_CPU_AFFINITY, which was initially not set, so I tried setting it to 0 1 2 3, which from what I understand should bind each thread to a different core. However, that didn't make a difference.

Since I develop on a Windows system, I use Linux in VirtualBox for my development. So I thought that maybe the virtual system couldn't access all cores. However, the VirtualBox settings show that the virtual machine should get all 4 cores, and running my test program 4 times at the same time seems to use all 4 cores, judging from the CPU utilization (and the fact that the system became very unresponsive).
So my question is basically what exactly is going on here. More to the point: is my deduction that all threads use the same core correct? If it is, what could be the reasons for that behavior?
Grizzly about 12 years: Why would I use
#pragma omp parallel for
if I want the threads to do things outside the loop (like writing their ID to the output)? And as I mentioned, it does create 4 threads by default; they just seem to be executed on the same core.
Grizzly about 12 years: The VirtualBox process does have the correct affinity to use all cores (I checked; besides, how else would it use all of them in my test with multiple instances of my test program?). Since I use Linux inside VirtualBox, that doesn't really help there.
Nav about 12 years: That's true too. By the way, if you don't say omp parallel for, then no parallelization happens in the loop. But of course you're inside a parallel section, so... The only other possible explanation I can think of is a lack of hardware support for your VirtualBox setup. Have you tried with other CPUs? superuser.com/questions/33723/…
Grizzly about 12 years: I did not. However, as mentioned, it is possible to use all cores from inside VirtualBox, so lack of support seems unlikely.
Y00 over 3 years: These can be set via variables like these: web.archive.org/web/20220114064748/https://…