Parallel tasks get better performance with boost::thread than with PPL or OpenMP
Solution 1
Neither OpenMP nor PPL is pessimistic. They just do as they are told. However, there are some things you should take into consideration when you try to parallelize loops.
Without seeing how you implemented these things, it's hard to say what the real cause may be.
Also, if the operations in each iteration depend on other iterations of the same loop, this will create contention, which will slow things down. You haven't shown what your some_operation function actually does, so it's hard to tell whether there are data dependencies.
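To make the idea of a data dependency concrete, here is a minimal sketch with two hypothetical functions (`square_each` and `running_total` are illustrative names, not taken from the question). In the first loop every iteration touches only its own element, so the iterations could be handed to different threads. In the second, iteration `i` reads what iteration `i - 1` wrote, so the loop as written has to run in order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i].
// A loop like this is a candidate for parallelization.
void square_each(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * in[i];
    }
}

// Loop-carried dependency: out[i] needs out[i - 1], which is
// produced by the previous iteration, so the order matters.
void running_total(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + (i > 0 ? out[i - 1] : 0);
    }
}
```

Putting `#pragma omp parallel for` on the second loop unchanged would compile, but the iterations would race on `out`.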
A loop that can be truly parallelized must let each iteration run totally independently of all other iterations, with no shared memory being written in any of the iterations (data that is only read can safely be shared). So preferably, you'd write to local variables and then copy the results out at the end.
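A rough sketch of the "work on local variables, copy at the end" pattern follows. It uses `std::thread` from C++11 as a stand-in for `boost::thread`, and the function name `parallel_sum_of_squares` and the chunking scheme are assumptions made for illustration only. Each worker accumulates into a private variable and writes to shared storage exactly once, after its loop has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example: sum of squares of `data` using `num_threads` workers.
long long parallel_sum_of_squares(const std::vector<int>& data,
                                  std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);  // one slot per worker
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&data, &partial, chunk, t] {
            const std::size_t begin = std::min(t * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            long long local = 0;  // private to this thread
            for (std::size_t i = begin; i < end; ++i) {
                local += static_cast<long long>(data[i]) * data[i];
            }
            partial[t] = local;  // single write to shared storage
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // Combine the per-thread results on the calling thread.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The input vector is shared between the threads, but it is only read, so no synchronization is needed for it.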
Not all loops can be parallelized; it depends heavily on the type of work being done.
For example, something that is good for parallelizing is work being done on each pixel of a screen buffer. Each pixel is totally independent from all other pixels, and therefore, a thread can take one iteration of a loop and do the work without needing to be held up waiting for shared memory or data dependencies within the loop between iterations.
Also, if you have a contiguous array, neighbouring elements may sit in the same cache line. If you are editing element 5 in thread A and changing element 6 in thread B, you may get cache contention, which will also slow things down, because both elements reside in the same cache line. This phenomenon is known as false sharing.
There are many aspects to think about when doing loop parallelization.
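One common way to sidestep false sharing is to pad each thread's slot so that slots are a full cache line apart. The sketch below assumes a 64-byte cache line (typical on x86, but not guaranteed anywhere) and uses `std::thread` as a stand-in for `boost::thread`; `PaddedCounter` and `count_per_thread` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Check your target hardware.
constexpr std::size_t kAssumedCacheLine = 64;

// Each counter occupies a whole assumed cache line, so counters that are
// adjacent in the vector never share a line.
struct PaddedCounter {
    long long value = 0;
    char padding[kAssumedCacheLine - sizeof(long long)] = {};
};
static_assert(sizeof(PaddedCounter) == kAssumedCacheLine,
              "one counter per assumed cache line");

// Every thread increments only its own padded counter.
std::vector<long long> count_per_thread(std::size_t num_threads,
                                        long long increments) {
    assert(num_threads > 0);
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, increments] {
            for (long long i = 0; i < increments; ++i) {
                ++counters[t].value;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::vector<long long> totals;
    for (const auto& c : counters) {
        totals.push_back(c.value);
    }
    return totals;
}
```

Whether the padding makes a measurable difference depends on the hardware and on how often each thread writes, so it is worth timing both layouts before committing to one.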
Solution 2
In short, OpenMP is mainly based on shared memory, with the additional cost of task management and memory management. PPL is designed to handle generic patterns of common data structures and algorithms, which brings additional complexity cost. Both of them have additional CPU cost, but your plain boost threads do not (boost threads are just a simple API wrapper). That's why both of them are slower than your boost version. And, since the computations in the example are independent of each other, with no synchronization, OpenMP should be close to the boost version.
This holds for simple scenarios; for complicated scenarios, with complicated data layouts and algorithms, it depends on the context.
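Since the differences come down to overhead, the most reliable check is to time each variant on your own workload. Below is a minimal sketch of the thread-per-iteration pattern from the question plus a small timing helper. It uses `std::thread` and `std::chrono` from C++11 instead of `boost::thread_group`, and `run_thread_per_iteration` and `elapsed_ms` are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs task(j) for every j in [0, n), one thread per iteration,
// then waits for all of them (similar in spirit to join_all()).
void run_thread_per_iteration(std::size_t n,
                              const std::function<void(std::size_t)>& task) {
    assert(static_cast<bool>(task));
    std::vector<std::thread> group;
    group.reserve(n);
    for (std::size_t j = 0; j < n; ++j) {
        group.emplace_back([&task, j] { task(j); });
    }
    for (auto& t : group) {
        t.join();
    }
}

// Wall-clock duration of a single call to `work`, in milliseconds.
double elapsed_ms(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping each variant (thread per iteration, OpenMP, PPL) in `elapsed_ms` and repeating the measurement several times gives numbers you can compare directly, rather than relying on general expectations about the libraries.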
888
Updated on June 09, 2022
Comments
-
888 about 2 years
I have a C++ program which could be parallelized. I'm using Visual Studio 2010, 32-bit compilation.
In short, the structure of the program is the following:
    #define num_iterations 64 //some number

    struct result {
        //some stuff
    };

    result best_result=initial_bad_result;

    for(i=0; i<many_times; i++) {
        result *results[num_iterations];
        for(j=0; j<num_iterations; j++) {
            some_computations(results+j);
        }
        // update best_result;
    }
Since each some_computations() is independent (some global variables are read, but no global variables are modified), I parallelized the inner for-loop.
My first attempt was with boost::thread:
    thread_group group;
    for(j=0; j<num_iterations; j++) {
        group.create_thread(boost::bind(&some_computations, this, results+j));
    }
    group.join_all();
The results were good, but I decided to try more.
I tried the OpenMP library
    #pragma omp parallel for
    for(j=0; j<num_iterations; j++) {
        some_computations(results+j);
    }
The results were worse than the boost::thread ones.
Then I tried the ppl library and used parallel_for():
    Concurrency::parallel_for(0,num_iterations, [=](int j) {
        some_computations(results+j);
    });
The results were the worst.
I found this behaviour quite surprising. Since OpenMP and ppl are designed for parallelization, I would have expected better results than with boost::thread. Am I wrong? Why is boost::thread giving me better results?
-
Tony The Lion over 11 years
Your function some_operation takes an offset into an array, and the array is shared among several threads. I don't know that either PPL or OpenMP can make any guarantees that you're not writing to that array, or that anything else is writing to that array. Therefore my answer doesn't change.
-
Hristo Iliev over 11 years
Your first paragraph is not true. Neither OpenMP nor PPL cares what you do to shared variables and there is nothing pessimistic or optimistic in the way they work. Both are imperative programming concepts, which means that the compiler makes the code parallel if told so rather than treating the expressions just as hints. Proper treatment of shared variables is left solely to the programmer.
-
Moss over 11 years
OpenMP is not designed for message passing; MPI is the one that passes messages.
-
Peixu Zhu over 11 years
@Moss, thanks, I mixed up OpenMP and MPI. OpenMP is shared-memory based.