Parallel tasks get better performance with boost::thread than with PPL or OpenMP

c++ openmp boost-thread ppl

10,496

Solution 1

Neither OpenMP nor PPL is pessimistic; they simply do as they are told. There are, however, some things you should take into consideration when you try to parallelize loops.

Without seeing how you implemented these things, it's hard to say what the real cause may be.

Also, if the operations in each iteration depend on other iterations of the same loop, this will create contention, which will slow things down. You haven't shown what your some_computations function actually does, so it's hard to tell whether there are data dependencies.
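To make the distinction concrete, here is a minimal sketch contrasting a loop with a dependency between iterations and one without. The function names `running_total` and `square_each` are hypothetical and are not taken from the question. If the compiler is not run with OpenMP enabled, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Loop-carried dependency: iteration j reads the value iteration j-1 wrote,
// so the iterations cannot safely run at the same time. Left serial on purpose.
void running_total(std::vector<int>& a) {
    for (std::size_t j = 1; j < a.size(); ++j) {
        a[j] += a[j - 1];
    }
}

// Independent iterations: each one touches only its own element,
// so the loop is a reasonable candidate for "parallel for".
void square_each(std::vector<int>& a) {
    // OpenMP 2.0 (the version in Visual Studio 2010) needs a signed int index.
    assert(a.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(a.size());
    #pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        a[j] *= a[j];
    }
}
```

Putting the same pragma on the loop in `running_total` would compile, but the result would depend on thread timing.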

A loop that can be truly parallelized must let each iteration run independently of all other iterations, with no shared memory being modified in any of the iterations. So preferably, you'd write results to local variables and then copy them out at the end.
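The "local variables, then copy at the end" idea can be sketched as follows. This is an illustration under assumptions, not the asker's code: `chunked_sum`, `num_chunks`, and the summing workload are hypothetical. Each parallel iteration accumulates into its own local variable and writes to shared memory exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `data` by splitting it into `num_chunks` independent pieces.
long long chunked_sum(const std::vector<int>& data, int num_chunks) {
    assert(num_chunks > 0);
    std::vector<long long> partials(num_chunks, 0);
    const std::size_t n = data.size();

    #pragma omp parallel for
    for (int c = 0; c < num_chunks; ++c) {
        // Half-open range [begin, end) owned by this chunk only.
        const std::size_t begin = n * c / num_chunks;
        const std::size_t end = n * (c + 1) / num_chunks;

        long long local = 0;  // private to this iteration
        for (std::size_t i = begin; i < end; ++i) {
            local += data[i];
        }
        partials[c] = local;  // single write to shared memory, at the end
    }

    // Serial merge after the parallel part has finished.
    long long total = 0;
    for (long long p : partials) {
        total += p;
    }
    return total;
}
```

The merge step is serial, which is fine as long as the number of chunks is small compared with the work done inside each chunk.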

Not all loops can be parallelized; it depends heavily on the type of work being done.

For example, a good candidate for parallelization is work done on each pixel of a screen buffer. Each pixel is totally independent of all other pixels, so a thread can take one iteration of the loop and do the work without being held up waiting for shared memory or for data dependencies between iterations.

Also, if you have a contiguous array, neighboring elements may sit in the same cache line. If you are editing element 5 in thread A and element 6 in thread B, both reside in that one cache line and you may get cache contention, which will also slow things down. This phenomenon is known as false sharing.
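One common way to avoid false sharing is to pad each thread's slot out to a full cache line. The sketch below assumes 64-byte cache lines, which is typical on x86 but not guaranteed. `PaddedCounter`, `kCacheLine`, and `count_per_slot` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines. Check your target CPU.
constexpr std::size_t kCacheLine = 64;

// alignas rounds sizeof(PaddedCounter) up to 64, so consecutive array
// elements are 64 bytes apart and cannot share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    long long value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each slot is incremented by exactly one loop iteration.
std::vector<long long> count_per_slot(int num_slots, int increments) {
    assert(num_slots >= 0 && increments >= 0);
    std::vector<PaddedCounter> counters(num_slots);

    #pragma omp parallel for
    for (int s = 0; s < num_slots; ++s) {
        for (int k = 0; k < increments; ++k) {
            counters[s].value += 1;  // repeated writes stay on this slot's line
        }
    }

    // Copy the plain values out for the caller.
    std::vector<long long> out;
    out.reserve(counters.size());
    for (const PaddedCounter& c : counters) {
        out.push_back(c.value);
    }
    return out;
}
```

The padding trades memory for speed, so it mainly pays off for small, frequently written per-thread data.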

There are many aspects to think about when doing loop parallelization.

Solution 2

In short, OpenMP is mainly based on shared memory and carries the additional cost of task management and memory management. PPL is designed to handle generic patterns over common data structures and algorithms, which brings additional complexity cost. Both of them have extra CPU cost, but your plain boost threads do not (boost threads are just a thin API wrapper). That's why both of them are slower than your boost version. And since the computations in the example are independent of each other and need no synchronization, OpenMP should come close to the boost version.

This holds in simple scenarios. In complicated scenarios, with complicated data layouts and algorithms, the outcome depends on the context.
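Because the outcome depends on context, it helps to time each variant on the real workload rather than guess. Below is a minimal timing helper. `time_ms` is a hypothetical name, and the callable you pass in stands for one of the parallel versions from the question. It uses C++11 `<chrono>`, which Visual Studio 2010 does not provide, so treat it as an assumption about a newer toolchain.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `repeats` times and returns the average wall-clock time
// per run in milliseconds.
template <typename F>
double time_ms(F&& work, int repeats) {
    assert(repeats > 0);
    typedef std::chrono::steady_clock clock;  // monotonic, suited to intervals

    const clock::time_point start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;

    return elapsed.count() / repeats;
}
```

Averaging over several runs smooths out one-time costs such as thread pool start-up, which can otherwise dominate a single short measurement.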


Author by

888

Updated on June 09, 2022

Comments

888 about 2 years
I have a C++ program which could be parallelized. I'm using Visual Studio 2010, 32-bit compilation.

In short, the structure of the program is the following:
```
#define num_iterations 64 //some number

struct result
{ 
    //some stuff
};

result best_result=initial_bad_result;

for(i=0; i<many_times; i++)
{ 
    result *results[num_iterations];


    for(j=0; j<num_iterations; j++)
    {
        some_computations(results+j);
    }

    // update best_result; 
}
```
Since each some_computations() call is independent (some global variables are read, but no global variables are modified), I parallelized the inner for-loop.

My first attempt was with boost::thread:
```
 thread_group group;
 for(j=0; j<num_iterations; j++)
 {
     group.create_thread(boost::bind(&some_computations, this, results+j));
 } 
 group.join_all();
```
The results were good, but I decided to try more.

I tried the OpenMP library:
```
 #pragma omp parallel for
 for(j=0; j<num_iterations; j++)
 {
     some_computations(results+j);
 } 
```
The results were worse than with boost::thread.

Then I tried the ppl library and used parallel_for():
```
 Concurrency::parallel_for(0,num_iterations, [=](int j) { 
     some_computations(results+j);
 });
```
The results were the worst.

I found this behaviour quite surprising. Since OpenMP and PPL are designed for parallelization, I would have expected better results than with boost::thread. Am I wrong?

Why is boost::thread giving me better results?
Tony The Lion over 11 years

Your function some_computations takes an offset into an array, and the array is shared among several threads. I don't know that either PPL or OpenMP can make any guarantees that you're not writing to that array, or that anything else is writing to that array. Therefore my answer doesn't change.
Hristo Iliev over 11 years

Your first paragraph is not true. Neither OpenMP nor PPL cares what you do to shared variables and there is nothing pessimistic or optimistic in the way they work. Both are imperative programming concepts, which means that the compiler makes the code parallel if told so rather than treating the expressions just as hints. Proper treatment of shared variables is left solely to the programmer.
Moss over 11 years

OpenMP is not designed for message passing; MPI is the one that passes messages.
Peixu Zhu over 11 years

@Moss, thanks, I mixed up OpenMP and MPI. OpenMP is shared-memory based.