C++ OpenMP Parallel For Loop - Alternatives to std::vector
Solution 1
The question you link was talking about the fact that "that STL vector container is not thread-safe in the situation where multiple threads write to a single container" . This is only true, as stated correctly there, if you call methods that can cause reallocation of the underlying array that std::vector
holds. push_back()
, pop_back()
and insert()
are examples of these dangerous methods.
If you need thread safe reallocation, then the library intel thread building block offers you concurrent vector containers . You should not use tbb::concurrent_vector in single thread programs because the time it takes to access random elements is higher than the time std::vector takes to do the same (which is O(1)). However, concurrent vector calls push_back()
, pop_back()
, insert()
in a thread safe way, even when reallocation happens.
EDIT 1: The slides 46 and 47 of the following Intel presentation give an illustrative example of concurrent reallocation using tbb::concurrent_vector
EDIT 2: By the way, if you start using Intel Tread Building Block (it is open source, it works with most compilers and it is much better integrated with C++/C++11 features than openmp), then you don't need to use openmp to create a parallel_for, Here is a nice example of parallel_for using tbb.
Solution 2
I think you can use std::vector
with OpenMP most of the time and still have good performance. The following code for example fills std::vectors
in parallel and then combines them in the end. As long as your main loop/fill function is the bottleneck this should work well in general and be thread safe.
std::vector<int> vec;
#pragma omp parallel
{
std::vector<int> vec_private;
#pragma omp for nowait //fill vec_private in parallel
for(int i=0; i<100; i++) {
vec_private.push_back(i);
}
#pragma omp critical
vec.insert(vec.end(), vec_private.begin(), vec_private.end());
}
Edit:
OpenMP 4.0 allows user-defined reductions using #pragma omp declare reduction
. The code above can be simplified with to
#pragma omp declare reduction (merge : std::vector<int> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))
std::vector<int> vec;
#pragma omp parallel for reduction(merge: vec)
for(int i=0; i<100; i++) vec.push_back(i);
Edit: What I have shown so far does not fill the vector in order. If the order matters then this can be done like this
std::vector<int> vec;
#pragma omp parallel
{
std::vector<int> vec_private;
#pragma omp for nowait schedule(static)
for(int i=0; i<N; i++) {
vec_private.push_back(i);
}
#pragma omp for schedule(static) ordered
for(int i=0; i<omp_get_num_threads(); i++) {
#pragma omp ordered
vec.insert(vec.end(), vec_private.begin(), vec_private.end());
}
}
This avoids saving a std::vector for each thread and then merging them in serial outside of the parallel region. I learned about this "trick" here. I'm not sure how to do this (or if it's even possible) for user-defined reductions.. It's not possible to do this with user-defined reductions.
I just realized that the critical section is not necessary which I figured out from this question parallel-cumulative-prefix-sums-in-openmp-communicating-values-between-thread. This method also gets the order correct as well
std::vector<int> vec;
size_t *prefix;
#pragma omp parallel
{
int ithread = omp_get_thread_num();
int nthreads = omp_get_num_threads();
#pragma omp single
{
prefix = new size_t[nthreads+1];
prefix[0] = 0;
}
std::vector<int> vec_private;
#pragma omp for schedule(static) nowait
for(int i=0; i<100; i++) {
vec_private.push_back(i);
}
prefix[ithread+1] = vec_private.size();
#pragma omp barrier
#pragma omp single
{
for(int i=1; i<(nthreads+1); i++) prefix[i] += prefix[i-1];
vec.resize(vec.size() + prefix[nthreads]);
}
std::copy(vec_private.begin(), vec_private.end(), vec.begin() + prefix[ithread]);
}
delete[] prefix;
Related videos on Youtube
Armin Meisterhirn
Updated on April 13, 2020Comments
-
Armin Meisterhirn about 4 years
Based on this thread, OpenMP and STL vector, which data structures are good alternatives for a shared std::vector in a parallel for loop? The main aspect is speed, and the vector might require resizing during the loop.
-
LihO almost 11 yearsShow us some code, describe your specific situation... what will be stored in the vector? What will your loop do with it? It's very probable that it will be perfectly safe to use
std::vector
anyway.
-
-
Hristo Iliev over 9 yearsAs for the question in the very last sentence: "The number of times the combiner is executed, and the order of these executions, for any
reduction
clause is unspecified," therefore not possible. -
Admin almost 6 yearsThanks, this really helped me!
-
Joachim over 4 yearsJust a question: if we had a nested for loop, would the code above still hold by writting:
std::vector<int> vec; #pragma omp parallel { #pragma omp for collapse(2) nowait schedule(static) for(int i=0; i<N; i++) { for(int j=0; j< M; j++){ } } #pragma omp for collapse(2) schedule(static) ordered for(int i=0; i<omp_get_num_threads(); i++) { #pragma omp ordered do some stuff } }
-
Joachim over 4 yearsSorry for the ugly indentation
-
Z boson over 4 years@Joachim, I don't know. I don't normally use nested parallelism. I don't have time to look into this either. Maybe ask a question. There are many experts here that can help you.
-
Joachim over 4 yearsThanks, I think I will