Writing a (spinning) thread barrier using C++11 atomics
Solution 1
It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it:))) :
#include <atomic>

class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_ (0) {}

    bool wait ()
    {
        unsigned int step = step_.load ();

        if (nwait_.fetch_add (1) == n_ - 1)
        {
            /* OK, last thread to come. */
            nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
            step_.fetch_add (1);
            return true;
        }
        else
        {
            /* Run in circles and scream like a little girl. */
            while (step_.load () == step)
                ;
            return false;
        }
    }

protected:
    /* Number of synchronized threads. */
    const unsigned int n_;

    /* Number of threads currently spinning. */
    std::atomic<unsigned int> nwait_;

    /* Number of barrier synchronizations completed so far,
     * it's OK to wrap. */
    std::atomic<unsigned int> step_;
};
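For reference, here is a usage sketch of this barrier (the class is repeated so the snippet is self-contained; barrier_demo and the worker names are illustrative additions, not part of the original answer). Each of two threads publishes a value, synchronizes at the barrier, then reads the other thread's value:

```cpp
#include <atomic>
#include <thread>

class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_ (0) {}

    bool wait ()
    {
        unsigned int step = step_.load ();
        if (nwait_.fetch_add (1) == n_ - 1)
        {
            nwait_.store (0);
            step_.fetch_add (1);
            return true;
        }
        while (step_.load () == step)
            ;
        return false;
    }

protected:
    const unsigned int n_;
    std::atomic<unsigned int> nwait_;
    std::atomic<unsigned int> step_;
};

// Illustrative use: each thread publishes a value, crosses the barrier,
// and can then safely read the other thread's value.
int barrier_demo ()
{
    spinning_barrier b (2);
    int values[2] = { 0, 0 };
    int sums[2] = { 0, 0 };

    auto worker = [&] (int id, int val)
    {
        values[id] = val;   // publish my value
        b.wait ();          // both values are published past this point
        sums[id] = values[0] + values[1];
    };

    std::thread t0 (worker, 0, 1);
    std::thread t1 (worker, 1, 2);
    t0.join ();
    t1.join ();
    return sums[0];         // both sums are 1 + 2 = 3
}
```

The plain-int reads are safe because the sequentially consistent operations on nwait_ and step_ establish a happens-before chain from each thread's write to every post-barrier read.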
EDIT: @Grizzly, I can't find any errors in your first (C++11) version and I've also run it for like a hundred million syncs with two threads and it completes. I've run it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3 - the library (or rather, its port to win32) is not mature enough.
Solution 2
I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:
std::atomic<bool> consumerLock;

{   // the critical section
    while (consumerLock.exchange(true)) { }  // this is the spinlock
    // do something useful
    consumerLock = false;  // unlock
}
In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations: std::atomic_flag. With that, the critical section would look like this:
std::atomic_flag consumerLock = ATOMIC_FLAG_INIT; // must be initialized; the
                                                  // default state is unspecified
{
    // critical section
    while (consumerLock.test_and_set()) { /* spin */ }
    // do stuff
    consumerLock.clear();
}
(You can use acquire and release memory ordering there if you prefer.)
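A sketch of that variant (the demo function and shared_counter are illustrative additions, not from the answer): the flag is acquired with test_and_set(std::memory_order_acquire) and released with clear(std::memory_order_release), which is enough to make increments of a plain int safe:

```cpp
#include <atomic>
#include <thread>

std::atomic_flag consumerLock = ATOMIC_FLAG_INIT;
int shared_counter = 0;     // protected by consumerLock

void locked_increment ()
{
    while (consumerLock.test_and_set (std::memory_order_acquire))
        { /* spin */ }
    ++shared_counter;       // the critical section
    consumerLock.clear (std::memory_order_release);
}

int spinlock_demo ()
{
    std::thread t1 ([] { for (int i = 0; i < 1000; ++i) locked_increment (); });
    std::thread t2 ([] { for (int i = 0; i < 1000; ++i) locked_increment (); });
    t1.join ();
    t2.join ();
    return shared_counter;  // 2000: no increments were lost
}
```

Acquire on lock and release on unlock are the minimal orderings for mutual exclusion; the default (sequentially consistent) orderings shown above are also correct, just potentially slower.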
Solution 3
Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.
#include <atomic>

struct bar_t {
    unsigned const count;
    std::atomic<unsigned> spaces;
    std::atomic<unsigned> generation;

    bar_t(unsigned count_) :
        count(count_), spaces(count_), generation(0)
    {}

    void wait() {
        unsigned const my_generation = generation;
        if (!--spaces) {
            spaces = count;
            ++generation;
        } else {
            while (generation == my_generation)
                ;
        }
    }
};
Solution 4
Here is a simple version of mine:
// spinning_mutex.hpp
#include <atomic>

class spinning_mutex
{
private:
    std::atomic<bool> lockVal;

public:
    spinning_mutex() : lockVal(false) { }

    void lock()
    {
        while (lockVal.exchange(true))
            ;
    }

    void unlock()
    {
        lockVal.store(false);
    }

    bool is_locked()
    {
        return lockVal.load();
    }
};
Usage (from the std::lock_guard example):
#include <thread>
#include <mutex>
#include "spinning_mutex.hpp"

int g_i = 0;
spinning_mutex g_i_mutex;  // protects g_i

void safe_increment()
{
    std::lock_guard<spinning_mutex> lock(g_i_mutex);
    ++g_i;

    // g_i_mutex is automatically released when lock
    // goes out of scope
}

int main()
{
    std::thread t1(safe_increment);
    std::thread t2(safe_increment);

    t1.join();
    t2.join();
}
Solution 5
I know the thread is a little bit old, but since it is still the first Google result when searching for a thread barrier using C++11 only, I want to present a solution that gets rid of the busy waiting using std::condition_variable.

Basically it is the solution of chill, but instead of the while loop it uses std::condition_variable::wait() and std::condition_variable::notify_all(). In my tests it seems to work fine.
#include <atomic>
#include <condition_variable>
#include <mutex>

class SpinningBarrier
{
public:
    SpinningBarrier (unsigned int threadCount) :
        threadCnt(threadCount),
        step(0),
        waitCnt(0)
    {}

    bool wait()
    {
        // Take the lock before announcing arrival, so the generation we
        // capture cannot already be stale by the time we start waiting.
        std::unique_lock<std::mutex> lock(mutex);
        unsigned char s = step;

        if (waitCnt.fetch_add(1) >= threadCnt - 1)
        {
            waitCnt.store(0);
            step += 1;
            condVar.notify_all();
            return true;
        }

        // Wait until the generation counter changes; waiting for
        // step == s would return immediately.
        condVar.wait(lock, [&]{ return step != s; });
        return false;
    }

private:
    const unsigned int threadCnt;
    unsigned char step;
    std::atomic<unsigned int> waitCnt;
    std::condition_variable condVar;
    std::mutex mutex;
};
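A quick self-contained check of the condition-variable approach (the class is repeated here so the snippet compiles on its own; note that the generation must be captured under the mutex and the wait predicate must be step != s, otherwise a waiter can return immediately or miss the notification). The demo names are illustrative:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class SpinningBarrier
{
public:
    SpinningBarrier (unsigned int threadCount)
        : threadCnt (threadCount), step (0), waitCnt (0) {}

    bool wait ()
    {
        // Capture the generation under the mutex so it cannot be stale.
        std::unique_lock<std::mutex> lock (mutex);
        unsigned char s = step;

        if (waitCnt.fetch_add (1) >= threadCnt - 1)
        {
            waitCnt.store (0);
            step += 1;
            condVar.notify_all ();
            return true;
        }

        // Block until the generation counter changes (step != s).
        condVar.wait (lock, [&] { return step != s; });
        return false;
    }

private:
    const unsigned int threadCnt;
    unsigned char step;
    std::atomic<unsigned int> waitCnt;
    std::condition_variable condVar;
    std::mutex mutex;
};

// Illustrative check: over 100 rounds with 4 threads, wait() returns
// true exactly once per round.
int condvar_barrier_demo ()
{
    const unsigned int threads = 4;
    const int rounds = 100;
    SpinningBarrier b (threads);
    std::atomic<int> serial_count (0);

    auto worker = [&] ()
    {
        for (int r = 0; r < rounds; ++r)
            if (b.wait ())
                serial_count.fetch_add (1);
    };

    std::vector<std::thread> pool;
    for (unsigned int i = 0; i < threads; ++i)
        pool.emplace_back (worker);
    for (auto &t : pool)
        t.join ();
    return serial_count.load ();
}
```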
Grizzly
Updated on July 27, 2022

Comments
-
Grizzly almost 2 years
I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self improvement than due to any real need). My class looks basically as follows:
class barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    // constructors...
    bool wait();
};
All members are initialized to zero, except thread_count, which holds the appropriate count. I have implemented the wait function as follows:
int idx = cur_idx.load();
if (lock[idx].load() == 0)
{
    lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if (val >= thread_count - 1)
{
    counter[idx].store(0);
    cur_idx.fetch_xor(1);
    lock[idx].store(0);
    return true;
}
while (lock[idx].load() == 1)
    ;
return false;
However when trying to use it with two threads (thread_count is 2) the first thread gets into the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). However when I'm using gcc atomic intrinsics, using volatile int instead of std::atomic<int> and writing wait as follows:

int idx = cur_idx;
if (lock[idx] == 0)
{
    __sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if (val >= thread_count - 1)
{
    __sync_synchronize();
    counter[idx] = 0;
    cur_idx ^= 1;
    __sync_synchronize();
    lock[idx] = 0;
    __sync_synchronize();
    return true;
}
while (lock[idx] == 1)
    ;
return false;
it works just fine. From my understanding there shouldn't be any fundamental differences between the two versions (more to the point if anything the second should be less likely to work). So which of the following scenarios applies?
- I got lucky with the second implementation and my algorithm is crap
- I didn't fully understand std::atomic and there is a problem with the first variant (but not the second)
- It should work, but the experimental implementation of the C++11 libraries isn't as mature as I had hoped
For the record, I'm using 32-bit mingw with gcc 4.6.1.
The calling code looks like this:
spin_barrier b(2);
std::thread t([&b]() -> void {
    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    b.wait();
});
b.wait();
t.join();
Since mingw doesn't have the <thread> headers yet, I use a self-written version which basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping). Any insights would be appreciated.

edit: Explanation of the algorithm to make it clearer:
- thread_count is the number of threads which shall wait for the barrier (so if thread_count threads are in the barrier, all can leave the barrier)
- lock is set to one when the first (or any) thread enters the barrier
- counter counts how many threads are inside the barrier and is atomically incremented once for each thread
- if counter >= thread_count, all threads are inside the barrier, so counter and lock are reset to zero
- otherwise the thread waits for the lock to become zero
- in the next use of the barrier different variables (counter, lock) are used to ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier is lifted)
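Putting the steps above together, the described algorithm can be written out as a self-contained C++11 sketch (class and member names follow the question; the demo function is an illustrative check, not code from the original post):

```cpp
#include <atomic>
#include <thread>

class spin_barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    const int thread_count;

public:
    spin_barrier (int n) : cur_idx (0), thread_count (n)
    {
        for (int i = 0; i < 2; ++i)
        {
            counter[i].store (0);
            lock[i].store (0);
        }
    }

    bool wait ()
    {
        int idx = cur_idx.load ();
        if (lock[idx].load () == 0)
            lock[idx].store (1);

        int val = counter[idx].fetch_add (1);
        if (val >= thread_count - 1)
        {
            // Last thread in: reset this side, flip to the other side,
            // then release the waiters.
            counter[idx].store (0);
            cur_idx.fetch_xor (1);
            lock[idx].store (0);
            return true;
        }
        while (lock[idx].load () == 1)
            ;
        return false;
    }
};

// Illustrative check: two threads reuse the barrier three times;
// exactly one thread gets "true" per round.
int spin_barrier_demo ()
{
    spin_barrier b (2);
    std::atomic<int> serial (0);

    auto worker = [&] ()
    {
        for (int i = 0; i < 3; ++i)
            if (b.wait ())
                serial.fetch_add (1);
    };

    std::thread t (worker);
    worker ();
    t.join ();
    return serial.load ();
}
```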
edit2: I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine, which seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced, since looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic intrinsics, meaning there really shouldn't be a difference between the two versions.
-
sehe over 12 years
Oh - so much more peaceful on the eyes. Fewer than 6 -> and __ per square inch is a blessing. And, you can have whitespace too! I think I'll go and clean up that OP
-
Kerrek SB over 12 years
Thanks :-) The actual queue is pretty neat, too. It's on Dr Dobbs: Measuring Parallel Performance: Optimizing a Concurrent Queue.
-
sehe over 12 years
I've seen it! One of those warning posts (stay away from hand-rolled lock-free unless...). I prefer using TBB or LibCds (PS. formatted OP)
-
Kerrek SB over 12 years
@sehe: But what choice is there other than hand-rolled? I saw LibCds, is it any good? Doxygen makes my skin crawl... There's some guy who made "Boost lockfree", which is not part of Boost, but again no idea how good it is. Is TBB general C++, or only for the Intel compiler?
-
sehe over 12 years
LibCds is pretty darn good. It has maps, stacks, queues, ordered lists with a number of (generic pluggable) garbage collection strategies. The threading library can be plugged (I use pthreads) and the most important reason why it isn't in boost is potential licensing issues with the algorithms, IIRC. Edit: Following the link to Boost lockfree, I remember seeing some conversation between the developer of LibCds and Tim Blechmann about implementation specifics and possible inclusion in boost.
-
sehe over 12 years
-
Kerrek SB over 12 years
I didn't realize that maps even exist. I think there was some theoretical paper or so, but there was some fundamental problem with it (a tree, that is). I think someone proved that it's impossible without garbage collection.
-
sehe over 12 years
-
Grizzly over 12 years
Unfortunately, since I'm trying to understand why my specific algorithm doesn't work, that doesn't help at all.
-
Kerrek SB over 12 years
@Grizzly: Yeah, sorry. I can't say I see through your algorithm just yet; I just vaguely thought this might be interesting. Sorry!
-
sehe over 12 years
Nice work - though I never think of screaming girls while meditating.
-
chill over 12 years
@sehe, oh noes, he did it! He messed with my holy GNU formatting! :P
-
sehe over 12 years
Huh. I read lots of GNU code, and I'm pretty sure I haven't seen those half-indents. Now, it could be an SO artifact, or just my sense of direction when making my way through GNU code :) Ok, I'll be more conservative next time.
-
Grizzly over 12 years
I wouldn't say my version is that much more complicated, but that version does look nicer, so thanks for that. Unfortunately it seems to have the same problems as my original version (just did a quick test, need to verify some more to be sure).
-
Grizzly over 12 years
@sehe: libCds looks pretty interesting; however, from looking over the code/documentation it seems to be pretty much specialized for linux/gcc and windows/msvc. Any information about using/compiling it with mingw on windows or gcc on mac? (aka have I missed something which enables that, or is it possible to do without too much work)
-
sehe over 12 years
@Grizzly I haven't done it, but I don't see what the issue would be, really. You could try (or I could, but that has O(infinity) time complexity in the worst case).
-
Grizzly over 12 years
@sehe: from what I have seen it won't work without changes, since at the very least the OS detection in compiler/gcc/defs.h will not recognize either OS, and somewhere it seemed to make decisions based on OS first, compiler second (so no gcc path on win). Also the makefile seems to check the OS depending on the compiler (so no recognition of windows for gcc). While these issues are not too bad to fix, those are only the issues a quick grep revealed. I might still try it, but I doubt it would be straightforward, so I simply hoped there might be existing experience.
-
Nathan Binkert over 11 years
spinning_locker can likely be replaced by std::lock_guard
-
Mikhail almost 11 years
This is fine, but I think the point of the thread was the spinlock.
-
IneQuation over 10 years
@ali_bahoo your implementation is almost correct. Almost, because here:
bool expected = false; while (!lockVal.compare_exchange_strong(expected, true));
after the first call to compare_exchange_strong, the value of expected becomes true, and on the subsequent spins the lock will let any thread through. Check out the std::atomic<> reference. The correct way would be:
bool expected = false; while (!lockVal.compare_exchange_strong(expected, true)) expected = false;
-
ali_bahoo over 10 years
@IneQuation: No. expected is not a class member. Each thread entering the lock() method gets a new expected variable. It is just a value on the stack that is cleaned up when exiting the lock() method, so you do not have to set it to false.
-
IneQuation over 10 years
No, @ali_bahoo, you are wrong. :) You may also test your code, it doesn't work as intended, and this is because the value of expected on the stack is being overwritten by compare_exchange_strong. The reference I linked says: "Compares the contents of the contained value with expected: - if true, it replaces the contained value with val (like store). - if false, it replaces expected with the contained value." This breaks expected for the next loop iteration.
-
ali_bahoo over 10 years
@IneQuation expected is just a variable, just a non-const reference parameter to the compare_exchange_strong function. The value of lockVal is what matters. compare_exchange_strong checks whether lockVal is false. If not, it spins until the lock is unlocked.
-
IneQuation over 10 years
Sorry, but no, you are wrong - the contents of expected are changed if the comparison fails, as the reference I quoted clearly states. I only wrote here because I have actually tried using your code and it failed; I started reading up and arrived at the conclusions I've already stated above. Take it or leave it, I'm not here to argue on the internet just for the argument's sake.
-
ali_bahoo over 10 years
@IneQuation God! What was I thinking? For this purpose, a better approach is to use std::atomic::exchange() instead of compare_exchange_strong. It is simpler.
-
Maxim Egorushkin almost 10 years
This is a critical section though, not a barrier. And a poor one, because it does not use the CPU pause instruction.
-
Kerrek SB almost 10 years
@MaximYegorushkin: Yeah, that's true, it's not a barrier. There's no "pause" in C++, unfortunately. Maybe this_thread::yield might work, but that seems wrong (we don't want to deschedule, just spin efficiently). Bear in mind that the OP doesn't want to use a ready-made implementation.
-
Maxim Egorushkin almost 10 years
"There's no 'pause' in C++, unfortunately." - I find such answers akin to limiting belief systems. If that were true, libraries such as TBB and many useful C++ applications would not exist. Just because something is outside the scope of the C++ standard does not make it impossible to use.
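For illustration, here is what a spin loop with a pause hint can look like outside pure standard C++ (a sketch: _mm_pause() from <immintrin.h> is x86-specific and compiler-provided, not part of C++11; on other targets this falls back to std::this_thread::yield(); all function names are illustrative):

```cpp
#include <atomic>
#include <thread>
#if defined(__SSE2__)
#include <immintrin.h>
#define SPIN_PAUSE() _mm_pause()                // x86 pause instruction
#else
#define SPIN_PAUSE() std::this_thread::yield()  // portable fallback
#endif

void spin_lock (std::atomic_flag &flag)
{
    while (flag.test_and_set (std::memory_order_acquire))
        SPIN_PAUSE ();  // hint that we are spinning: saves power and frees
                        // pipeline resources for the sibling hyper-thread
}

void spin_unlock (std::atomic_flag &flag)
{
    flag.clear (std::memory_order_release);
}

// Single-threaded sanity check of the lock protocol.
int pause_demo ()
{
    std::atomic_flag f = ATOMIC_FLAG_INIT;
    spin_lock (f);                      // uncontended: acquires immediately
    bool was_held = f.test_and_set ();  // already held, so this returns true
    spin_unlock (f);
    return was_held ? 1 : 0;
}
```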
-
mchiasson over 9 years
It's very close to my simple version. But instead of using std::atomic<bool>, I use std::atomic_flag with std::atomic_flag::test_and_set(std::memory_order_acquire) for acquiring and std::atomic_flag::clear(std::memory_order_release) for releasing. This technique will give you a pure lockless barrier.
-
mchiasson about 9 years
Spinlocks are typically lockless (free of mutex).
-
eyeshield21 over 8 years
Yes, nice. This is hidden away round about page 267 for anyone wanting more information, as "barrier" does not appear in the index of my Manning 2012 copy.
-
neocpp almost 8 years
I know this is a bit dated, but is this really wait-free? I thought wait-free meant that any thread completes its operation in at most a bounded number of steps regardless of what other threads are doing, which is violated if any sort of lock is involved (spinlock or otherwise). The linked article in the comments doesn't seem to mention wait-free at all, although there is another article that looks at a lock-free implementation (I ran out of views though, so I can't actually read that one).
-
Kerrek SB almost 8 years
@neocpp: No, the queue is neither wait-free nor even lock-free, because of the spinlock. The article was fairly cavalier about those terms; let me remove them from the answer. Note that instead of an atomic<bool> you could/should use std::atomic_flag, which is required to have lock-free operations (unlike atomic<bool>).
-
neocpp almost 8 years
Upvoted with the update; the wait-free implementation with a spinlock looked like an oxymoron to me, so I just wanted to make sure I wasn't missing anything.