Understanding C++11 memory fences


Solution 1

Your usage does not actually ensure the things you mention in your comments. That is, your usage of fences does not ensure that your assignments to a are visible to other threads or that the value you read from a is 'up to date.' This is because, although you seem to have the basic idea of where fences should be used, your code does not actually meet the exact requirements for those fences to "synchronize".

Here's a different example that I think demonstrates correct usage better.

#include <iostream>
#include <atomic>
#include <thread>

std::atomic<bool> flag(false);
int a;

void func1()
{
    a = 100;
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

void func2()
{
    while(!flag.load(std::memory_order_relaxed))
        ;

    std::atomic_thread_fence(std::memory_order_acquire);
    std::cout << a << '\n'; // guaranteed to print 100
}

int main()
{
    std::thread t1 (func1);
    std::thread t2 (func2);

    t1.join(); t2.join();
}

The load and store on the atomic flag do not synchronize with each other, because both use relaxed memory ordering. Without the fences this code would have a data race: we're performing conflicting operations on a non-atomic object in different threads, and without the fences and the synchronization they provide there would be no happens-before relationship between the conflicting operations on a.

However, with the fences we do get synchronization. We've guaranteed that thread 2 will read the flag value written by thread 1 (because we loop until we see that value), and since the release fence is sequenced before the atomic write and the atomic read is sequenced before the acquire fence, the fences synchronize. (See § 29.8/2 for the specific requirements.)

This synchronization means anything that happens-before the release fence happens-before anything that happens-after the acquire fence. Therefore the non-atomic write to a happens-before the non-atomic read of a.
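For comparison, here is a minimal sketch of the same message-passing pattern without standalone fences: the release ordering moves onto the store of the flag and the acquire ordering onto the load. When the acquire load reads the value written by the release store, the store synchronizes with the load, which gives the same happens-before edge from the write of the payload to its read. The helper name `message_passing_release_acquire` and its local variables are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: the first example with the orderings moved onto the
// flag operations instead of standalone fences.
int message_passing_release_acquire()
{
    std::atomic<bool> flag(false);
    int payload = 0;     // plays the role of the non-atomic `a`
    int observed = -1;

    std::thread producer([&] {
        payload = 100;                                 // plain, non-atomic write
        flag.store(true, std::memory_order_release);   // release store
    });

    std::thread consumer([&] {
        // Spin until the acquire load reads `true`; it then synchronizes with
        // the release store, so the write to payload happens-before the read below.
        while (!flag.load(std::memory_order_acquire))
            std::this_thread::yield();
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;   // join() makes the consumer's write to observed visible here
}
```

A quick check such as `assert(message_passing_release_acquire() == 100);` should hold on every run.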

Things get trickier when you're writing a variable in a loop, because you might establish a happens-before relation for some particular iteration, but not other iterations, causing a data race.

#include <atomic>
#include <iostream>

std::atomic<int> f(0);
int a;

void func1()
{
    for (int i = 1; i <= 1000000; ++i) {
        a = i;
        std::atomic_thread_fence(std::memory_order_release);
        f.store(i, std::memory_order_relaxed);
    }
}

void func2()
{
    int prev_val = 0;
    while (prev_val < 1000000) {
        // Spin until f holds a value newer than the last one we saw.
        while (true) {
            int new_val = f.load(std::memory_order_relaxed);
            if (prev_val < new_val) {
                prev_val = new_val;
                break;
            }
        }

        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout << a << '\n'; // data race: func1 may already be writing a newer value to a
    }
}

This code still causes the fences to synchronize, but it does not eliminate data races. For example, if f.load() happens to return 10, then we know that a=1, a=2, ..., a=10 have all happened-before that particular cout << a, but we don't know that cout << a happens-before a=11. Those are conflicting operations in different threads with no happens-before relation between them: a data race.
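One way to remove that race while still using only relaxed atomics and fences is to make the hand-off two-way: the reader acknowledges each value through a second atomic, and the writer waits for that acknowledgement (followed by its own acquire fence) before overwriting the non-atomic variable. This is a sketch assuming one writer and one reader; the names `run_handshake`, `data`, `ready` and `ack` are hypothetical, and the iteration count is kept small because every value now costs a round trip between the threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: the loop example plus an acknowledgement channel so
// that every read of `data` happens-before the next write to it.
std::vector<int> run_handshake(int n)
{
    int data = 0;                 // non-atomic, like `a` in the loop example
    std::atomic<int> ready(0);    // writer -> reader: value i is in data
    std::atomic<int> ack(0);      // reader -> writer: finished reading value i
    std::vector<int> seen;        // touched only by the reader until join()
    seen.reserve(n);

    std::thread writer([&] {
        for (int i = 1; i <= n; ++i) {
            data = i;
            std::atomic_thread_fence(std::memory_order_release);
            ready.store(i, std::memory_order_relaxed);

            // Wait for the reader to finish with value i before overwriting data.
            while (ack.load(std::memory_order_relaxed) != i)
                std::this_thread::yield();
            std::atomic_thread_fence(std::memory_order_acquire);
        }
    });

    std::thread reader([&] {
        for (int i = 1; i <= n; ++i) {
            while (ready.load(std::memory_order_relaxed) != i)
                std::this_thread::yield();
            std::atomic_thread_fence(std::memory_order_acquire);
            seen.push_back(data);    // data = i happens-before this read

            std::atomic_thread_fence(std::memory_order_release);
            ack.store(i, std::memory_order_relaxed);  // this read happens-before data = i + 1
        }
    });

    writer.join();
    reader.join();
    return seen;
}
```

Each element of the returned vector should equal its 1-based iteration number, for example `assert(run_handshake(1000)[0] == 1);`. The price of the second fence pair is that the writer can no longer run ahead of the reader.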

Solution 2

Your usage is correct, but insufficient to guarantee anything useful.

For example, the compiler is free to internally implement a = i; like this if it wants to:

while (a != i)
{
    ++a;
    atomic_thread_fence(std::memory_order_release);
}

So the other thread may see any values at all.

Of course, the compiler would never implement a simple assignment like that. However, there are cases where similarly perplexing behavior is actually an optimization, so it's a very bad idea to rely on ordinary code being implemented internally in any particular way. This is why we have things like atomic operations, and why fences only produce guaranteed results when used together with such operations.
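If the goal is only to let the second thread observe values of a without undefined behavior, a minimal sketch, assuming relaxed ordering is all that is needed, is to make a itself atomic and drop the fences. Relaxed atomic accesses never constitute a data race, and because all accesses to a single atomic object respect one modification order, successive loads in one thread can never see an older value after a newer one. With a writer that only stores increasing values, the reader therefore sees a non-decreasing sequence. The helper `observe_atomic_counter` and its `samples` parameter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: the question's program with `a` made atomic and the
// fences dropped. `samples` is how many loads the reader performs.
std::vector<int> observe_atomic_counter(int n, int samples)
{
    std::atomic<int> a(0);
    std::vector<int> seen;
    seen.reserve(samples);

    std::thread writer([&] {
        for (int i = 0; i < n; ++i)
            a.store(i, std::memory_order_relaxed);   // atomic store: never a data race
    });

    std::thread reader([&] {
        for (int k = 0; k < samples; ++k)
            seen.push_back(a.load(std::memory_order_relaxed));  // coherence: never goes backwards
    });

    writer.join();
    reader.join();
    return seen;
}
```

Which values appear depends on timing, but each one lies in [0, n) and none is smaller than the one before it. Nothing here orders a against other memory, which is exactly the job the release/acquire pairs in the examples above do.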

Author: jcoder

Updated on July 09, 2022

Comments

  • jcoder
    jcoder almost 2 years

    I'm trying to understand memory fences in C++11. I know there are better ways to do this (atomic variables and so on), but I wondered whether this usage was correct. I realize that this program doesn't do anything useful; I just wanted to make sure that the fence functions do what I think they do.

    Basically, my understanding is that the release fence ensures that any changes made in this thread before the fence are visible to other threads after the fence, and that in the second thread any changes to the variables are visible immediately after the acquire fence.

    Is my understanding correct? Or have I missed the point entirely?

    #include <iostream>
    #include <atomic>
    #include <thread>
    
    int a;
    
    void func1()
    {
        for(int i = 0; i < 1000000; ++i)
        {
            a = i;
            // Ensure that changes to a to this point are visible to other threads
            atomic_thread_fence(std::memory_order_release);
        }
    }
    
    void func2()
    {
        for(int i = 0; i < 1000000; ++i)
        {
            // Ensure that this thread's view of a is up to date
            atomic_thread_fence(std::memory_order_acquire);
            std::cout << a;
        }
    }
    
    int main()
    {
        std::thread t1 (func1);
        std::thread t2 (func2);
    
        t1.join(); t2.join();
    }
    
  • jcoder
    jcoder over 11 years
    Yes thank you, I understand that the code I wrote isn't correct for other reasons, but I was struggling a bit to write a simple example to demonstrate my question.
  • David Schwartz
    David Schwartz over 11 years
    Then it sounds like you get it.
  • bames53
    bames53 over 11 years
    Are you sure about this? I don't see that the example code meets the requirements in 29.8/1 for the fences to synchronize at all, and that would mean this code has data races and therefore undefined behavior.
  • David Schwartz
    David Schwartz over 11 years
    @bames53: Correct. If I understand him correctly, he's just trying to understand the semantics of the fences. He's trying to use them with normal assignments, which of course can't work because they don't have sufficiently precise semantics. (I clarified this in the end of my answer.)
  • jcoder
    jcoder over 11 years
    Thank you for this. I think I'm struggling to completely understand this. I feel reasonably confident using the default atomic types, but there is something here that I don't quite understand yet. I think I need a book or some better articles; that's probably a different question!
  • jcoder
    jcoder over 11 years
    @David Schwartz Yes, I was only interested in what the fences did, and tried to come up with an example around them. Clearly it's only added to the confusion, though :)
  • bames53
    bames53 over 11 years
    The book C++ Concurrency In Action covers the C++ memory model, memory orderings, atomics, and fences very well, in addition to covering the higher level constructs.
  • bames53
    bames53 over 11 years
    @J99 or if you have specific questions about the examples I can try to answer them.
  • piotrekg2
    piotrekg2 about 8 years
    In the first example, can we guarantee that the while loop will eventually terminate? Or is there only a guarantee that if the loop terminates then the program will print 100?
  • bames53
    bames53 about 8 years
    @piotrekg2 Technically I believe that's a 'quality of implementation' issue. Implementations are supposed to ensure that writes eventually do become visible to other threads. In practice implementations do, and that loop is in practice guaranteed to terminate.
  • HCSF
    HCSF over 4 years
    @bames53 Would your first example work if I drop all the fences, replace flag.store(true, std::memory_order_relaxed); with flag.store(true, std::memory_order_release); and replace flag.load(std::memory_order_relaxed) with flag.load(std::memory_order_acquire)? The release-acquire operations would create the synchronization, so all the changes that happen before the release operation would be visible after the acquire operation, no?
  • loin.liao
    loin.liao over 4 years
    Nice examples! Can I use this example code in my blog?
  • stickers
    stickers over 3 years
    @HCSF Yes, load-acquire and store-release work well in the first example too. Two good examples here: riptutorial.com/cplusplus/example/25796/fence-example and riptutorial.com/cplusplus/example/25795/need-for-memory-model