How to guarantee 64-bit writes are atomic?

c multithreading macos atomic lock-free


Solution 1

Your best bet is to avoid trying to build your own system out of primitives, and instead use locking unless it really shows up as a hot spot when profiling. (If you think you can be clever and avoid locks, don't. You aren't. That's the general "you", which includes me and everybody else.) At a minimum, use a spin lock; see spinlock(3). And whatever you do, don't try to implement "your own" locks. You will get it wrong.

Ultimately, you need to use whatever locking or atomic operations your operating system provides. Getting these sorts of things exactly right in all cases is extremely difficult. It often involves knowing things like the errata for specific versions of specific processors. ("Oh, version 2.0 of that processor didn't do the cache-coherency snooping at the right time; it's fixed in version 2.0.1, but on 2.0 you need to insert a NOP.") Just slapping a volatile keyword on a variable in C is almost always insufficient.

On Mac OS X, that means you need to use the functions listed in atomic(3) to perform truly atomic-across-all-CPUs operations on 32-bit, 64-bit, and pointer-sized quantities. (Use the latter for any atomic operations on pointers so you're 32/64-bit compatible automatically.) That goes whether you want to do things like atomic compare-and-swap, increment/decrement, spin locking, or stack/queue management. Fortunately the spinlock(3), atomic(3), and barrier(3) functions should all work correctly on all CPUs that are supported by Mac OS X.
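A minimal sketch of the lock-based approach recommended above. It uses a POSIX `pthread_mutex_t` instead of `spinlock(3)`, an assumption made so the sketch also builds outside macOS; the names `locked_u64`, `LOCKED_U64_INIT`, `locked_u64_set` and `locked_u64_get` are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical wrapper: a 64-bit value guarded by a mutex. Every access
   goes through the lock, so a reader never sees a half-written value. */
typedef struct {
    pthread_mutex_t lock;
    uint64_t value;
} locked_u64;

#define LOCKED_U64_INIT(v) { PTHREAD_MUTEX_INITIALIZER, (v) }

static void locked_u64_set(locked_u64 *p, uint64_t v)
{
    pthread_mutex_lock(&p->lock);
    p->value = v;
    pthread_mutex_unlock(&p->lock);
}

static uint64_t locked_u64_get(locked_u64 *p)
{
    pthread_mutex_lock(&p->lock);
    uint64_t v = p->value;
    pthread_mutex_unlock(&p->lock);
    return v;
}
```

The writer calls `locked_u64_set(&y, ...)` and readers call `locked_u64_get(&y)`; only if profiling shows the lock as a hot spot is it worth replacing with something cleverer.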

Solution 2

On x86_64, both the Intel compiler and GCC support some intrinsic atomic-operation functions. Here's GCC's documentation of them: http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html

The Intel compiler docs also talk about them here: http://softwarecommunity.intel.com/isn/downloads/softwareproducts/pdfs/347603.pdf (page 164 or thereabouts).
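A small sketch of two of the GCC `__sync` builtins from the linked page, applied to a 64-bit value: `__sync_fetch_and_add` for an atomic increment and `__sync_bool_compare_and_swap` for a conditional update. It assumes GCC or Clang on a target with 64-bit atomic support (on 32-bit x86 this typically means building for i586 or newer so `cmpxchg8b` is available); `counter_next` and `try_replace` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Atomically add 1; returns the value held *before* the increment. */
static uint64_t counter_next(uint64_t *counter)
{
    return __sync_fetch_and_add(counter, 1);
}

/* Store new_value only if *target still holds expected; true on success.
   GCC documents these builtins as full barriers in most cases. */
static bool try_replace(uint64_t *target, uint64_t expected, uint64_t new_value)
{
    return __sync_bool_compare_and_swap(target, expected, new_value);
}
```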

Solution 3

According to Chapter 7 of Part 3A (System Programming Guide) of Intel's processor manuals, quadword accesses are carried out atomically if aligned on a 64-bit boundary on a Pentium or newer, and even when unaligned (as long as they stay within a single cache line) on a P6 or newer. You should use volatile to ensure that the compiler doesn't keep the value in a register or drop the store, and you may need to use a memory fence routine to ensure that the write happens in the proper order.
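The alignment rule quoted above can be written as a small check. This sketch assumes a 64-byte cache line (typical of current x86 parts, but not something the quoted rule itself fixes); `quadword_store_is_atomic` and `ASSUMED_CACHE_LINE` are hypothetical names, and the `p6_or_newer` flag selects which of the two quoted rules applies.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_CACHE_LINE 64u  /* assumption: 64-byte cache lines */

/* Does an 8-byte access at addr meet the quoted Intel conditions?
   Pentium: the address must be 8-byte aligned.
   P6 and newer: it may be unaligned if all 8 bytes lie in one cache line. */
static bool quadword_store_is_atomic(uintptr_t addr, bool p6_or_newer)
{
    if (addr % 8 == 0)
        return true;
    if (p6_or_newer)
        return (addr % ASSUMED_CACHE_LINE) + 8 <= ASSUMED_CACHE_LINE;
    return false;
}
```

In real code you would pass `(uintptr_t)&y`. The check says nothing about compiler optimizations or ordering, which is what the volatile and fence advice in the paragraph addresses.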

If you need to base the value written on an existing value, you should use your operating system's Interlocked features (e.g. Windows has InterlockedIncrement64).
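`InterlockedIncrement64` is Windows-only. As a portable illustration of the same idea (a write computed from the current value), here is a sketch using C11 `<stdatomic.h>`, a swapped-in technique rather than anything this answer names: an atomic increment, plus a compare-exchange loop that raises a value to a new maximum. `increment_u64` and `atomic_max_u64` are hypothetical helpers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomic increment; like InterlockedIncrement64, returns the new value. */
static uint64_t increment_u64(_Atomic uint64_t *p)
{
    return atomic_fetch_add(p, 1) + 1;
}

/* Raise *p to candidate if candidate is larger; returns the value left in *p.
   On failure, atomic_compare_exchange_weak reloads 'cur', so each retry
   compares against whatever another thread just wrote. */
static uint64_t atomic_max_u64(_Atomic uint64_t *p, uint64_t candidate)
{
    uint64_t cur = atomic_load(p);
    while (cur < candidate) {
        if (atomic_compare_exchange_weak(p, &cur, candidate))
            return candidate;
    }
    return cur;
}
```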

Solution 4

On Intel Mac OS X, you can use the built-in system atomic operations. There isn't a provided atomic get or set for either 32- or 64-bit integers, but you can build them out of the provided compare-and-swap. You may wish to search the Xcode documentation for the various OSAtomic functions. I've written the 64-bit version below; the 32-bit version can be done with similarly named functions.

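```c
#include <libkern/OSAtomic.h>
#include <stdbool.h>
#include <stdint.h>

// bool OSAtomicCompareAndSwap64Barrier(int64_t oldValue, int64_t newValue,
//                                      volatile int64_t *theValue);

void AtomicSet(uint64_t *target, uint64_t new_value)
{
    while (true)
    {
        // This plain read may be stale; the CAS below only succeeds if it wasn't.
        uint64_t old_value = *target;
        if (OSAtomicCompareAndSwap64Barrier((int64_t)old_value, (int64_t)new_value,
                                            (volatile int64_t *)target))
            return;
    }
}

uint64_t AtomicGet(uint64_t *target)
{
    while (true)
    {
        uint64_t value = *target;
        // Swapping the value with itself succeeds only if 'value' matched memory.
        if (OSAtomicCompareAndSwap64Barrier((int64_t)value, (int64_t)value,
                                            (volatile int64_t *)target))
            return value;
    }
}
```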

Note that Apple's OSAtomicCompareAndSwap functions atomically perform the operation:

```
if (*theValue != oldValue) return false;
*theValue = newValue;
return true;
```

We use this in the example above to create a Set method by first grabbing the old value, then attempting to swap the target memory's value. If the swap succeeds, the memory still held the old value at the time of the swap and has now been given the new value (the swap itself is atomic), so we are done. If it doesn't succeed, some other thread has interfered by modifying the value in between our read and our swap attempt. In that case we simply loop and try again, with only a minimal penalty.

The idea behind the Get method is that we can first grab the value (which may or may not be the actual value, if another thread is interfering). We can then try to swap the value with itself, simply to check that the initial grab was equal to the atomic value.

I haven't checked this against my compiler, so please excuse any typos.
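For readers not on macOS, here is a sketch of the same Set/Get idea using the GCC/Clang `__atomic` builtins. This is an assumption on my part: those builtins are newer than the `__sync` family and are not part of Apple's OSAtomic API. `atomic_set_u64` and `atomic_get_u64` are hypothetical names. The retry loop mirrors `AtomicSet`; for the read, a single `__atomic_load_n` already returns an untorn value, so the swap-with-itself trick is not needed.

```c
#include <stdint.h>

/* Mirrors AtomicSet: retry the compare-and-swap until no other thread
   changed the value between our read and our swap. */
static void atomic_set_u64(uint64_t *target, uint64_t new_value)
{
    uint64_t old_value = __atomic_load_n(target, __ATOMIC_RELAXED);
    /* On failure, old_value is refreshed with the current contents. */
    while (!__atomic_compare_exchange_n(target, &old_value, new_value,
                                        /* weak = */ 1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ;
}

/* Atomic read: one untorn 64-bit load, no swap required. */
static uint64_t atomic_get_u64(const uint64_t *target)
{
    return __atomic_load_n(target, __ATOMIC_SEQ_CST);
}
```

`__atomic_store_n(target, new_value, __ATOMIC_SEQ_CST)` achieves the same result as the loop in a single call; the loop is kept only to match the structure of the answer's code.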

You mentioned OSX specifically, but in case you need to work on other platforms, Windows has a number of Interlocked* functions, and you can search the MSDN documentation for them. Some of them work on Windows 2000 Pro and later, and some (particularly some of the 64-bit functions) are new with Vista. On other platforms, GCC versions 4.1 and later have a variety of __sync* functions, such as __sync_fetch_and_add(). For other systems, you may need to use assembly, and you can find some implementations in the SVN browser for the HaikuOS project, inside src/system/libroot/os/arch.
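A sketch of how the platform-specific pieces mentioned above might be hidden behind one function. The name `portable_fetch_add_u64` is hypothetical. The Windows branch assumes `InterlockedExchangeAdd64` from `<windows.h>` (a signed `LONG64` API, hence the casts), and the other branch assumes GCC 4.1 or later, or a compatible compiler such as Clang.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <windows.h>
#endif

/* Atomically add delta to *p and return the previous value. */
static uint64_t portable_fetch_add_u64(volatile uint64_t *p, uint64_t delta)
{
#if defined(_MSC_VER)
    /* Windows: returns the initial value of *p. */
    return (uint64_t)InterlockedExchangeAdd64((volatile LONG64 *)p, (LONG64)delta);
#elif defined(__GNUC__)
    /* GCC 4.1+ and Clang: builtin that returns the old value. */
    return __sync_fetch_and_add(p, delta);
#else
#error "no 64-bit atomic fetch-and-add known for this compiler"
#endif
}
```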

Solution 5

On x86, the fastest way to atomically write an aligned 64-bit value is to use FISTP. For unaligned values, you need to use a CAS2 (_InterlockedExchange64). The CAS2 operation is quite slow due to BUSLOCK, though, so it can often be faster to check alignment and use the FISTP version for aligned addresses. Indeed, this is how Intel Threading Building Blocks implements atomic 64-bit writes.
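A sketch of the aligned FISTP fast path only, written as GCC/Clang inline assembly. It assumes an x86 target and an 8-byte-aligned destination (checked with `assert`); on non-x86 targets it falls back to `__atomic_store_n`. The unaligned CAS path is omitted, this is an illustration of the technique rather than the TBB source, and `store64_aligned` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Atomically store a 64-bit value to an 8-byte-aligned address. */
static void store64_aligned(volatile int64_t *p, int64_t value)
{
    assert(((uintptr_t)p & 7) == 0);   /* fast path needs 8-byte alignment */
#if defined(__i386__) || defined(__x86_64__)
    /* FILD loads the 64-bit integer into an x87 register exactly (64-bit
       significand); FISTP writes it back as one 64-bit memory store. */
    __asm__ __volatile__("fildll %1\n\t"
                         "fistpll %0"
                         : "=m"(*p)
                         : "m"(value)
                         : "memory");
#else
    __atomic_store_n(p, value, __ATOMIC_SEQ_CST);
#endif
}
```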


Author: Adisak

I have been programming video games since I was a teenager in the 1980s, and it's been my full-time job since the early 1990s. I currently work on the Mortal Kombat team at NetherRealm Studios in Chicago, owned by WB Games (our studio was Midway Games Chicago until Warner Brothers purchased it in the summer of 2009). You can find me on Facebook as well. Additionally, I have an e-mail account at that well-known Google-Mail address with the same user name as on here.

Updated on April 07, 2020

Comments

Adisak
When can 64-bit writes be guaranteed to be atomic, when programming in C on an Intel x86-based platform (in particular, an Intel-based Mac running MacOSX 10.4 using the Intel compiler)? For example:
```
unsigned long long int y;
y = 0xfedcba87654321ULL;
/* ... a bunch of other time-consuming stuff happens... */
y = 0x12345678abcdefULL;
```
If another thread is examining the value of y after the first assignment to y has finished executing, I would like to ensure that it sees either the value 0xfedcba87654321 or the value 0x12345678abcdef, and not some blend of them. I would like to do this without any locking, and if possible without any extra code. My hope is that, when using a 64-bit compiler (the 64-bit Intel compiler) on an operating system capable of supporting 64-bit code (MacOSX 10.4), these 64-bit writes will be atomic. Is this always true?
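For comparison, a sketch of the question's snippet written with C11 `<stdatomic.h>`, which postdates the 10.4-era toolchain in the question, so this assumes a newer compiler. Declaring `y` `_Atomic` guarantees a reader sees one of the two stored values, never a blend, on any conforming implementation (possibly via a hidden lock where 64-bit atomics aren't lock-free). The release/acquire orders are one reasonable choice, not a requirement, and the function names are hypothetical.

```c
#include <stdatomic.h>

static _Atomic unsigned long long y;

/* Writer thread: each store is indivisible. */
static void writer_first(void)
{
    atomic_store_explicit(&y, 0xfedcba87654321ULL, memory_order_release);
}

static void writer_second(void)
{
    atomic_store_explicit(&y, 0x12345678abcdefULL, memory_order_release);
}

/* Reader thread: returns one of the stored values, never a mix of halves. */
static unsigned long long reader(void)
{
    return atomic_load_explicit(&y, memory_order_acquire);
}
```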
Tim Post

Thank you for such a warm and fuzzy place to send future 'practical' lock-free evangelicals :) +10 if I could.
Admin

To be even more specific, it is stated in §8.8.1 on page 325.
Jeff Hammond

Per software.intel.com/en-us/articles/…, CMPXCHG has not implied a bus lock since the Intel Pentium Pro processor.
Jeff Hammond

mfence is overkill for a producer-consumer queue. You just need sfence on the producer side and lfence on the consumer side. There isn't an article entitled "Memory barriers considered harmful", but there should be :-)
Jeff Hammond

If you use the right interface to atomics properly, you do not need volatile.
Jeff Hammond

You suggest using GCC intrinsics, then say to not trust the compiler. Are you referring to something other than intrinsics that should not be trusted to the compiler?
Dima Rybachenko

For reading, you can use a much simpler approach: OSAtomicAdd64Barrier(0, target) atomically adds 0 to the variable pointed to by target and returns the result of the addition, which in this case is *target itself.
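The add-zero read from this comment, sketched with GCC's `__sync_fetch_and_add` instead of `OSAtomicAdd64Barrier` so it is not macOS-specific (assuming GCC or Clang). `read_u64_via_add` is a hypothetical name. Unlike a plain atomic load, this is a locked read-modify-write, so it takes exclusive ownership of the cache line and cannot be used on read-only memory.

```c
#include <stdint.h>

/* Atomically add 0 and return the old (= current) value: a full-barrier read. */
static uint64_t read_u64_via_add(uint64_t *target)
{
    return __sync_fetch_and_add(target, 0);
}
```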
Piotr Jurkiewicz

§8.1.1 on page 258: intel.com/content/dam/www/public/us/en/documents/manuals/…
rustyx

Using volatile for thread synchronization is wrong: it guarantees neither atomicity nor ordering. For example, ++ on a volatile is guaranteed to be non-atomic (this is what volatile actually requires). Teaching people to use volatile for atomicity is bad advice.
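A sketch illustrating this comment: several threads increment both a `volatile` counter and a C11 `_Atomic` counter. Only the atomic total is guaranteed; the volatile one can come up short because `++` is a separate load, add and store (formally the volatile race is a data race, i.e. undefined behavior in C11, which is exactly the problem). It assumes POSIX threads (link with `-pthread` on older toolchains); `THREADS`, `ITERS`, `worker` and `run_counters` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define THREADS 4
#define ITERS   100000

static volatile uint64_t volatile_count;  /* ++ is load, add, store: races */
static _Atomic uint64_t atomic_count;     /* atomic_fetch_add is indivisible */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        volatile_count++;                 /* concurrent updates may be lost */
        atomic_fetch_add_explicit(&atomic_count, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the workers and reports both totals through the out-parameters. */
static void run_counters(uint64_t *vol_total, uint64_t *atomic_total)
{
    pthread_t t[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);         /* join makes all updates visible */
    *vol_total = volatile_count;
    *atomic_total = atomic_load(&atomic_count);
}
```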
Dai

Instead of a spinlock, why not use a memory barrier? stackoverflow.com/questions/19965076/…