Aligning to cache line and knowing the cache line size
Solution 1
To know the sizes, you need to look them up in the documentation for the processor; afaik there is no programmatic way to do it. On the plus side, most cache lines are of a standard size, based on Intel's standards. On x86, cache lines are 64 bytes. To prevent false sharing, however, you need to follow the guidelines of the processor you are targeting (Intel has some special notes on its NetBurst-based processors); generally you need to align to 64 bytes for this (Intel states that you should also avoid crossing 16-byte boundaries).
To do this in C or C++, use the standard aligned_alloc function or one of the compiler-specific specifiers such as __attribute__((aligned(64))) or __declspec(align(64)). To pad between members in a struct so that they land on different cache lines, insert a member big enough to push the next member to the next 64-byte boundary.
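As a rough sketch of both ideas in C11 (the struct names, the helper functions and the CACHE_LINE_SIZE macro are made up for illustration, and 64 bytes is an assumption about the target processor):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, as suggested above for x86. */
#define CACHE_LINE_SIZE 64

/* Hypothetical array element. alignas on the first member raises the
   alignment of the whole struct, so sizeof is rounded up to 64 and each
   array element gets a cache line of its own. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) long value;
};

/* Hypothetical struct with manual padding between two members. */
struct two_counters {
    alignas(CACHE_LINE_SIZE) long a;
    char pad[CACHE_LINE_SIZE - sizeof(long)]; /* pushes b to offset 64 */
    long b;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "one element per cache line");
static_assert(offsetof(struct two_counters, b) == CACHE_LINE_SIZE,
              "b starts on the next cache line");

/* Returns nonzero if p sits on a cache line boundary. */
int is_cache_aligned(const void *p)
{
    return (uintptr_t)p % CACHE_LINE_SIZE == 0;
}

/* Allocate n elements starting on a cache line boundary. aligned_alloc
   wants the size to be a multiple of the alignment, which holds here
   because sizeof(struct padded_counter) is 64. Release with free(). */
struct padded_counter *alloc_counters(size_t n)
{
    return aligned_alloc(CACHE_LINE_SIZE, n * sizeof(struct padded_counter));
}
```

With GCC or Clang the same struct alignment can be spelled __attribute__((aligned(64))) instead of alignas if C11 is not available.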
Solution 2
I am using Linux and 8-core x86 platform. First how do I find the cache line size.
$ getconf LEVEL1_DCACHE_LINESIZE
64
Pass the value as a macro definition to the compiler.
$ gcc -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` ...
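On the C side, the macro could then be used roughly like this. It is only a sketch: struct element and the elements array are hypothetical names, and the fallback of 64 (used when the macro is missing or when getconf reports 0, which can happen on some systems) is an assumption.

```c
#include <assert.h>
#include <stdalign.h>

/* Fall back to 64 (an assumption) when the build does not pass
   -DLEVEL1_DCACHE_LINESIZE=... or when the reported value is 0. */
#if !defined(LEVEL1_DCACHE_LINESIZE) || LEVEL1_DCACHE_LINESIZE <= 0
#undef LEVEL1_DCACHE_LINESIZE
#define LEVEL1_DCACHE_LINESIZE 64
#endif

/* Hypothetical element type: aligned, and therefore also padded,
   to one cache line. */
struct element {
    alignas(LEVEL1_DCACHE_LINESIZE) int data;
};

static_assert(sizeof(struct element) % LEVEL1_DCACHE_LINESIZE == 0,
              "element size is a whole number of cache lines");

/* The array starts on a cache line boundary because its element type
   carries the alignment requirement. */
struct element elements[8];
```

Because the size is a compile-time constant here, the compiler can lay out the array statically and nothing has to be queried at run time.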
At run time, sysconf(_SC_LEVEL1_DCACHE_LINESIZE) can be used to get the L1 data cache line size.
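A minimal sketch of the run-time query, assuming glibc on Linux. The function names and the fallback of 64 are not part of any API, they are made up here; _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, hence the #ifdef.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the L1 data cache line size in bytes, or a fallback of 64
   (an assumption) when the system cannot report it. */
size_t cache_line_size(void)
{
    long n = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    return n > 0 ? (size_t)n : 64;
}

/* Rounds size up to a multiple of line, which must be a power of two.
   Useful for sizing each array element when the line size is only
   known at run time. */
size_t round_up_to_line(size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (size + line - 1) & ~(line - 1);
}
```

For example, round_up_to_line(sizeof(struct foo), cache_line_size()) would give the stride to use between elements of a hypothetical struct foo.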
Solution 3
Another simple way is to grep /proc/cpuinfo:
grep cache_alignment /proc/cpuinfo
Solution 4
There's no completely portable way to get the cache line size. But if you're on x86/64, you can call the cpuid instruction to get everything you need to know about the cache, including size, cache line size, how many levels, etc.
http://softpixel.com/~cwright/programming/simd/cpuid.php
(scroll down a little bit, the page is about SIMD, but it has a section getting the cacheline.)
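One hedged way to do this from C, assuming GCC or Clang (which ship <cpuid.h> with the __get_cpuid helper), is to read the CLFLUSH line size from CPUID leaf 1. The function names are made up for this sketch, and it only reads that one field, not the full cache hierarchy.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns the CLFLUSH line size in bytes as reported by CPUID leaf 1,
   or 0 if it is not available (non-x86 build, or CLFLUSH unsupported). */
unsigned cpuid_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    /* EDX bit 19 says whether CLFLUSH, and with it this field, exists. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19)))
        return ((ebx >> 8) & 0xffu) * 8; /* EBX[15:8] is in 8-byte units */
#endif
    return 0;
}

/* Same, but returns the caller's fallback when CPUID gives no answer. */
unsigned cpuid_line_size_or(unsigned fallback)
{
    unsigned n = cpuid_line_size();
    assert(fallback != 0);
    return n != 0 ? n : fallback;
}
```

Calling cpuid_line_size_or(64) keeps the code usable on non-x86 builds, where the 64 is again just an assumed default.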
As for aligning your data structures, there's also no completely portable way to do it. GCC and VS10 have different ways to specify alignment of a struct. One way to "hack" it is to pad your struct with unused variables until it matches the alignment you want.
To align your mallocs(), all the mainstream compilers also have aligned malloc functions for that purpose.
Solution 5
posix_memalign or valloc can be used to align allocated memory to a cache line.
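A small sketch with posix_memalign, assuming a 64-byte line (the wrapper name alloc_cache_aligned is made up here):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates size bytes starting on a 64-byte boundary (assumed cache
   line size). Returns NULL on failure; release with free(). */
void *alloc_cache_aligned(size_t size)
{
    void *p = NULL;
    /* The alignment must be a power of two and a multiple of
       sizeof(void *); 64 satisfies both. */
    if (posix_memalign(&p, 64, size) != 0)
        return NULL;
    assert((uintptr_t)p % 64 == 0);
    return p;
}
```

valloc would also work, since it returns page-aligned memory, but it aligns far more coarsely than needed.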
MetallicPriest
Updated on July 05, 2022
Comments
-
MetallicPriest almost 2 years
To prevent false sharing, I want to align each element of an array to a cache line. So first I need to know the size of a cache line, so I assign each element that amount of bytes. Secondly I want the start of the array to be aligned to a cache line.
I am using Linux on an 8-core x86 platform. First, how do I find the cache line size? Second, how do I align to a cache line in C? I am using the gcc compiler.
So the layout would be as follows, for example, assuming a cache line size of 64:
element[0] occupies bytes 0-63
element[1] occupies bytes 64-127
element[2] occupies bytes 128-191
and so on, assuming of course that 0-63 is aligned to a cache line.
-
MetallicPriest over 12 years
But how do I align to a cache line in C?
-
Necrolis over 12 years
@MetallicPriest: updated my post a bit (note: there was an error in the cache line size; align to 64 bytes, not 16. 16 bytes is to prevent splitting)
-
Sebastian Mach over 12 years
@MetallicPriest: gcc and g++ both support __attribute__
-
Steve Jessop over 12 years
I know this is your own question, but for future readers you could answer both parts of it :-)
-
MetallicPriest over 12 years
Is memory mapped by mmap aligned too?
-
MetallicPriest over 12 years
Steve, do you know if memory mapped by mmap is aligned to a cache line?
-
Necrolis over 12 years
@MetallicPriest: mmap & VirtualAlloc allocate page-aligned memory; generally page granularity is 64 KB (under Windows), and since 64 KB is a multiple of 64, it will be aligned properly.
-
Steve Jessop over 12 years
I don't think it's guaranteed by POSIX, but I also wouldn't be in the least surprised if Linux always selects addresses that are page-aligned, never mind just cache-line aligned. POSIX says that if the caller specifies the first parameter (address hint), that has to be page-aligned, and the mapping itself is always a whole number of pages. That's strongly suggestive without actually guaranteeing anything.
-
tothphu almost 12 years
You can get the cache line size programmatically. Check here. Also, you cannot generalize to 64-byte cache lines on x86; that is only true for recent processors.
-
Necrolis almost 12 years
@tothphu: a more portable way to get it is via CPUID, and as of many revisions of the Intel guides, cache lines have been 64 bytes; IIRC even the P4 (which is now ancient) had 64-byte cache lines (in fact, it did, see: osronline.com/article.cfm?article=273). Also, there is no need to spam the link; rather just edit your comment.
-
tothphu almost 12 years
@Necrolis I seem to remember reading 32 bytes somewhere in the Core Duo timeframe, but then my memory is probably deceiving me. Otherwise, I couldn't edit the comment; I had crossed some 5-minute boundary.
-
jww over 8 years
@James - alignas is C++11. It's not available for C++03. And it won't work on a number of Apple platforms. On some of their OSes, Apple provides an ancient C++ Standard Library that pretends to be C++11 but lacks unique_ptr, alignas, etc.
-
Nick Strupat about 8 years
@James also, the standard only requires alignas to support up to 16 bytes, so any higher value won't be portable. And since virtually all modern processors have a cache line size of 64 bytes, alignas isn't useful unless you know your compiler supports alignas(64).
-
Brian Cain almost 7 years
Where are these sysconf()s specified? POSIX / IEEE Std 1003.1-20xx?
-
Maxim Egorushkin almost 7 years
@BrianCain I use Linux, so I just did man sysconf. Linux is not exactly POSIX compliant, so that Linux-specific documentation is often more useful. Sometimes it is out of date, so you just egrep -nH -r /usr/include -e '\b_SC'.
-
Dení over 6 years
In case of Mac, use sysctl hw.cachelinesize.
-
Peter Cordes about 6 years
Yes, mmap only works in terms of pages, and pages are always larger than cache lines. Even in some theoretical weird architecture, there are extremely good reasons why cache lines won't be larger than pages (caches are normally physically tagged, so one line can't be split across 2 virtual pages without extreme pain for the CPU designers).
-
NoSenseEtAl over 5 years
C++11 adds alignas, which is a portable way of specifying alignment.
-
Alnitak over 5 years
alignas is also in C11, not just C++11.
-
Carlo Wood almost 5 years
@NoSenseEtAl alignas officially only supports alignment up to the alignment of the type std::max_align_t, which is typically the alignment requirement of a long double, aka 8 or 16 bytes - not 64 unfortunately. See for example stackoverflow.com/questions/49373287/…
-
Carlo Wood almost 5 years
@NickStrupat It seems that support for alignment to cache line sizes has finally been added to C++17. My last comment also seems to no longer be correct for C++17 (the problem was merely that operator new was not guaranteed to return memory aligned better than std::max_align_t). I just found this: en.cppreference.com/w/cpp/thread/…
-
Nick Strupat almost 5 years
@CarloWood You're right about the C++17 addition. The only advantage remaining for my library and its underlying get_cachline_size function is that it can retrieve that information at run-time. The downside is that you lose the compiler optimizations that are possible when the cache line size is known at compile time.
-
Carlo Wood almost 5 years
@NickStrupat After posting this comment, I tried it out and discovered that neither gcc nor clang supports it... Apparently they went for option 3 in lists.llvm.org/pipermail/cfe-dev/2018-May/058138.html (I read the whole thread; it's long, but to summarize: they have no clue how to implement it and were thinking about filing a Defect Report). Nevertheless, your library will of course have the exact same ABI/ODR issues. I'm starting to feel that simply using 64 bytes everywhere for now is my best option :/.
-
maxschlepzig over 4 years
Perhaps you want to remove a useless use of cat.
-
Peter Cordes over 4 years
@CarloWood: Compilers are allowed to support over-aligned types, and in practice they do (all of gcc, clang, MSVC, ICC support alignas(64)). True that ISO C++ only requires alignas up to alignof(max_align_t), but it also doesn't specify __declspec or __attribute__. I'd call alignas portable because in real life compilers can and do support it because it's useful. Not in the same sense that behaviour required by ISO C++ is portable, sure.
-
Peter Cordes over 4 years
@Necrolis: re: earlier comments: x86 (and x86-64) page size is 4kiB. x86-64 hugepages are 2MiB or 1GiB. Yes, everything uses 64-byte cache lines since Core 2 at least, so all x86-64. Pentium II/III did use 32-byte lines, maybe even Pentium M / Core solo/duo. Over-aligning might waste a bit of space on those ancient CPUs, but it's not a big deal. On modern CPUs, L2 spatial prefetch tries to complete an aligned pair of cache lines (128 bytes) so it can sometimes make sense to align by 128.
-
Peter Cordes over 4 years
Usually it's so much better to have a compile-time-constant line size that I'd rather hard-code 64 than call sysconf. The compiler won't even know it's a power of 2, so you'll have to manually do stuff like offset = ptr & (linesize-1) for remainder, or bit-scan + right-shift to implement division. You can't just use / in code that's performance-sensitive.
-
Peter Cordes over 4 years
Compilers are reluctant to implement hardware_destructive_interference_size because you really want it to be a compile-time constant, but it can't always be if you're compiling for a "generic" target that could run on multiple CPUs of the same ISA. A conservative choice would be possible but not guaranteed future-proof (like 128 bytes, to account for current mainstream Intel x86 CPUs with 64-byte lines and an L2 spatial prefetch that likes to complete an aligned pair of lines).
-
ilstam almost 4 years
But if you used a cross compiler, that wouldn't work, right? Because it would get the cache line size of your current architecture and not the one of your target architecture.
-
Maxim Egorushkin almost 4 years
@ilstam When cross-compiling you would need to obtain that getconf LEVEL1_DCACHE_LINESIZE from your target architecture, sure. Your build system might provide it, or you'd have to hardcode it as a system-specific value into your build system.
-
Maxim Egorushkin almost 4 years
@ilstam Another method is to have arch-specific implementations in different shared libraries and load the right one at run-time. Or more advanced users could have their own mechanisms for using arch-specific functions, but one would need to be an expert with all the details involved (which isn't rocket science, but requires a bit of thorough reading and appreciation).