How to determine if memory is aligned?

c optimization memory sse simd

52,809

Solution 1

EDIT: casting to long is a cheap way to guard against the now-common case where int and pointers have different sizes.

As pointed out in the comments below, there are better solutions if you are willing to include a header...

A pointer p is aligned on a 16-byte boundary iff ((unsigned long)p & 15) == 0.
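
A minimal C sketch of this test, assuming `<stdint.h>` provides `uintptr_t` (optional in the standard, but available on practically every hosted compiler). The names `is_aligned_16` and `is_aligned_16_selfcheck` are illustrative and do not come from the answer:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p sits on a 16-byte boundary.
   Only the low four bits of the address matter. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical self-check on a buffer that is forced onto a
   16-byte boundary with alignas (C11). */
static bool is_aligned_16_selfcheck(void)
{
    static alignas(16) unsigned char buf[32];

    assert(is_aligned_16(buf));          /* start of the buffer */
    return is_aligned_16(buf + 16)       /* next boundary       */
        && !is_aligned_16(buf + 1);      /* one byte off        */
}
```

Using `uintptr_t` instead of `unsigned long` avoids any question about integer width, although for this particular test only the low bits are inspected.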

Solution 2

#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * (or, equivalently, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *.

If you want type safety, consider using an inline function:

#include <stddef.h> /* size_t */
#include <stdint.h> /* uintptr_t */
static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count)
{ return (uintptr_t)pointer % byte_count == 0; }

and hope for compiler optimizations if byte_count is a compile-time constant.
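
As a rough usage sketch (not part of the original answer), the check can be applied to the elements of a float array that is known to start on a 16-byte boundary. The helper `count_aligned_floats` is hypothetical, and the sketch assumes `sizeof(float) == 4`, which the static assertion states explicitly:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Same check as the inline function above, repeated so that the
   sketch compiles on its own. */
static inline bool is_aligned(const void *pointer, size_t byte_count)
{
    return (uintptr_t)pointer % byte_count == 0;
}

/* Hypothetical demo: counts how many of the first n floats of a
   16-byte-aligned array start on a 16-byte boundary. */
static size_t count_aligned_floats(size_t n)
{
    static alignas(16) float data[16];
    size_t count = 0;

    assert(n <= 16);
    for (size_t i = 0; i < n; ++i) {
        if (is_aligned(&data[i], 16))
            ++count;
    }
    return count;
}
```

Under those assumptions every fourth element (indices 0, 4, 8, 12) passes the test, which is the pattern an SSE loop over floats relies on.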

Why do we need to convert to void * ?

The C language allows different representations for different pointer types, e.g. you could have a 64-bit void * type (the whole address space) and a 32-bit foo * type (a segment).

The conversion foo * -> void * might involve an actual computation, e.g. adding an offset. The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a no-op.

For such an implementation, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't. The alignment computation would also not work reliably because you only check alignment relative to the segment offset, which might or might not be what you want.

In conclusion: Always use void * to get implementation-independent behaviour.

Solution 3

Other answers suggest an AND operation with low bits set, and comparing to zero.

But a more straightforward test would be to do a MOD with the desired alignment value, and compare to zero.

#define ALIGNMENT_VALUE     16u

if (((uintptr_t)ptr % ALIGNMENT_VALUE) == 0)
{
    // ptr is aligned
}
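
To compare the two forms side by side, here is a small sketch that works on plain integer addresses. The function names are made up for illustration. It shows that AND and MOD agree whenever the alignment is a power of two, while MOD also handles other alignment values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modulo form: works for any nonzero alignment. */
static bool aligned_mod(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0);
    return addr % alignment == 0;
}

/* Mask form: only valid when alignment is a power of two. */
static bool aligned_mask(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Hypothetical check: do both forms agree for every address
   below limit? */
static bool forms_agree(uintptr_t alignment, uintptr_t limit)
{
    for (uintptr_t addr = 0; addr < limit; ++addr) {
        if (aligned_mod(addr, alignment) != aligned_mask(addr, alignment))
            return false;
    }
    return true;
}
```

For a pointer, the address would be obtained with `(uintptr_t)(const void *)ptr`, as in the snippet above.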

Solution 4

With a function template like

#include <cassert>     // assert, used in the checks below
#include <cstdint>     // uintptr_t
#include <type_traits>

template< typename T >
bool is_aligned(T* p){
    return !(reinterpret_cast<uintptr_t>(p) % std::alignment_of<T>::value);
}

you could check alignment at runtime by invoking something like

struct foo_type{ int bar; }foo;
assert(is_aligned(&foo)); // passes

To check that bad alignments fail, you could do

// would almost certainly fail
assert(is_aligned((foo_type*)(1 + (uintptr_t)(&foo))));

Solution 5

This is basically what I'm using. By making the integer a template parameter, I ensure it's expanded at compile time, so I won't end up with a slow modulo operation whatever I do.

I always like checking my input, hence the compile-time assertion. If your alignment value is wrong, it simply won't compile...

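#include <cstdint> // uintptr_t

template <unsigned int alignment>
struct IsAligned
{
    // Zero would slip through the bit trick alone, so reject it explicitly.
    static_assert(alignment != 0 && (alignment & (alignment - 1)) == 0, "Alignment must be a power of 2");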

    static inline bool Value(const void * ptr)
    {
        return (((uintptr_t)ptr) & (alignment - 1)) == 0;
    }
};

To see what's going on, you can use this:

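// Exactly one of these eight addresses is 32-byte aligned...
int* ptr = new int[8];
for (int i = 0; i < 8; ++i)
    std::cout << IsAligned<32>::Value(ptr + i) << std::endl;
delete[] ptr;

// Should give '1' (_aligned_malloc and _aligned_free are MSVC-specific, from <malloc.h>)
int* ptr2 = (int*)_aligned_malloc(32, 32);
std::cout << IsAligned<32>::Value(ptr2) << std::endl;
_aligned_free(ptr2);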
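
Since `_aligned_malloc` is specific to Microsoft's runtime, a portable variant of the second experiment might look like the following C11 sketch. It assumes the C library provides `aligned_alloc` (availability varies by platform), and `alloc_is_aligned` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical demo: allocate size bytes on the given power-of-two
   boundary with C11 aligned_alloc, test the returned address, and
   release the block with plain free(). */
static bool alloc_is_aligned(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* C11 asks for a size that is a multiple of the alignment. */
    assert(size % alignment == 0);

    void *p = aligned_alloc(alignment, size);
    if (p == NULL)
        return false;   /* allocation failed, nothing to test */

    bool ok = ((uintptr_t)p & (alignment - 1)) == 0;
    free(p);
    return ok;
}
```

A call such as `alloc_is_aligned(32, 64)` mirrors the `_aligned_malloc(32, 32)` experiment above, with the arguments in the order `aligned_alloc` expects (alignment first, then size).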

user229898

Updated on July 09, 2020

Comments

user229898 almost 4 years
I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:
```
void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}
```
However, how do I correctly determine whether the memory ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot be sure that every buffer passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thank you in advance...
- Rehno Lindeque over 14 years
  
  random-name, not sure but I think it might be more efficient to simply handle the first few 'unaligned' elements separately like you do with the last few. Then you can still use SSE for the 'middle' ones...
- user229898 over 14 years
  
  Hm, this is a good point. I'll try it. Thanks!
- Peter Cordes over 6 years
  
  Better: use a scalar prologue to handle the misaligned elements up to the first alignment boundary. (gcc does this when auto-vectorizing with a pointer of unknown alignment.) Or if your algorithm is idempotent (like a[i] = foo(b[i])), do a potentially-unaligned first vector, then the main loop starting at the first alignment boundary after the first vector, then a final vector that ends at the last element. If the array was in fact misaligned and/or the count wasn't a multiple of the vector width, then some of those vectors will overlap, but that still beats scalar.
- jww over 5 years
  
  Best: supply an allocator that provides 16-byte aligned memory. Then operate on the 16-byte aligned buffer without the need to fixup leading or tail elements. This is what libraries like Botan and Crypto++ do for algorithms which use SSE, Altivec and friends.
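
The prologue idea from the comments above can be sketched in C as follows. This is only an illustration under stated assumptions: `sizeof(float) == 4`, the incoming pointer is at least float-aligned, and the names `scalar_prologue_len` and `prologue_for_offset` are invented here:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Hypothetical helper: number of leading floats to handle with scalar
   code before p + result reaches a 16-byte boundary, capped at len. */
static size_t scalar_prologue_len(const float *p, size_t len)
{
    uintptr_t addr = (uintptr_t)(const void *)p;
    assert(addr % sizeof(float) == 0);            /* float-aligned input */

    size_t misalign = (size_t)(addr & 15u);       /* bytes past the boundary */
    size_t head = ((16u - misalign) & 15u) / sizeof(float);
    return head < len ? head : len;
}

/* Hypothetical demo: prologue length for a pointer that starts
   offset floats into a 16-byte-aligned buffer. */
static size_t prologue_for_offset(size_t offset, size_t len)
{
    static alignas(16) float buf[16];
    assert(offset < 16);
    return scalar_prologue_len(buf + offset, len);
}
```

A function like `sse_func` could process the first `scalar_prologue_len(ptr, len)` elements with plain C, run the aligned SSE loop over the middle part, and finish the remaining tail with plain C again.
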
user229898 over 14 years

I think casting a pointer to int is a bad idea? My code will be compiled on both x86 and x64 systems. I hoped there would be some secret system macro is_aligned_mem() or so.
Anon. over 14 years

You could instead use uintptr_t - it is guaranteed the correct size to hold a pointer. Provided that your compiler defines it, of course.
Paul Nathan over 14 years

No, a pointer is an int. It just isn't used as a numeric generally.
Richard Pennington over 14 years

It doesn't really matter if the pointer and integer sizes don't match. You only care about the bottom few bits.
Bill Forster over 14 years

Well if there was a secret system macro you can be sure that it will work by casting the pointer to int. There is nothing magic going on with this cast, you are just asking the compiler to let you look at how the pointer is represented in bits. If you don't do that, how can you ever know if it is aligned ?
Hasturkun over 14 years

I would usually use p % 16 == 0, as compilers usually know the powers of 2 just as well as I do, and I find this more readable
Paul Nathan over 14 years

int traditionally was the size of the system word, aka a pointer. Is that changing in the 32-bit to 64-bit transition? (curious)
Pascal Cuoq over 14 years

@Hasturkun Division/modulo over signed integers are not compiled into bitwise tricks in C99 (some stupid round-towards-zero stuff), and it's a smart compiler indeed that will recognize that the result of the modulo is being compared to zero (in which case the bitwise stuff works again). Not impossible, but not trivial. Generally speaking, better cast to unsigned integer if you want to use % and let the compiler compile &.
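
A small C sketch of the point about signedness, with illustrative function names. In C99 the signed remainder takes the sign of the dividend, so it is not just the low bits, while the unsigned remainder by 16 is exactly `x & 15`:

```c
#include <assert.h>
#include <stdbool.h>

/* Signed remainder: C99 truncates the quotient toward zero, so a
   negative dividend gives a negative (or zero) remainder. */
static int rem_signed(int x)
{
    return x % 16;
}

/* Unsigned remainder by a power of two. */
static unsigned rem_unsigned(unsigned x)
{
    return x % 16u;
}

/* Hypothetical check: for unsigned values, % 16 and & 15 give the
   same result for every x below limit. */
static bool unsigned_forms_match(unsigned limit)
{
    for (unsigned x = 0; x < limit; ++x) {
        if (rem_unsigned(x) != (x & 15u))
            return false;
    }
    return true;
}
```

For example, `rem_signed(-17)` is -1 rather than 15, which is why the compiler needs extra work for the signed case unless it can see that only the comparison with zero matters.
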
Steve Jessop over 14 years

No, you can't. A pointer is not a valid argument to the & operator.
user229898 over 14 years

Thanks for all the answers. @Richard Pennington: That's a good point. @Bill Forster: I know someone has eventually to compare the actual bits but I wanted a safe and cross-platform (x86, x64) way. It scares me a bit that there are so many self-made solutions. And I have not found the recommended one on MSDN or at Intel's website.
Pascal Cuoq over 14 years

I upvoted you, but only because you are using unsigned integers :)
user229898 over 14 years

@Paul Nathan: It depends on whether you have an ILP64, LP64 or LLP64 system. For example, Windows on x64 is LLP64, which means int and long are both still 32 bits and only pointers (and long long) are 64 bits. Linux on x64 is LP64: int is 32 bits but long has 64 bits.
Hasturkun over 14 years

@Pascal Cuoq, gcc notices this and emits the exact same code for (p & 15) == 0 and (p % 16) == 0 with the -O flag set. I have seen a number of other compilers that recognize integer division/modulus/multiplication by a power of 2 and do the smart thing about it. (I do agree about casting to unsigned though)
Hasturkun over 14 years

of course, the compiler can only recognize these when dealing with a compile time constant. if you find yourself using multiple possible values, fall back to using &
Pascal Cuoq over 14 years

@Hasturkun I just compiled int d(int x) { return x / 8; } with gcc -S. It is both beautiful and sad... Mostly sad...
Hasturkun over 14 years

@Pascal Cuoq: I do agree about that, but it still handles the modulus-and-compare-to-0 case correctly, so long as the optimizer is being used. Without it, the compiler may emit the modulus (it doesn't in my case, but what it emits is far less efficient).
user229898 over 14 years

This macro looks really nasty and sophisticated at once. I will definitely test it.
Exectron over 13 years

Please provide any examples you know of platforms in which non-void * does not produce an integer value in the range of uintptr_t. And/or, do you know what the rationale is for the standard to be worded that way?
Danny Staple about 9 years

It would be good here to explain how this works so the OP understands it.
milleniumbug almost 9 years

-1 Doesn't answer the question. (the question was "How to determine if memory is aligned?", not "how to allocate some aligned memory?")
Mikhail over 8 years

Why restrict?, looks like it doesn't do anything when there is only one pointer?
Christoph over 8 years

@Mikhail: the combination of const * with restrict is a stronger guarantee than plain const *: without restrict, it is legal to cast away the const and modify the memory; with restrict present, it is not; sadly, I learned that this isn't useful in practice as it only comes into effect if the pointer is actually used, which the caller can't assume in general (ie the usefulness lies solely on side of the callee); in this particular case, it's superfluous anyway as we're dealing with an inline function, so the compiler can see its body and infer on its own that no memory gets modified
Paweł Bylica over 7 years

C++ explicitly forbids creating unaligned pointers to given type T. Because such pointer is not allowed to exist the compiler is allowed to optimize is_aligned(p) to true for any pointer p.
rubicks over 7 years

@paweł-bylica, you're probably correct. Could you provide a reference (document, chapter, verse, etc.) so I can amend my answer?
Admin over 7 years

@milleniumbug he does align it in the second line
milleniumbug over 7 years

@MarkYisri It's also not "how to align a buffer?"
Admin over 7 years

@milleniumbug doesn't matter whether it's a buffer or not. mem is a pointer.
Admin over 7 years

@SteveJessop you could cast to uintptr_t.
milleniumbug over 7 years

@MarkYisri It's also not "how to align a pointer?". The answer to "is mem aligned?" is not a pointer. It's "yes" or "no".
Steve Jessop over 7 years

@MarkYisri: yes, I expect that in practice, every implementation that supports SSE2 instructions provides an implementation-specific guarantee that'll work :-)
gnzlbg over 6 years

Also template functions are always inline, so the inline keyword is redundant.
Jarrod Smith over 6 years

But we can't infer the original alignment of the pointer, only the maximum alignment. i.e. ((unsigned long)p & 15) == 0 could hold true for pointers that were originally requested to be 4 or 8-byte aligned.
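
To illustrate, the strongest alignment an address happens to satisfy can be read off its lowest set bit. A sketch on plain integer addresses follows. `max_alignment_of` is a hypothetical name, and the result says nothing about what alignment was originally requested:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of two that divides addr.
   Returns 0 for addr == 0, which is divisible by every power of two. */
static uintptr_t max_alignment_of(uintptr_t addr)
{
    /* ~addr + 1 is the unsigned two's-complement negation of addr;
       ANDing it with addr isolates the lowest set bit. */
    return addr & (~addr + 1u);
}
```

For a pointer p, the argument would be `(uintptr_t)(const void *)p`. An address such as 48 reports 16 even if the buffer was only requested with 4- or 8-byte alignment.
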
rubicks over 6 years

@gnzlbg, I don't think function templates are always inline; at least not according to this: stackoverflow.com/a/10536588/3798657.
gnzlbg over 6 years

That answer says that inline makes a difference on explicit specializations, but explicit specializations are not templates. The second answer on that page is correct: stackoverflow.com/a/10535711/1422197 Basically, if you were to explicitly specialize this template into a function, then, depending on where you decide to specialize it (e.g. a header file), you might need to use the inline keyword on the specialization to avoid ODR issues, but this is always the case independently of whether you use inline on the template or not. inline on the template is completely irrelevant.
rubicks over 6 years

@gnzlbg, I concede; you are correct. I'll change my answer forthwith.
Peter Cordes over 6 years

@Anon.: You only need to check the low bits of the pointer anyway, so it's ok to lose the high bits when casting to a narrow unsigned type. It's important to use uintptr_t if you want to cast back to a pointer after rounding down or up to the next alignment boundary, though.
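
A sketch of the rounding mentioned in this comment, written on `uintptr_t` values so that no address bits are lost. The names `align_down` and `align_up` are illustrative, and the alignment is assumed to be a power of two:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the previous multiple of alignment. */
static uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

/* Round addr up to the next multiple of alignment. The addition can
   wrap around for addresses very close to UINTPTR_MAX. */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(alignment - 1);
}
```

Converting the result back, e.g. `(float *)align_up((uintptr_t)(void *)p, 16)`, is implementation-defined in C, but it only has a chance of giving a usable pointer when the integer type is wide enough to hold the whole address, hence `uintptr_t`.
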
jww over 5 years

I believe this fails with uint8_t types, which sometimes have alignment requirements of 1.
Exectron over 5 years

@jww I'm not sure I understand what you mean. An alignment requirement of 1 would mean essentially no alignment requirement. There's no need to worry about alignment of uint8_t. But please clarify if I'm misunderstanding.
mwfearnley about 5 years

If a float * can (theoretically) have a different representation from a void *, does that mean the alignment check could be happening on a different value from what was intended?
Todd Lehman over 4 years

Does 16u provide a portability advantage that 16 does not?
Exectron over 4 years

The u suffix on the integer makes it unsigned. It's good to avoid mixing signed and unsigned in expressions, to avoid some possible gotchas that can happen with mixed-sign arithmetic. See GCC warning "comparison between signed and unsigned integer expressions". It probably doesn't matter in this case, but it's good to get into good habits. (I suppose the 0 should be 0u too)
rez over 2 years

Take note that you shouldn't use a real MOD operation: it's quite expensive and should be avoided as much as possible. You should always use the AND operation. But I believe that a sufficiently sophisticated compiler with all the optimization options enabled will automatically convert your MOD operation to a single AND opcode. (The Linux kernel uses the AND operation too, FYI.)