Should I worry about the alignment during pointer casting?


Solution 1

1. Is it REALLY safe to dereference the pointer after casting in a real project?

If the pointer is not properly aligned, it really can cause problems. I've personally seen and fixed bus errors in real production code caused by casting a char* to a more strictly aligned type. Even if you don't get an obvious error, you can have subtler issues such as slower performance. Strictly following the standard to avoid UB is a good idea even if you don't immediately see any problems. (One rule the code is breaking is the strict aliasing rule, § 3.10/10*.)

A better alternative is to use std::memcpy() (or std::memmove() if the buffers overlap), or better yet std::bit_cast<>() where C++20 is available:

unsigned char data[16];
int i1, i2, i3, i4;
std::memcpy(&i1, data     , sizeof(int));
std::memcpy(&i2, data +  4, sizeof(int));
std::memcpy(&i3, data +  8, sizeof(int));
std::memcpy(&i4, data + 12, sizeof(int));
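The four copies above can be wrapped in a small helper. This is only a sketch: the name load_as is hypothetical and not part of the original answer, and it assumes the caller guarantees that at least sizeof(T) bytes are readable at p.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the original answer): copies sizeof(T) bytes
// starting at p into a T. Works for any alignment of p because std::memcpy
// accesses the source as raw bytes.
template <typename T>
T load_as(const unsigned char* p) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    assert(p != nullptr);
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}
```

With this, the unpacking becomes i1 = load_as<int>(data), i2 = load_as<int>(data + sizeof(int)), and so on. The value is still interpreted in the host's byte order, exactly as with the plain memcpy calls.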

That said, some compilers work harder than others to align char arrays more strictly than required, because programmers so often get this wrong. The following program shows how the results vary:

#include <cstdint>
#include <typeinfo>
#include <iostream>

template<typename T> void check_aligned(void *p) {
    std::cout << p << " is " <<
      (0==(reinterpret_cast<std::uintptr_t>(p) % alignof(T))?"":"NOT ") <<
      "aligned for the type " << typeid(T).name() << '\n';
}

void foo1() {
    char a;
    char b[sizeof (int)];
    check_aligned<int>(b); // unaligned in clang
}

struct S {
    char a;
    char b[sizeof(int)];
};

void foo2() {
    S s;
    check_aligned<int>(s.b); // unaligned in clang and msvc
}

S s;

void foo3() {
    check_aligned<int>(s.b); // unaligned in clang, msvc, and gcc
}

int main() {
    foo1();
    foo2();
    foo3();
}

http://ideone.com/FFWCjf
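If the buffer is under your control, you do not have to rely on the compiler's goodwill: C++11's alignas lets you request the alignment explicitly. The sketch below uses hypothetical names (Packet, is_aligned_for) and assumes, as the program above does, that the integer value of a pointer reflects its address. Note that this only addresses alignment; it does not make dereferencing a cast pointer legal under the aliasing rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p is suitably aligned for T.
// Assumes the pointer-to-integer mapping preserves address bits,
// which holds on mainstream platforms but is implementation-defined.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical struct: without alignas, payload could start at offset 1.
struct Packet {
    char tag;
    alignas(int) unsigned char payload[4 * sizeof(int)];
};

// Self-check: the payload of a stack-allocated Packet is int-aligned.
inline void check_payload_alignment() {
    Packet p{};
    assert(is_aligned_for<int>(p.payload));
}
```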

2. Is there any difference between C-style casting and reinterpret_cast?

It depends. C-style casts do different things depending on the types involved. A C-style cast between unrelated pointer types does the same thing as a reinterpret_cast; see § 5.4 Explicit type conversion (cast notation) and § 5.2.9-11.
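A minimal sketch of that equivalence, using a hypothetical function name. It only compares the resulting pointer values and never dereferences them, so it says nothing about whether using the result is safe.

```cpp
#include <cassert>

// Hypothetical check: both cast notations yield the same int* for the
// same unsigned char*. Neither result is dereferenced here.
inline bool casts_agree(unsigned char* bytes) {
    assert(bytes != nullptr);
    int* via_c_style = (int*)bytes;                        // cast notation
    int* via_reinterpret = reinterpret_cast<int*>(bytes);  // named cast
    return via_c_style == via_reinterpret;
}
```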

3. Is there any difference between C and C++?

There shouldn't be as long as you're dealing with types that are legal in C.


* Another issue is that C++ does not specify the result of casting from one pointer type to a type with stricter alignment requirements. This is to support platforms where unaligned pointers cannot even be represented. However, typical platforms today can represent unaligned pointers, and compilers specify the result of such a cast to be what you would expect. As such, this issue is secondary to the aliasing violation. See [expr.reinterpret.cast]/7.

Solution 2

It's not alright, really. The alignment may be wrong, and the code may violate strict aliasing. You should unpack it explicitly.

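i1 = data[0] | data[1] << 8 | data[2] << 16 | static_cast<unsigned>(data[3]) << 24;

etc. The cast on the top byte keeps the shift out of the sign bit of a promoted int. This is well-defined behaviour, and as a bonus, the result does not depend on the host's endianness (the expression fixes the buffer's byte order as little-endian), unlike your pointer cast.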
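A sketch of the same unpacking done entirely in unsigned arithmetic, so that no promoted int is ever shifted into its sign bit. The name load_le32 is hypothetical, and the buffer is assumed to hold each value least-significant byte first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decodes 4 bytes stored least-significant byte first.
// Every operand is widened to uint32_t before shifting, so all shifts are
// on an unsigned type and cannot overflow into a sign bit.
inline std::uint32_t load_le32(const unsigned char* p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

The four fields would then be load_le32(data), load_le32(data + 4), load_le32(data + 8) and load_le32(data + 12), with a conversion to a signed type afterwards if negative values are expected.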

Solution 3

In the example you're showing here what you do will be safe on almost all modern CPUs iff the initial char pointer is correctly aligned. In general this is not safe and not guaranteed to work.

If the initial char pointer is not correctly aligned, this will work on x86 and x86_64, but can fail on other architectures. If you're lucky, it will just give you a crash and you'll fix your code. If you're unlucky, the unaligned access will be fixed up by a trap handler in your operating system and you'll get terrible performance with no obvious feedback on why it is so slow (we're talking glacially slow for some code; this was a huge problem on Alpha 20 years ago).

Even on x86 and its relatives, unaligned access can be slower.

If you want to be safe today and in the future, just use memcpy instead of doing the assignment like this. A modern compiler will likely have optimizations for memcpy and do the right thing, and if not, memcpy itself will have alignment detection and will do the fastest thing.

Also, your example is wrong on one point: sizeof(int) isn't always 4.
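One way to avoid hard-coding 4 is to derive the offsets from the field type and to use a fixed-width type when the data really is 32 bits wide. The sketch below uses hypothetical names (kFieldSize, read_field) and assumes the buffer holds std::int32_t values in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size of one field, derived from the type instead of a literal 4.
constexpr std::size_t kFieldSize = sizeof(std::int32_t);

// Hypothetical helper: reads the index-th 32-bit field from a byte buffer.
// The caller must guarantee that the buffer holds at least index + 1 fields.
inline std::int32_t read_field(const unsigned char* data, std::size_t index) {
    assert(data != nullptr);
    std::int32_t value;
    std::memcpy(&value, data + index * kFieldSize, kFieldSize);
    return value;
}
```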

Solution 4

The correct way to unpack char buffered data is to use memcpy:

unsigned char data[4 * sizeof(int)];
int i1, i2, i3, i4;
memcpy(&i1, data, sizeof(int));
memcpy(&i2, data + sizeof(int), sizeof(int));
memcpy(&i3, data + 2 * sizeof(int), sizeof(int));
memcpy(&i4, data + 3 * sizeof(int), sizeof(int));

Dereferencing the cast pointer violates the strict aliasing rule, which means that the compiler and optimiser are free to treat the source object as uninitialised.

Regarding your 3 questions:

  1. No, dereferencing a cast pointer is in general unsafe, because of aliasing and alignment.
  2. No, not for this cast. In C++, a C-style cast is defined in terms of the named casts, and between unrelated pointer types it behaves as a reinterpret_cast.
  3. No, C and C++ agree on cast-based aliasing. There is a difference in the treatment of union-based aliasing (C allows it in some cases; C++ does not).

Solution 5

Update: I overlooked the fact that a buffer of a smaller type may be misaligned relative to a larger type, as may be the case in your example. You can alleviate that issue by reversing the direction of the cast: declare your array as an array of int, and cast it to char * when you need to access it that way.

// raw data consists of 4 ints
int data[4];
int i1, i2, i3, i4;

// here's the char * to the original data
char *cdata = (char *)data;
// now we can recast it safely to int *
i1 = *((int*)cdata);
i2 = *((int*)(cdata + sizeof(int)));
i3 = *((int*)(cdata + sizeof(int) * 2));
i4 = *((int*)(cdata + sizeof(int) * 3));

There won't be any issue with arrays of primitive types. Alignment issues occur when dealing with arrays of structured data (struct in C), or when the original primitive type of the array is smaller than the type it is cast to; see the update above.
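A small sketch of the reversed approach from the update, with a hypothetical function name. The storage is declared as int, so it is correctly aligned for int by construction, and writing to it through unsigned char* is permitted because character types may alias any object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical demo: fills int-backed storage one byte at a time,
// then reads the first element back as an int.
inline int fill_and_read(unsigned char byte_value) {
    int storage[4] = {};
    // Byte-wise access to an int array is allowed by the aliasing rules.
    unsigned char* bytes = reinterpret_cast<unsigned char*>(storage);
    for (std::size_t i = 0; i < sizeof storage; ++i) {
        bytes[i] = byte_value;
    }
    assert(bytes == reinterpret_cast<unsigned char*>(&storage[0]));
    return storage[0];  // read through the declared type: no cast needed
}
```

In a real program the byte-wise loop would be replaced by whatever fills the buffer, for example a read from a socket or a file.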

It should be perfectly ok to cast an array of char to an array of int, provided you replace the offset of 4 with sizeof(int), to match the size of int on the platform the code is supposed to run on.

// raw data consists of 4 ints
unsigned char data[4 * sizeof(int)];
int i1, i2, i3, i4;
i1 = *((int*)data);
i2 = *((int*)(data + sizeof(int)));
i3 = *((int*)(data + sizeof(int) * 2));
i4 = *((int*)(data + sizeof(int) * 3));

Note that you will get endianness issues only if you share that data somehow from one platform to another with a different byte ordering. Otherwise, it should be perfectly fine.

Author: Eric Z

Senior software engineer @National Instruments since March, 2008. The primary programming languages are C and C++. My current job is to develop industrial network protocols based on Ethernet.

Updated on June 06, 2022

Comments

  • Eric Z almost 2 years

    In my project we have a piece of code like this:

    // raw data consists of 4 ints
    unsigned char data[16];
    int i1, i2, i3, i4;
    i1 = *((int*)data);
    i2 = *((int*)(data + 4));
    i3 = *((int*)(data + 8));
    i4 = *((int*)(data + 12));
    

    I told my tech lead that this code may not be portable, since it casts an unsigned char* to an int*, which usually has a stricter alignment requirement. But my tech lead says that's all right: most compilers keep the same pointer value after the cast, so I can just write the code like this.

    To be frank, I'm not really convinced. After some research, I found that some people argue against pointer casts like the ones above, e.g., here and here.

    So here are my questions:

    1. Is it REALLY safe to dereference the pointer after casting in a real project?
    2. Is there any difference between C-style casting and reinterpret_cast?
    3. Is there any difference between C and C++?