Why is it allowed to cast a pointer to a reference?


Solution 1

Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.

This is described in 5.2.10/10 of the C++03 language specification (5.2.10/11 in C++11). It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).

The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced. Taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken. The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as a memory region that you reinterpret as another type.

As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.

In your second example you attached the reference c1 to the memory occupied by the pointer variable pc. When you did c1 = 'B', you forcibly wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of it). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. Formally, this is undefined behavior. In practice, what happens in such a case is a matter of pure luck. The program might crash, since the pointer is generally not dereferenceable. Or you might get lucky, and your pointer might end up pointing to some unpredictable yet valid location. In that case your program will output something. No one knows what it will output, and there's no meaning in it whatsoever.

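One can rewrite your second program into an equivalent program without references:

#include <iostream>

int main(){
    char* pc = new char('A');
    char* c = (char *) &pc;   // c points at the bytes of pc itself, not at the 'A'
    std::cout << *pc << "\n";
    *c = 'B';                 // overwrites the first byte of the pointer value
    std::cout << *pc << "\n"; // dereferences the corrupted pointer: undefined behavior
}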

From a practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification moves the pointer by at most 255 bytes from its original location, so the code is more likely to print something than to crash. On a big-endian platform your code would overwrite the most-significant byte of the pointer. That sends the pointer to a totally different location and makes your program more likely to crash.

Solution 2

It took me a while to grok it, but I think I finally got it.

The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).

In our case, U is char, and t is pc, an lvalue of type char*.

Expanding those, we see that the following happens:

  • we take the address of the argument to the cast, yielding a value of type char**.
  • we reinterpret_cast this value to char*
  • we dereference the result, yielding a char lvalue.

reinterpret_cast allows you to cast from any object pointer type to any other object pointer type. And so, a cast from char** to char* is well-formed.

Solution 3

I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.

  • C didn't have reference types; it only had values and pointers, since physically, in memory, we only have values and pointers.
  • In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.

Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have an int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;), what actually happens is that a "decays" to a pointer to its first element, and then what gets executed is *(a + 1) = 2.

(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)

Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().

PS - std::ref (which returns a std::reference_wrapper) is also really just a pointer when you drill down into it.



Author: Xeo


Updated on May 31, 2022

Comments

  • Xeo
    Xeo almost 2 years

    This was originally the topic of this question, but it emerged that the OP had simply overlooked the dereference. Meanwhile, this answer got me and some others thinking: why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?

    int main() {
        char  c  = 'A';
        char* pc = &c;
    
        char& c1 = (char&)pc;
        char& c2 = reinterpret_cast<char&>(pc);
    }
    

    The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.


    My first thought was that the pointer somehow automagically gets dereferenced (I normally work with MSVC, so I didn't get the warning GCC shows), so I tried the following:

    #include <iostream>
    
    int main() {
        char  c  = 'A';
        char* pc = &c;
    
        char& c1 = (char&)pc;
        std::cout << *pc << "\n";
    
        c1 = 'B';
        std::cout << *pc << "\n";
    }
    

    It produced the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.

    Ideas? Explanations? Standard quotes?

  • Xeo
    Xeo almost 13 years
    Yes, we already got that from the linked answer. I'm searching for the reason why and how it works, especially the second example.
  • josesuero
    josesuero almost 13 years
    But why? I figured it out after some 15 minutes of poring over the standard, with @litb trying to explain it to me. But disregarding any notion of "purpose", where does it say that this cast is well-formed?
  • AnT stands with Russia
    AnT stands with Russia almost 13 years
    @jalf: It says so in 5.2.10/10. Any lvalue can be converted to any other reference type.
  • James McNellis
    James McNellis almost 13 years
    I did not know about the reinterpret_cast<T&>(x) <==> *reinterpret_cast<T*>(&x) equivalence. Interesting.
  • josesuero
    josesuero almost 13 years
    C++ doesn't allow just anything when you cast. Try casting an int to std::string. You can't.
  • Xeo
    Xeo almost 13 years
    Well, this concludes it pretty much, thanks for such an exhaustive answer! :)
  • usta
    usta almost 13 years
    Equivalent to *reinterpret_cast<U*>(&t) with the built-in/non-overloaded meaning of & and * operators, as emphasized in 5.2.10/10. boost::addressof takes advantage of that fine distinction.
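    Here is a sketch of the technique described above. It uses a hypothetical class Sneaky that overloads operator& and a hand-written my_addressof, which is not Boost's actual source. The casts involve only built-in operators, so the overload is bypassed. The result is compared with std::addressof from <memory>.

    ```cpp
    #include <iostream>
    #include <memory>

    // Hypothetical class whose overloaded operator& hides its real address.
    struct Sneaky {
        int value = 0;
        Sneaky* operator&() { return nullptr; }
    };

    // Classic addressof sketch: cast to a char reference (no overloaded
    // operator involved), take the built-in address of that char, then cast
    // the resulting pointer back to T*.
    template <typename T>
    T* my_addressof(T& arg) {
        return reinterpret_cast<T*>(
            &const_cast<char&>(reinterpret_cast<const volatile char&>(arg)));
    }

    int main() {
        Sneaky s;
        std::cout << std::boolalpha
                  << (&s == nullptr) << " "                           // uses the overload
                  << (my_addressof(s) == std::addressof(s)) << "\n";  // prints "true true"
    }
    ```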
  • Winston Ewert
    Winston Ewert almost 13 years
    @jalf, ok, true. But it doesn't restrict you to only casts that make sense.
  • Lightness Races in Orbit
    Lightness Races in Orbit over 12 years
    FYI, [C++03: 5.2.10/10] is [C++11: 5.2.10/11].
  • Danke Xie
    Danke Xie over 8 years
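    Beware of converting a jlong value to a C++ reference. I had a hard time finding the bug: I wrote reinterpret_cast<T&>(x) where x is a jlong value in JNI code, and I had to change it to reinterpret_cast<T*>(x). The two are not equivalent: the first reinterprets the storage of x itself as a T, while the second converts the integer value of x into a pointer. Casting to T* is safer than casting to T&.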
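    The following sketch makes the difference concrete without the JNI headers. It assumes jlong is a 64-bit signed integer, uses std::int64_t as a stand-in, and introduces a hypothetical Widget type and helper functions. reinterpret_cast<Widget*> turns the handle's value back into a pointer, while reinterpret_cast<Widget&>(h) merely names the storage of h.

    ```cpp
    #include <cstdint>
    #include <iostream>

    struct Widget { int id; };  // hypothetical native object

    // Stand-in for a JNI jlong handle (assumed: 64-bit signed integer).
    using handle_t = std::int64_t;

    handle_t to_handle(Widget* w) {
        return static_cast<handle_t>(reinterpret_cast<std::intptr_t>(w));
    }

    // Converts the integer value back into a pointer.
    Widget* from_handle(handle_t h) {
        return reinterpret_cast<Widget*>(static_cast<std::intptr_t>(h));
    }

    int main() {
        Widget w{7};
        handle_t h = to_handle(&w);

        Widget* good = from_handle(h);                 // points at w
        Widget& wrong = reinterpret_cast<Widget&>(h);  // names the bytes of h itself

        std::cout << std::boolalpha << (good == &w) << " "
                  << (static_cast<void*>(&wrong) == static_cast<void*>(&h))
                  << "\n";  // prints "true true"
    }
    ```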
  • underscore_d
    underscore_d over 8 years
    "As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type." But is it, though? As far as I can tell from the Standard, it only defines behaviour for converting pointers to & back from a different type - not dereferencing & reinterpreting the referred values. What am I missing? Fwiw, cppreference.com also claims this same "reinterpret the bit pattern" function, which is what everyone's intuition suggests - & how the major compilers handle this (albeit I can't find their documentation on this) - but I couldn't find a basis for it in any std.