Why is it allowed to cast a pointer to a reference?
Solution 1
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as a memory region that you reinterpret as another type.
As for the C-style cast, it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c1 to the memory occupied by pointer variable pc. When you did c1 = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such a case is a matter of pure luck (formally, it is undefined behavior). The program might crash, since the pointer is generally non-dereferenceable. Or you might get lucky and make your pointer point to some unpredictable yet valid location. In that case your program will output something. No one knows what it will output, and there's no meaning in it whatsoever.
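One can rewrite your second program into an equivalent program without references:
#include <iostream>

int main(){
    char* pc = new char('A');
    char* c = (char *) &pc;     // c points at the bytes of the pointer pc, not at the 'A'
    std::cout << *pc << "\n";   // prints A
    *c = 'B';                   // overwrites the first byte of pc itself
    std::cout << *pc << "\n";   // dereferences the damaged pointer: undefined behavior
}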
From a practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not move the pointer too far away from its original location, so the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, throwing it wildly to a totally different location and making your program more likely to crash.
Solution 2
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
- we take the address of the argument to the cast, yielding a value of type char**.
- we reinterpret_cast this value to char*.
- we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any object pointer type to any other object pointer type. And so, a cast from char** to char* is well-formed.
Solution 3
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
- C didn't have reference types; it only had values and pointers, since, physically in memory, we only have values and pointers.
- In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar: there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code, you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have an int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;), what actually happens is that a is "decayed" to mean a pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
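PS: std::ref is also really just a pointer when you drill down into it (the std::reference_wrapper it returns holds a pointer internally in the common standard library implementations).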
Xeo
Updated on May 31, 2022
Comments
-
Xeo almost 2 years
Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking: why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
    char c = 'A';
    char* pc = &c;

    char& c1 = (char&)pc;
    char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio, while GCC only gives you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), so I tried the following:
#include <iostream>

int main() {
    char c = 'A';
    char* pc = &c;

    char& c1 = (char&)pc;
    std::cout << *pc << "\n";
    c1 = 'B';
    std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
-
Xeo almost 13 years
Yes, we already got that from the linked answer. I'm searching for the reason why and how it works, especially the second example.
-
josesuero almost 13 years
But why? I figured it out after some 15 minutes of poring over the standard, with @litb trying to explain it to me. But disregarding any notion of "purpose", where does it say that this cast is well-formed?
-
AnT stands with Russia almost 13 years
@jalf: It says so in 5.2.10/10. Any lvalue can be converted to any other reference type.
-
James McNellis almost 13 years
I did not know about the reinterpret_cast<T&>(x) <==> *reinterpret_cast<T*>(&x) equivalence. Interesting.
-
josesuero almost 13 years
C++ doesn't allow just anything when you cast. Try casting an int to std::string. You can't.
-
Xeo almost 13 years
Well, this concludes it pretty much, thanks for such an exhaustive answer! :)
-
usta almost 13 years
Equivalent to *reinterpret_cast<U*>(&t) with the built-in/non-overloaded meaning of the & and * operators, as emphasized in 5.2.10/10. boost::addressof takes advantage of that fine distinction.
-
Winston Ewert almost 13 years
@jalf, OK, true. But it doesn't restrict you to only casts that make sense.
-
Lightness Races in Orbit over 12 years
FYI, [C++03: 5.2.10/10] is [C++11: 5.2.10/11].
-
Danke Xie over 8 years
Beware of converting a jlong value to a C++ reference. I had a hard time finding the bug: reinterpret_cast<T&>(x), where x is a jlong value in JNI code, had to be changed to reinterpret_cast<T*>(x). They are not always identical. Casting to T* is safer than casting to T&.
-
underscore_d over 8 years
"As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type." But is it, though? As far as I can tell from the Standard, it only defines behaviour for converting pointers to and back from a different type, not dereferencing and reinterpreting the referred values. What am I missing? Fwiw, cppreference.com also claims this same "reinterpret the bit pattern" function, which is what everyone's intuition suggests (and how the major compilers handle this, albeit I can't find their documentation on this), but I couldn't find a basis for it in any std.