What are the rules for casting pointers in C?
Solution 1
When thinking about pointers, it helps to draw diagrams. A pointer is an arrow that points to an address in memory, with a label indicating the type of the value. The address indicates where to look and the type indicates what to take. Casting the pointer changes the label on the arrow but not where the arrow points.
d in main is a pointer to c, which is of type char. A char is one byte of memory, so when d is dereferenced, you get the value in that one byte of memory. In the diagram below, each cell represents one byte.
-+----+----+----+----+----+----+-
 |    | c  |    |    |    |    |
-+----+----+----+----+----+----+-
       ^~~~
       | char
       d
When you cast d to int*, you're saying that d really points to an int value. On most systems today, an int occupies 4 bytes.
-+----+----+----+----+----+----+-
 |    | c  | ?₁ | ?₂ | ?₃ |    |
-+----+----+----+----+----+----+-
       ^~~~~~~~~~~~~~~~~~~
       | int
       (int*)d
When you dereference (int*)d, you get a value that is determined from these four bytes of memory. The value you get depends on what is in the cells marked ?, and on how an int is represented in memory.
A PC is little-endian, which means that the value of an int is calculated this way (assuming that it spans 4 bytes): *((int*)d) == c + ?₁ * 2⁸ + ?₂ * 2¹⁶ + ?₃ * 2²⁴. So you'll see that while the value is garbage, if you print it in hexadecimal (printf("%x\n", *n)), the last two digits will always be 35 (0x35 is the ASCII code of the character '5').
Some other systems are big-endian and arrange the bytes in the other direction: *((int*)d) == c * 2²⁴ + ?₁ * 2¹⁶ + ?₂ * 2⁸ + ?₃. On these systems, you'd find that the value always starts with 35 when printed in hexadecimal. Some systems have a size of int that's different from 4 bytes. A rare few systems arrange int in different ways, but you're extremely unlikely to encounter them.
Depending on your compiler and operating system, you may find that the value is different every time you run the program, or that it's always the same but changes when you make even minor tweaks to the source code.
On some systems, an int value must be stored at an address that's a multiple of 4 (or 2, or 8). This is called an alignment requirement. Depending on whether the address of c happens to be properly aligned or not, the program may crash.
In contrast with your program, here's what happens when you have an int value and take a pointer to it.
int x = 42;
int *p = &x;
-+----+----+----+----+----+----+-
 |    |         x         |    |
-+----+----+----+----+----+----+-
       ^~~~~~~~~~~~~~~~~~~
       | int
       p
The pointer p points to an int value. The label on the arrow correctly describes what's in the memory cell, so there are no surprises when dereferencing it.
Solution 2
char c = '5';
A char (1 byte) is allocated on the stack at some address, say 0x12345678.
char *d = &c;
You obtain the address of c and store it in d, so d = 0x12345678.
int *e = (int*)d;
You force the compiler to assume that 0x12345678 points to an int, but an int is not just one byte (sizeof(char) != sizeof(int)). It may be 4 or 8 bytes, or even some other size, depending on the architecture.
So when you print the value the pointer points to, the integer is built from the first byte (which was c) and the consecutive bytes that follow it on the stack, which are just garbage for your purposes.
Solution 3
Casting pointers is usually invalid in C. There are several reasons:
Alignment. It's possible that, due to alignment considerations, the destination pointer type is not able to represent the value of the source pointer type. For example, if int * were inherently 4-byte aligned, casting char * to int * would lose the lower bits.
Aliasing. In general it's forbidden to access an object except via an lvalue of the correct type for the object. There are some exceptions, but unless you understand them very well you don't want to do it. Note that aliasing is only a problem if you actually dereference the pointer (apply the * or -> operators to it, or pass it to a function that will dereference it).
The main notable cases where casting pointers is okay are:
When the destination pointer type points to character type. Pointers to character types are guaranteed to be able to represent any pointer to any type, and successfully round-trip it back to the original type if desired. Pointer to void (void *) is exactly the same as a pointer to a character type except that you're not allowed to dereference it or do arithmetic on it, and it automatically converts to and from other pointer types without needing a cast, so pointers to void are usually preferable over pointers to character types for this purpose.
When the destination pointer type is a pointer to structure type whose members exactly match the initial members of the originally-pointed-to structure type. This is useful for various object-oriented programming techniques in C.
Some other obscure cases are technically okay in terms of the language requirements, but problematic and best avoided.
Solution 4
I suspect you need a more general answer:
There are no rules on casting pointers in C! The language lets you cast any pointer to any other pointer without comment.
But the thing is: no data conversion of any kind is done! It's solely your own responsibility to make sure that the system does not misinterpret the data after the cast, which would generally be the case, leading to a runtime error.
So when casting, it's totally up to you to make sure that any data read through a cast pointer is compatible with the new pointer type!
C is optimized for performance, so it lacks runtime reflection on pointers/references. But that has a price: you as a programmer have to take better care of what you are doing. You have to know for yourself whether what you want to do is "legal".
Solution 5
You have a pointer to a char. So as far as your system knows, at that memory address there is a char value occupying sizeof(char) bytes. When you cast it to int*, you will work with sizeof(int) bytes of data, so you will print your char and some memory garbage after it as an integer.
Theo Chronic
Updated on January 29, 2020
Comments
-
Theo Chronic over 4 years
K&R doesn't go over it, but they use it. I tried seeing how it'd work by writing an example program, but it didn't go so well:
#include <stdio.h>

int bleh(int *);

int main() {
    char c = '5';
    char *d = &c;
    bleh((int *)d);
    return 0;
}

int bleh(int *n) {
    printf("%d bleh\n", *n);
    return *n;
}
It compiles, but my print statement spits out garbage values (they're different every time I run the program). Any ideas?
-
SheetJS almost 11 years
int has a larger size than char, so it's reading beyond the space of the '5' char. Try doing the same thing using a smaller data type (int c, printf "%c")
-
Montre almost 11 years
The value of *n will be an int, which should be 4 bytes. n points to the local variable c in main(). This means you'll be writing out the value of c and whatever three bytes follow it in memory. (My guess is the value of d.) You can verify this by writing out the number in hex - two of the digits should be the same every time.
-
mah almost 11 years
'5' -- you might think this looks like an int since it appears to be a number, but it's just a character that represents the digit 5.
-
Andy J almost 10 years
I ran the same test on my machine (gcc, x86_64) and I got no compile errors, and the program runs fine every time (no garbage). But I didn't do anything different from the OP. Strange.
-
polynomial_donut over 5 years
Anybody reading this answer should look at R.'s answer below
-
-
Kane almost 11 years
Other consecutive bytes are not garbage, but the value of d, i.e. 0x12345678 in your example.
-
Eric Postpischil almost 11 years
There are rules about casting pointers, a number of which are in clause 6.3.2.3 of the C 2011 standard. Among other things, pointers to objects may be cast to other pointers to objects and, if converted back, will compare equal to the original. Pointers to functions may be cast to other pointers to functions and, if converted back, will compare equal. Converting pointers to functions to pointers to objects results in undefined behavior. Pointers to objects may be converted to pointers to characters and used to access the bytes of an object.
-
A Person over 10 years
d is not big enough to hold 0x12345678
-
DiBosco over 7 years
Good description. I'd like to point out/discuss that on most computers it may be true that int is a 32-bit value, but for fellow embedded engineers, int is usually 16-bit, and it shows how useful and, probably, important it is to use uint16_t, uint32_t, int32_t etc. Not trying to be a smart arse, please not to be taking offence. :)
-
Eric about 7 years
Can you link to an official document with these obscure cases?
-
Kenny Worden about 7 years
"...the last two digits will always be 35 (that's the value of the character '5')." Why?
-
aqjune almost 7 years
Converting pointers to functions to pointers to objects is allowed. "A pointer to a function may be cast to a pointer to an object or to void, allowing a function to be inspected or modified (for example, by a debugger)", J.5.7
-
Evan Benn over 6 years
I have seen code in a few places that takes a char* and casts it into some other pointer, say int*. For example streaming RGB values from a camera, or bytes off the network. Does your reference mean that that code is invalid? Is aligning the data sufficient to make the code correct, or is it just that our common compilers are lenient about this usage?
-
R.. GitHub STOP HELPING ICE over 6 years
@EvanBenn: Possibly. If the buffer is obtained by malloc, and you store data into it bytewise via fread or similar, then as long as the offsets are suitably aligned (in general this may be hard to determine, but it's certainly true if they're multiples of the type size) it should be conforming to convert to the appropriate pointer type and access the data as that type. But if you're working with a buffer whose actual type is char[N] or something, it's not valid.
-
yyny over 6 years@APerson Why is that?
-
pipe over 6 years@aqjune You're quoting a popular extension to C, which is by definition not standard C. It is informative only.
-
Summer Sun almost 6 years
Hi Gilles, when I tried the code here
char *a = "abcd"; int *i = (int *)a; printf("%x\n", *i);
the output is 64636261, but I think it should be 61626364. Does this mean the memory within this int block is read from back to front?
-
Gilles 'SO- stop being evil' almost 6 years
@SummerSun Why do you think it should be 61626364? If you have a little-endian machine (all PCs are little-endian), it would be 64636261. This has nothing to do with the order in which the memory is read. An int is probably read in a single instruction anyway. This is about how a block of 4 bytes is interpreted as an int value.
-
Summer Sun almost 6 years
Thank you @Gilles. I thought a data type was ordered according to its memory order. I now understand that on a little-endian machine, the value of a multi-byte data type is built up from the lowest-order byte to the highest.
-
MartianMartian over 5 years
char c[] = "5"; char d = c; int *e = (int)d; printf("%p \n", e);
-
Malcolm over 4 years
This is a great explanation, but I have an additional question: is casting char* to int* described anywhere? I looked at the standard: no, doesn't seem like it. gcc docs say nothing about this implementation-defined behavior in particular, clang docs say nothing at all. So is this actually undefined behavior that generally works, or is it actually somewhat reliable and specified somewhere?
-
Gilles 'SO- stop being evil' over 4 years
@Malcolm It's undefined behavior. Dereferencing the result of the cast is UB (it might not be properly aligned, for example), and even merely constructing a pointer is usually UB if dereferencing it would be UB (I think the only exceptions are function pointers and pointers to the end of an array). There's one case where the behavior is defined, which is if the pointer was originally an int* pointer; any data pointer can be cast to unsigned char* and back, and I think unsigned char * can be cast to char * and back.
-
BearAqua over 4 years
This is, in fact, UB: wiki.sei.cmu.edu/confluence/display/c/…