Why are function pointers and data pointers incompatible in C/C++?


Solution 1

An architecture doesn't have to store code and data in the same memory. With a Harvard architecture, code and data are stored in completely separate memories. Most architectures are von Neumann architectures, with code and data in the same memory, but C avoids limiting itself to particular kinds of architecture whenever possible.

Solution 2

Some computers have (had) separate address spaces for code and data. On such hardware it just doesn't work.

The language is designed not only for current desktop applications, but to allow it to be implemented on a large set of hardware.


It seems the C language committee never intended void* to be a pointer to a function; they just wanted a generic pointer to objects.

The C99 Rationale says:

6.3.2.3 Pointers
C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.

The use of void* (“pointer to void”) as a generic object pointer type is an invention of the C89 Committee. Adoption of this type was stimulated by the desire to specify function prototype arguments that either quietly convert arbitrary pointers (as in fread) or complain if the argument type does not exactly match (as in strcmp). Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.

Note: nothing is said about pointers to functions in the last paragraph. They might be different from other pointers, and the committee is aware of that.
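To make the distinction concrete: ISO C guarantees that an object pointer survives a round trip through void * (C11 6.3.2.3p1), and separately that any function pointer survives a round trip through any other function pointer type (C11 6.3.2.3p8). So portable code that wants a "generic function pointer" can use a function pointer type such as void (*)(void) instead of void *. A minimal sketch; the typedef generic_fn and the helper functions are illustrative names, not standard ones.

```c
/* Illustrative "generic function pointer" type: any function pointer may be
   converted to it and back without loss, which ISO C does NOT promise for
   void *. */
typedef void (*generic_fn)(void);

static int square(int x) { return x * x; }

/* Object pointers: int * -> void * -> int * points at the same object. */
int object_roundtrip(int value) {
    void *vp = &value;
    int *ip = vp;
    return *ip;
}

/* Function pointers: round-trip through generic_fn, never through void *.
   The pointer must be converted back to its real type before the call;
   calling it through generic_fn would be undefined behavior. */
int function_roundtrip(int arg) {
    generic_fn g = (generic_fn)square;
    int (*fp)(int) = (int (*)(int))g;
    return fp(arg);
}
```

Here object_roundtrip(7) yields 7 and function_roundtrip(3) yields 9, with no conversion between the two pointer families anywhere.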

Solution 3

For those who remember MS-DOS, Windows 3.1 and older the answer is quite easy. All of these used to support several different memory models, with varying combinations of characteristics for code and data pointers.

So for instance for the Compact model (small code, large data):

sizeof(void *) > sizeof(void(*)())

and conversely in the Medium model (large code, small data):

sizeof(void *) < sizeof(void(*)())

In this case you didn't have separate storage for code and data, but you still couldn't convert between the two kinds of pointer (short of using the non-standard __near and __far modifiers).

Additionally, even if the pointers are the same size, there's no guarantee that they point to the same thing: in the DOS Small memory model, both code and data used near pointers, but they pointed to different segments. So converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion.
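The size relationships above can be checked in code. A minimal C11 sketch, assuming a program that relies on the POSIX/XSI guarantee and would rather fail to build on a target like the DOS compact or medium model than silently truncate pointers. The helper names are hypothetical. Note the assertion catches only a size mismatch; as the Small-model case shows, equal sizes do not prove both pointers address the same space.

```c
#include <stddef.h>

/* Sizes compared in the answer above; on DOS compact model the first was
   larger, on medium model the second was larger. */
size_t data_pointer_size(void)     { return sizeof(void *); }
size_t function_pointer_size(void) { return sizeof(void (*)(void)); }

/* Build-time guard for code that assumes function pointers fit in void *.
   Passing this check is necessary but not sufficient for safe conversion. */
_Static_assert(sizeof(void *) == sizeof(void (*)(void)),
               "function and data pointers differ in size on this target");
```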

Solution 4

Pointers to void are supposed to be able to accommodate a pointer to any kind of data, but not necessarily a pointer to a function. Some systems have different requirements for pointers to functions than for pointers to data (e.g., there are DSPs with different addressing for data vs. code, and the medium model on MS-DOS used 32-bit pointers for code but only 16-bit pointers for data).
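If you need a single slot that can hold either kind of pointer, a portable option is to store each in its own union member rather than converting one into the other, so nothing depends on void * being able to hold a function pointer. A minimal sketch; any_pointer, from_data, from_function and the demo helpers are hypothetical names.

```c
#include <stdbool.h>

/* Holds either a data pointer or a function pointer without converting
   between them; the tag records which member is valid. */
typedef struct {
    bool is_function;
    union {
        void *data;
        void (*func)(void);
    } u;
} any_pointer;

static any_pointer from_data(void *p) {
    any_pointer a = { .is_function = false, .u.data = p };
    return a;
}

static any_pointer from_function(void (*f)(void)) {
    any_pointer a = { .is_function = true, .u.func = f };
    return a;
}

static int counter = 0;
static void increment(void) { ++counter; }

/* Stores one of each kind, reads each back through its own member, and
   calls the function once; returns how many times increment() ran. */
int any_pointer_demo(void) {
    int x = 42;
    any_pointer d = from_data(&x);
    any_pointer f = from_function(increment);
    if (!d.is_function && *(int *)d.u.data == 42 && f.is_function) {
        f.u.func();
    }
    return counter;
}
```

Reading only the member that was last written keeps this within ISO C; the price is a little extra space when the two pointer sizes differ.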

Solution 5

In addition to what is already said here, it is interesting to look at POSIX dlsym():

The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void *) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted as in:

 fptr = (int (*)(int))dlsym(handle, "my_function");

Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.
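Because the direct cast from dlsym's void * is outside what ISO C defines (and compilers such as GCC can warn about it in pedantic mode), one workaround seen in practice is to copy the bits of the void * into the function pointer with memcpy; Konrad Rudolph's comment describes a related trick via void **. This sidesteps the cast, but it is no more portable: it still relies on the POSIX/XSI guarantee that both pointers share a representation, which the static assertion only partially checks. To keep the sketch self-contained and free of -ldl, fake_dlsym is a hypothetical stand-in for dlsym(), not a real API.

```c
#include <string.h>

/* Assumes the POSIX/XSI representation guarantee; refuse to build if the
   sizes obviously disagree. */
_Static_assert(sizeof(void *) == sizeof(int (*)(int)),
               "this sketch needs same-size data and function pointers");

static int my_function(int x) { return x + 1; }

/* Hypothetical stand-in for dlsym(): returns a symbol address as void *. */
static void *fake_dlsym(const char *name) {
    void *result = NULL;
    if (strcmp(name, "my_function") == 0) {
        int (*f)(int) = my_function;
        memcpy(&result, &f, sizeof result);   /* copy bits, no cast */
    }
    return result;
}

/* Looks up a symbol and calls it; *ok reports whether it was found.
   The void * is copied into the function pointer rather than cast. */
int call_symbol(const char *name, int arg, int *ok) {
    void *sym = fake_dlsym(name);
    if (sym == NULL) {
        *ok = 0;
        return 0;
    }
    int (*fptr)(int);
    memcpy(&fptr, &sym, sizeof fptr);
    *ok = 1;
    return fptr(arg);
}
```

With a real dlsym() the lookup line would change, but the memcpy step and its POSIX dependency would stay the same.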

Author: gexicide

I'm a long term developer of Tableau's Hyper database engine and currently the manager in charge of the Hyper API.

Updated on June 25, 2022

Comments

  • gexicide
    gexicide almost 2 years

    I have read that converting a function pointer to a data pointer and vice versa works on most platforms but is not guaranteed to work. Why is this the case? Shouldn't both be simply addresses into main memory and therefore be compatible?

  • Manav
    Manav over 13 years
    'physically distinct' I understand, but can you elaborate on the 'fundamentally incompatible' distinction? As I said in the question, isn't a void pointer supposed to be as large as any pointer type, or is that a wrong presumption on my part?
  • Manav
    Manav over 13 years
    But then shouldn't the dlsym() function be returning something other than a void *? I mean, if the void * is not big enough for the function pointer, aren't we already fubared?
  • Jerry Coffin
    Jerry Coffin over 13 years
    @Knickerkicker: Yes, probably. If memory serves, the return type from dlsym was discussed at length, probably 9 or 10 years ago, on the OpenGroup's email list. Offhand, I don't remember what (if anything) came of it though.
  • ephemient
    ephemient over 13 years
    @KnickerKicker: void * is large enough to hold any data pointer, but not necessarily any function pointer.
  • Manav
    Manav over 13 years
    you're right. This seems a fairly nice (although outdated) summary of your point.
  • Manav
    Manav over 13 years
    Nice! While I agree this does seem more maintainable, it is still not obvious (to me) how I hammer on static linking on top of this. Can you elaborate?
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    If each module has its own foo_module structure (with unique names), you can simply create an extra file with an array of struct { const char *module_name; const struct module *module_funcs; } and a simple function to search this table for the module you want to "load" and return the right pointer, then use this in place of dlopen and dlsym.
  • caf
    caf over 13 years
    @KnickerKicker: Yes, ideally there would be a separate dlsym_function() for returning function symbols, or dlsym() would return a union {} that contains both a void * and a void (*)().
  • Edward Strange
    Edward Strange over 11 years
    The standard could make them compatible without messing with this by simply making the data types the same size and guaranteeing that assigning to one and then back will result in the same value. They do this with void*, which is the only pointer type compatible with everything.
  • ouah
    ouah over 11 years
    @CrazyEddie You cannot assign a function pointer to a void *.
  • gexicide
    gexicide over 11 years
    @Crazy Eddie: So, void* is compatible with function pointers? I thought ALL non-function pointers were incompatible with function pointers, including void*
  • Richard Chambers
    Richard Chambers over 11 years
    This does not really answer the question. It instead just says that it is not guaranteed to work, which is something we already know. What about separate address spaces makes a difference? For instance, with the old 8086 chip there were two kinds of pointers, near and far, due to the segmented memory addressing used, and one had to choose how data and functions were accessed by selecting a memory model as the compiler target. So you could have incompatible function and data pointers.
  • Edward Strange
    Edward Strange over 11 years
    I could be wrong on void* accepting function pointers, but the point remains. Bits are bits. The standard could require that the sizes of the different types be able to accommodate each other's data, and the assignment would be guaranteed to work even if they are used in different memory segments. The reason this incompatibility exists is that this is NOT guaranteed by the standard, and so data can be lost in the assignment.
  • dmp
    dmp over 11 years
    @CrazyEddie: Aside of all the platform specific issues, it discourages bad programming style.
  • matth
    matth over 11 years
    But requiring sizeof(void*) == sizeof( void(*)() ) would waste space in the case where function pointers and data pointers are different sizes. This was a common case in the 80's, when the first C standard was written.
  • ouah
    ouah over 11 years
    @CrazyEddie it is not only not guaranteed but also not allowed by the C Standard. C says you can only convert function pointers to function pointers, not to object or void pointers.
  • Michael Burr
    Michael Burr over 11 years
    @CrazyEddie: the standard could have required that function pointers be convertible to void* and back, but doesn't. Presumably since there is pretty much no support in C for treating functions as objects, there was no reason for the committee to add potential overhead to object pointers in order to let them hold function pointers portably.
  • gexicide
    gexicide over 11 years
    does that mean that using dlsym to get the address of a function is currently unsafe? Is there currently a safe way to do it?
  • Edward Strange
    Edward Strange over 11 years
    Exactly Rob, hence my answer :P
  • Maxim Egorushkin
    Maxim Egorushkin over 11 years
    It means that currently POSIX requires from a platform ABI that both function and data pointers can be safely cast to void* and back .
  • John Bode
    John Bode over 11 years
    @RichardChambers: The different address spaces may also have different address widths, such as an Atmel AVR that uses 16 bits for instructions and 8 bits for data; in that case, it would be hard converting from data (8 bit) to function (16 bit) pointers and back again. C's supposed to be easy to implement; part of that ease comes from leaving data and instruction pointers incompatible with each other.
  • Michael Graczyk
    Michael Graczyk over 11 years
    Also, even if code and data are stored in the same place in physical hardware, software and memory access often prevent running data as code without operating system "approval". DEP and the like.
  • Michael Burr
    Michael Burr over 11 years
    At least as important as having different address spaces (maybe more important) is that function pointers may have a different representation than data pointers.
  • caf
    caf over 11 years
    You don't even have to have a Harvard architecture to have code and data pointers using different address spaces - the old DOS "Small" memory model did this (near pointers with CS != DS).
  • ruakh
    ruakh over 11 years
    Re: "converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion": This doesn't entirely follow. Converting an int* to a void* gives you a pointer that you can't really do anything with, but it's still useful to be able to perform the conversion. (This is because void* can store any object pointer, so it can be used for generic algorithms that don't need to know what type they hold. The same thing could be useful for function pointers as well, if it were allowed.)
  • caf
    caf over 11 years
    @ruakh: In the case of converting the int * to void *, the void * is guaranteed to at least point to the same object as the original int * did - so this is useful for generic algorithms that access the pointed-to object, like int n; memcpy(&n, src, sizeof n);. In the case where converting a function pointer to a void * doesn't yield a pointer pointing at the function, it isn't useful for such algorithms - the only thing you could do is convert the void * back to a function pointer again, so you might as well just use a union containing a void * and function pointer.
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    One can, but one shouldn’t. A conforming compiler must generate a warning for that (which in turn should trigger an error, cf. -Werror). A better (and non-UB) solution is to retrieve a pointer to the object returned by dlsym (i.e. void**) and convert that to a pointer to function pointer. Still implementation-defined but no longer cause for a warning/error.
  • MSalters
    MSalters over 11 years
    @KonradRudolph: Disagree. The "conditionally-supported" wording was specifically written to allow dlsym and GetProcAddress to compile without warning.
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    @MSalters What do you mean, “disagree”? Either I’m right or wrong. The dlsym documentation explicitly says that “compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted”. This doesn’t leave much room for speculation. And GCC (with -pedantic) does warn. Again, no speculation possible.
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    Follow-up: I think now I understand. It’s not UB. It’s implementation-defined. I’m still unsure whether the warning must be generated or not – probably not. Oh well.
  • MSalters
    MSalters over 11 years
    @KonradRudolph: I disagreed with your "shouldn't", which is an opinion. The answer specifically mentioned C++11, and I was a member of the C++ CWG at the time the issue was addressed. C99 indeed has different wording, conditionally-supported is a C++ invention.
  • David Hammen
    David Hammen over 11 years
    @gexicide It means that implementations that are POSIX compliant have made an extension to the language, giving an implementation-defined meaning to what is undefined behavior per the standard itself. It's even listed as one of the common extensions to the C99 standard, section J.5.7 Function pointer casts.
  • Maxim Egorushkin
    Maxim Egorushkin over 11 years
    @DavidHammen It is not an extension to the language, rather a new extra requirement. C doesn't require void* to be compatible with a function pointer, whereas POSIX does.
  • PypeBros
    PypeBros over 11 years
    even modern processors would struggle with such mixture as the instruction and data cache are typically handled separately, even when the operating system allows you to write code somewhere.
  • ruakh
    ruakh over 11 years
    @caf: Fair enough. Thanks for pointing that out. And for that matter, even if the void* did point to the function, I suppose it would be a bad idea for people to pass it to memcpy. :-P
  • Jonathan Leffler
    Jonathan Leffler over 11 years
    Copied from above: Note what POSIX says in Data Types: §2.12.3 Pointer Types. All function pointer types shall have the same representation as the type pointer to void. Conversion of a function pointer to void * shall not alter the representation. A void * value resulting from such a conversion can be converted back to the original function pointer type, using an explicit cast, without loss of information. Note: The ISO C standard does not require this, but it is required for POSIX conformance.
  • Eric J.
    Eric J. over 11 years
    Windows 8 specifically requires processors that provide code execution protection. It will not install without that CPU feature. That makes it impossible for user mode data to execute as code.
  • Dietrich Epp
    Dietrich Epp over 11 years
    @EricJ. Until you call VirtualProtect, which allows you to mark regions of data as executable.
  • Eric J.
    Eric J. over 11 years
    @DietrichEpp: Apparently not quite that simple, but you're right, bypassing is apparently not that hard. Thanks for mentioning that. vulnfactory.org/blog/2011/09/21/…
  • SSpoke
    SSpoke almost 10 years
    back to the future :P
  • Jerry Coffin
    Jerry Coffin almost 10 years
    @LegoStormtroopr: Interesting how 21 people agree with the idea of up-voting, but only about 3 have actually done so. :-)
  • user877329
    user877329 over 9 years
    @R.. True, but it adds maintenance cost by having to maintain the module structure.
  • skyking
    skyking about 8 years
    @caf The small memory model wasn't the only one with this: you also have the medium and compact models, where one pointer is near and the other is far. In the small model you're at least guaranteed to be able to cast a pointer back to its original type without loss of data. Then there's the huge model, where data pointers are huge but code pointers are only far; whether the compiler converts between these correctly I don't know (but if it isn't done correctly you would end up pointing somewhere else).
  • Manuel Jacob
    Manuel Jacob almost 7 years
    This answer is wrong. You can for example convert a function pointer to a data pointer and read from it (if you have permissions to read from that address, as usual). The result makes as much sense as it does e.g. on x86.
  • Deduplicator
    Deduplicator almost 6 years
    @JohnBode That round-trip is trivial, assuming no pointer-safety. But starting from the bigger type runs into trouble with compressibility.
  • Deduplicator
    Deduplicator almost 6 years
    @caf If it just should be passed through to some callback which knows the proper type, I'm only interested in round-trip safety, not any other relationship those converted values might possibly have.
  • pmor
    pmor over 2 years
    What is the "idea of up-voting"?
  • Jerry Coffin
    Jerry Coffin over 2 years
    @pmor: The point was that many people had up-voted the comment saying "+1 for answering the question before it was asked", but only a few had actually up-voted the answer itself.
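R..'s suggestion of replacing dlopen/dlsym with a static table, for the statically linked case Manav asked about, might look roughly like this. Everything here (struct module, foo_module, bar_module, module_table, find_module) is hypothetical and only illustrates the shape of the idea. Because callers get back a pointer to a struct of typed function pointers, no conversion between void * and a function pointer ever happens.

```c
#include <stddef.h>
#include <string.h>

/* Per-module function table; each module fills in its own instance. */
struct module {
    int (*init)(void);
    int (*compute)(int);
};

static int foo_init(void) { return 0; }
static int foo_compute(int x) { return x * 2; }
static const struct module foo_module = { foo_init, foo_compute };

static int bar_init(void) { return 0; }
static int bar_compute(int x) { return x + 100; }
static const struct module bar_module = { bar_init, bar_compute };

/* The "extra file": maps module names to their function tables. */
static const struct {
    const char *module_name;
    const struct module *module_funcs;
} module_table[] = {
    { "foo", &foo_module },
    { "bar", &bar_module },
};

/* Stands in for dlopen() + dlsym(); returns NULL for an unknown module. */
const struct module *find_module(const char *name) {
    for (size_t i = 0; i < sizeof module_table / sizeof module_table[0]; ++i) {
        if (strcmp(module_table[i].module_name, name) == 0)
            return module_table[i].module_funcs;
    }
    return NULL;
}
```

For example, find_module("foo")->compute(21) returns 42. As user877329 points out, the trade-off is keeping module_table in sync with the modules that are actually linked in.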