What is the address of a function in a C++ program?

21,914

Solution 1

So, here the address of function and the address of first variable in function is not same. Why so?

Why it would be so? A function pointer is a pointer that points to the function. It does not point to the first variable inside the function, anyway.

To elaborate, a function (or subroutine) is a collection of instructions (including variable definition and different statements/ operations) that performs a specific job, mostly multiple times, as required. It is not just a pointer to the elements present inside the function.

The variables, defined inside the function are not stored in the same memory area as that of the executable machine code. Based on the storage type, the variables which are present inside the function are located in some other part of the memory of the executing program.

When a program is built (compiled into an object file), different part of the program gets organized in a different manner.

  • Usually, the function (executable code), resides in a separate segment called code segment, usually a read-only memory location.

  • The compile time allocated variable, OTOH, are stored into the data segment.

  • The function local variables, usually are populated onto the stack memory, as and when needed.

So, there is no such relation that a function pointer will yield the address of the first variable present in the function, as seen in the source code.

In this regard, to quote the wiki article,

Instead of referring to data values, a function pointer points to executable code within memory.

So, TL;DR, the address of a function is a memory location inside the code (text) segment where the executable instructions reside.

Solution 2

A function's address is only a symbolic way to hand this function around, like pass it in a call or such. Potentially, the value you get for the address of a function is not even a pointer to memory.

Functions' addresses are good for exactly two things:

  1. to compare for equality p==q, and

  2. to dereference and call (*p)()

Anything else you try to do is undefined, might or might not work, and is compiler's decision.

Solution 3

Alright, this is going to be fun. We get to go from the extremely abstract concept of what a function pointer is in C++ all the way down to the assembly code level, and thanks to some of the particular confusions we're having, we even get to discuss stacks!

Let's start at the highly abstract side, because that's clearly the side of things you're starting from. you have a function char** fun() that you're playing with. Now, at this level of abstraction, we can look at what operations are permitted on function pointers:

  • We can test if two function pointers are equal. Two function pointers are equal if they point at the same function.
  • We can do inequality testing on those pointers, allowing us to do sorting of such pointers.
  • We can deference a function pointer, which results in a "function" type which is really confusing to work with, and I will choose to ignore it for now.
  • We can "call" a function pointer, using the notation you used: fun_ptr(). The meaning of this is identical to calling whatever function is being pointed at.

That's all they do at the abstract level. Underneath that, compilers are free to implement it however they see fit. If a compiler wanted to have a FunctionPtrType which is actually an index into some big table of every function in the program, they could.

However, this is typically not how it is implemented. When compiling C++ down to assembly/machine code, we tend to take advantage of as many architecture-specific tricks as possible, to save runtime. On real life computers, there is almost always an "indirect jump" operation, which reads a variable (usually a register), and jumps over to begin executing the code that's stored at that memory address. Its almost univeral that functions are compiled into contiguous blocks of instructions, so if you ever jump to the first instruction in the block, it has the logical effect of calling that function. The address of the first instruction happens to satisfy every one of the comparisons required by C++'s abstract concept of a function pointer and it happens to be exactly the value the hardware needs to use an indirect jump to call the function! That's so convenient, that virtually every compiler chooses to implement it that way!

However, when we start talking about why the pointer you thought you were looking at was the same as the function pointer, we have to get into something a bit more nuanced: segments.

Static variables are stored separate from the code. There's a few reasons for that. One is that you want your code as tight as possible. You don't want your code speckled with the memory spaces to store variables. It'd be inefficient. You'd have to skip over all sorts of stuff, rather than just getting to plow through it. There's also a more modern reason: most computers allow you to mark some memory as "executable" and some "writable." Doing this helps tremendously for dealing with some really evil hacker tricks. We try to never mark something as both executable and writable at the same time, in case a hacker cleverly finds a way to trick our program into overwriting some of our functions with their own!

Accordingly, there is typically a .code segment (using that dotted notation simply because its a popular way to notate it in many architectures). In this segment, you find all of the code. The static data will go in somewhere like .bss. So you may find your static string stored quite far away from the code that operates on it (typically at least 4kb away, because most modern hardware allows you to set execute or write permissions at the page level: pages are 4kb in lots of modern systems)

Now the last piece... the stack. You mentioned storing things on the stack in a confusing way, which suggests it may be helpful to give it a quick going over. Let me make a quick recursive function, because they are more effective at demonstrating what is going on in the stack.

int fib(int x) {
    if (x == 0)
        return 0;

    if (x == 1)
        return 1;

    return fib(x-1)+fib(x-2);
}

This function calculates the Fibonacci sequence using a rather inefficient but clear way of doing it.

We have one function, fib. This means &fib is always a pointer to the same place, but we're clearly calling fib many times, so each one needs its own space right?

On the stack we have what are called "frames." Frames are not the functions themselves, but rather they are sections of memory which this particular invocation of the function is allowed to use. Every time you call a function, like fib, you allocate a little more space on the stack for its frame (or, more pedantically, it will allocate it after you make the call).

In our case, fib(x) clearly needs to store the result of fib(x-1) while executing fib(x-2). It can't store this in the function itself, or even in the .bss segment because we don't know how many times it is going to get recursed. Instead, it allocates space on the stack to store its own copy of the result of fib(x-1) while fib(x-2) is operating in its own frame (using the exact same function, and the same function address). When fib(x-2) returns, fib(x) simply loads up that old value, which it is certain has not been touched by anyone else, adds the results, and returns it!

How does it do this? Virtually every processor out there has some support for a stack in hardware. On x86, this is known as the ESP register (extended-stack pointer). Programs generally agree to treat this as a pointer to the next spot in the stack where you can start storing data. You're welcome to move this pointer around to build yourself space for a frame, and move in. When you finish executing, you are expected to move everything back.

In fact, on most platforms, the first instruction in your function is not the first instruction in the final compiled version. Compilers inject a few extra ops to manage this stack pointer for you, so that you never even have to worry about it. On some platforms, like x86_64, this behavior is often even mandatory and specified in the ABI!

So in all we have:

  • .code segment - where your function's instructions are stored. The function pointer will point to the first instruction in here. This segment is typically marked "execute/read only," preventing your program from writing to it after it has been loaded.
  • .bss segment - where your static data will get stored, because it can't be part of the "execute only" .code segment if it wants to be data.
  • the stack - where your functions can store frames, which keep track of the data needed just for that one instantation, and nothing more. (Most platforms also use this to store the information about where to return to after a function finishes)
  • the heap - This didn't appear in this answer, because your question doesn't include any heap activities. However, for completeness, I've left it here so that it doesn't surprise you later.

Solution 4

In the text of your question you say:

And thus we can say that the address of function and the address of the first instruction in the function will be the same (In this case the first instruction is the initialization of a variable.).

but in the code you don't get the address of the first instruction in the function but the address of some local variable declared in the function.

A function is code, a variable is data. They are stored in different memory areas; they don't even reside in the same memory block. Due to security restrictions imposed by the OSes nowadays, the code is stored in memory blocks that are marked as read-only.

As far as I know, the C language doesn't provide any way to get the address of a statement in the memory. Even if it would provide such a mechanism, the start of the function (the function's address in memory) is not the same as the address of the machine code generated from the first C statement.

Before the code generated from the first C statement, the compiler generates a function prolog that (at least) saves the current value of the stack pointer and makes room for function's local variables. This means several assembly instructions before any code generated from the first statement of the C function.

Solution 5

What exactly is the address of a function in a C++ program?

Like other variables, an address of a function is the space allocated for it. In other words, it is the memory location where the instructions (machine code) for the operation performed by the function is stored.

To understand this, take a deep look on memory layout of a program.

A program's variables and executable code/instructions are stored at different segments of memory (RAM). Variables goes to any of STACK, HEAP, DATA and BSS segment while executable code goes to the CODE segment. Look at the general memory layout of a program

enter image description here

Now you can see that there are different memory segments for variables and instructions. They are stored at different memory locations. The function address is the address which is located at the CODE segment.

So, you are confusing the term first statement with first executable instruction. When function call is invoked, program counter is updated with the address of the function. Therefore, the function pointer points to the first instruction of the function stored in memory.

enter image description here

Share:
21,914
Amit Upadhyay
Author by

Amit Upadhyay

I welcome you to this page and I have got a message for you: "The only way out is through." I love to play with technologies & try to learn new things every day. I'm good with programming languages & their compilers (Go, Java, Python, C, C++, JavaScript, C#). I am also good with Data Structure & Algorithms and use them to solve problems optimally. Got decent experience in software developments too. I mostly focus on solving problems with the best possible technology present, optimizing modules involved in the application, writing a good design. I love building things, making them scalable. I have made some software products which are present in the market. To know more about me and my projects please visit http://binomial.me I've worked heavily with Android during my school time, built few apps which are available in Playstore also worked as a freelancer, developed an android application for a company. IDEs I prefer using: vim & Intellij. I contribute back to the programmers' community here on Binomial.me and of course on StackOverflow Feel free to contact me at [email protected], [email protected]

Updated on February 21, 2020

Comments

  • Amit Upadhyay
    Amit Upadhyay about 4 years

    As the function is set of instruction stored in one contiguous block of memory.

    And address of a function (entry point) is the address of the first instruction in the function. (from my knowledge)

    And thus we can say that the address of function and the address of the first instruction in the function will be the same (In this case the first instruction is the initialization of a variable.).

    But the program below contradicts the above line.

    code:

    #include<iostream>
    #include<stdio.h>
    #include<string.h>
    using namespace std;
    char ** fun()
    {
        static char * z = (char*)"Merry Christmas :)";
        return &z;
    }
    int main()
    {
        char ** ptr = NULL;
    
        char ** (*fun_ptr)(); //declaration of pointer to the function
        fun_ptr = &fun;
    
        ptr = fun();
    
        printf("\n %s \n Address of function = [%p]", *ptr, fun_ptr);
        printf("\n Address of first variable created in fun() = [%p]", (void*)ptr);
        cout<<endl;
        return 0;
    }
    

    One output example is:

     Merry Christmas :) 
     Address of function = [0x400816]
     Address of first variable created in fun() = [0x600e10]
    

    So, here the address of function and the address of first variable in function is not same. Why so?

    I searched on google but can't come up with the exact required answer and being new to this language I exactly can't catch some of contents on net.

  • Amit Upadhyay
    Amit Upadhyay over 8 years
    okay, I read somewhere in a book that a function doesn't have is own address but it's address is just the address of the beginning of the stack in the memory and as far as I know that the beginning of the stack starts with the initialization of variable (or instructions ).
  • Sourav Ghosh
    Sourav Ghosh over 8 years
    @AmitUpadhyay I really don't understand what you're talking about. What stack? What beginning?
  • LogicStuff
    LogicStuff over 8 years
    @Amit Consider a recursion. You need the same instructions, but also multiple stack frames to hold the local data.
  • BlueRaja - Danny Pflughoeft
    BlueRaja - Danny Pflughoeft over 8 years
    @AmitUpadhyay: I think you've got that backwards. On most architectures, the first value of the stack-frame is the return address for the current call (which I guess you could consider a function pointer, though it technically has no high-level equivalent in C++), not the other way around.
  • Leushenko
    Leushenko over 8 years
    In particular, you could easily have a conforming C++ implementation that's designed as an interpreter written in an object-oriented language like e.g. Java. Function pointers in this implmentation might be something like integer indexes into an array of opaque bytecode objects, and have no connection to machine code (or even interpreter code) at all.
  • hyde
    hyde over 8 years
    "The variables, defined inside the function are not necessarily stored in the same memory address as that of the executable part.", umm, just what you mean by "executable code is not necessarily in the same memory address as variables"?
  • hyde
    hyde over 8 years
    This seems to be the only answer so far, which directly addresses the point OP seems to be confused about: code and data are two different things...
  • Sourav Ghosh
    Sourav Ghosh over 8 years
    @hyde It is regarding to the data and code segment part. I tried to clarify but if you feel it can be improved by using any alternative wordings, please feel free to edit. Thanks. :)
  • hyde
    hyde over 8 years
    @SouravGhosh Edited. Roll back or edit further if you disagree.
  • Steve Jessop
    Steve Jessop over 8 years
    Furthermore, it's not even necessarily true that a function is (as the questioner puts it) a "set of instructions stored in one contiguous block of memory". Once the optimiser is finished functions need not be contiguous (for example common code could be lifted out of two functions and shared, albeit that's rare), and they may consist of other things as well as instructions (for example constant pools or debug info).
  • glglgl
    glglgl over 8 years
    If the function is called multiple times during the execution of the program, the variable can be at different addresses each time. Not if the variable is static.
  • Sourav Ghosh
    Sourav Ghosh over 8 years
    Dear Downvoter, your silence will not help to improve the quality of the post, anyway. If you have some suggestion, please let me know so that the post can be improved. Thanks. :)