How do free and malloc work in C?

78,932

Solution 1

When you malloc a block, the allocator actually reserves a bit more memory than you asked for. The extra memory typically stores bookkeeping information such as the size of the allocated block, a link to the next free/used block in a chain of blocks, and sometimes some "guard data" that helps the system detect writes past the end of your allocated block. Most allocators also round up the total size and/or the start of your part of the memory to a multiple of some alignment (e.g. on a 64-bit system the data may be aligned to 8 or 16 bytes, because accessing data at non-aligned addresses can be more difficult and inefficient for the processor/bus), so you may also end up with some "padding" (unused bytes).
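The rounding described above can be sketched as plain arithmetic. The header size and alignment below are assumptions chosen for illustration, and `round_up` and `toy_block_size` are hypothetical names; real allocators pick their own values.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for this sketch only; they do not describe any real allocator. */
#define TOY_HEADER_SIZE 8u   /* bookkeeping bytes stored with each block */
#define TOY_ALIGN       16u  /* block sizes are multiples of this (power of two) */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Total bytes a toy allocator would reserve for a request of `requested` bytes. */
static size_t toy_block_size(size_t requested)
{
    return round_up(requested + TOY_HEADER_SIZE, TOY_ALIGN);
}
```

Under these assumptions a 10-byte request reserves 32 bytes: 8 for the header, 10 for you, and 14 of padding.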

When you free your pointer, free() uses that address to find the bookkeeping information that was added to the beginning (usually) of your allocated block. If you pass in a different address, it will read memory that contains garbage, so its behaviour is undefined (most frequently the result is a crash).

Later, if you free() the block but don't "forget" your pointer, you may accidentally try to access data through that pointer in the future, and the behaviour is undefined. Any of the following situations might occur:

  • the memory might be put in a list of free blocks, so when you access it, it still happens to contain the data you left there, and your code runs normally.
  • the memory allocator may have given (part of) the memory to another part of your program, and that will presumably have then overwritten (some of) your old data, so when you read it, you'll get garbage which might cause unexpected behaviour or crashes from your code. Or you will write over the other data, causing the other part of your program to behave strangely at some point in the future.
  • the memory could have been returned to the operating system (a "page" of memory that you're no longer using can be removed from your address space, so there is no longer any memory available at that address - essentially an unused "hole" in your application's memory). When your application tries to access the data a hard memory fault will occur and kill your process.

This is why it is important to make sure you don't use a pointer after freeing the memory it points at. A common practice is to set the pointer to NULL after freeing the memory, because you can easily test for NULL, and on most platforms attempting to access memory via a NULL pointer fails in a bad but consistent way (typically an immediate crash), which is much easier to debug.
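A minimal sketch of that habit follows. `FREE_AND_NULL` and `demo_free_and_null` are hypothetical names made up for this example, not standard library facilities.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical convenience macro: free the block, then reset the pointer
   variable so any later use is an obvious NULL access, not a silent bug. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Returns 0 on success, -1 if the allocation failed. */
static int demo_free_and_null(void)
{
    char *buf = malloc(10);
    if (buf == NULL)
        return -1;
    buf[0] = 'x';

    FREE_AND_NULL(buf);
    assert(buf == NULL);   /* easy to test for */

    FREE_AND_NULL(buf);    /* harmless: free(NULL) does nothing */
    return 0;
}
```

Note that this only resets the one variable you pass in. Any other copies of the pointer still refer to the freed block.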

Solution 2

You probably know that you are supposed to pass back exactly the pointer you received.

Because free() does not at first know how big your block is, it needs auxiliary information in order to identify the original block from its address and then return it to a free list. It will also try to merge small freed blocks with neighbors in order to produce a more valuable large free block.

Ultimately, the allocator must have metadata about your block; at a minimum, it needs to have stored the length somewhere.

I will describe three ways to do this.

  • One obvious place would be to store it just before the returned pointer. It could allocate a block that is a few bytes larger than requested, store the size in the first word, then return to you a pointer to the second word.

  • Another way would be to keep a separate map describing at least the length of allocated blocks, using the address as a key.

  • An implementation could derive some information from the address and some from a map. The 4.3BSD kernel allocator (called, I think, the "McKusick-Karels allocator") makes power-of-two allocations for objects of less than page size and keeps only a per-page size, making all allocations from a given page of a single size.

It would be possible with some allocators of the second type, and probably any allocator of the third type, to actually detect that you have advanced the pointer and do the right thing, although I doubt any implementation would spend the runtime to do so.
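To make the third approach concrete, here is a rough sketch of recovering a block from any interior address using only the address and a per-page size table. It is a toy under stated assumptions (a tiny static arena, a 256-byte "page", hypothetical `toy_` names) and is not the 4.3BSD code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 256u   /* assumed page size, kept small for the sketch */
#define TOY_NUM_PAGES 4u

static unsigned char toy_arena[TOY_NUM_PAGES * TOY_PAGE_SIZE];

/* One entry per page: the single block size used for every block in that
   page (0 means the page has not been given a size class yet). */
static size_t toy_page_block_size[TOY_NUM_PAGES];

/* Dedicate a page to blocks of `block_size` bytes (a power of two). */
static void toy_set_page_class(size_t page, size_t block_size)
{
    assert(page < TOY_NUM_PAGES);
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    assert(block_size <= TOY_PAGE_SIZE);
    toy_page_block_size[page] = block_size;
}

/* Given a pointer anywhere inside the arena, return the start of the block
   that contains it and store that block's size in *size_out. */
static void *toy_find_block(const void *p, size_t *size_out)
{
    const unsigned char *bp = p;
    assert(bp >= toy_arena && bp < toy_arena + sizeof toy_arena);

    size_t offset = (size_t)(bp - toy_arena);
    size_t bsize = toy_page_block_size[offset / TOY_PAGE_SIZE];
    assert(bsize != 0);

    *size_out = bsize;
    return toy_arena + (offset & ~(bsize - 1));   /* round down to block start */
}
```

Because every block in a page has the same power-of-two size, rounding the offset down is enough to find the block start, which is why this kind of design could in principle cope with an advanced pointer.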

Solution 3

Most (if not all) implementations will look up the amount of data to free a few bytes before the actual pointer you are manipulating. Doing a wild free will lead to memory map corruption.

In your example, when you allocate 10 bytes of memory, the system actually reserves, let's say, 14. The first 4 bytes contain the amount of data you requested (10), and the return value of the malloc is a pointer to the first byte of unused data in the 14 allocated.

When you call free on this pointer, the system will look up 4 bytes backwards to find out that it originally allocated 14 bytes, so it knows how much to free. This scheme spares you from having to pass the amount of data to free as an extra parameter to free itself.
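The "size stored just before your pointer" scheme can be sketched on top of the real malloc. The `toy_` names and the header layout are assumptions made up for illustration; actual implementations use their own layouts and header sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header. The union with max_align_t keeps the user area
   suitably aligned for any object type. */
typedef union toy_header {
    size_t size;          /* number of bytes the caller asked for */
    max_align_t align_;   /* padding only, never read */
} toy_header;

/* Allocate `size` user bytes with a hidden header in front of them. */
static void *toy_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(toy_header))
        return NULL;      /* header + size would overflow */
    toy_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;         /* the caller's pointer starts just past the header */
}

/* Recover the requested size by stepping back one header from `p`.
   Only valid for the exact pointer toy_malloc returned. */
static size_t toy_size(const void *p)
{
    assert(p != NULL);
    return ((const toy_header *)p - 1)->size;
}

/* Release the block by handing the real block start back to free(). */
static void toy_free(void *p)
{
    if (p != NULL)
        free((toy_header *)p - 1);
}
```

If you passed `toy_free` a pointer advanced by a few bytes, it would step back to the wrong place and hand free() an address malloc never returned, which is the failure described in the question.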

Of course, other implementations of malloc/free can choose other ways to achieve this. But they generally don't support calling free on a pointer other than the one returned by malloc or an equivalent function.

Solution 4

From http://opengroup.org/onlinepubs/007908775/xsh/free.html

The free() function causes the space pointed to by ptr to be deallocated; that is, made available for further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by the calloc(), malloc(), realloc() or valloc() function, or if the space is deallocated by a call to free() or realloc(), the behaviour is undefined. Any use of a pointer that refers to freed space causes undefined behaviour.

Solution 5

That's undefined behaviour - don't do it. Only free() pointers obtained from malloc(), and never adjust them before doing so.

The problem is free() must be very fast, so it doesn't try to find the allocation your adjusted address belongs to, but instead tries to return the block at exactly the adjusted address to the heap. That leads to undefined behaviour - usually heap corruption or crashing the program.
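If you need to walk through the buffer, one simple way to stay safe is to advance a copy and keep the original pointer for free(). The function name `walk_then_free` is made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill a 10-byte buffer, move a cursor into the middle of it, and free
   through the untouched base pointer. Returns the byte at the cursor,
   or -1 if the allocation fails. */
static int walk_then_free(void)
{
    char *base = malloc(10);
    if (base == NULL)
        return -1;

    for (int i = 0; i < 10; ++i)
        base[i] = (char)(i + 10);

    char *cursor = base;   /* advance a copy, never the pointer given to free() */
    cursor += 4;
    assert(cursor - base == 4);
    int value = *cursor;   /* same byte as base[4] */

    free(base);            /* exactly the address malloc() returned */
    return value;
}
```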

Admin
Author by

Admin

Updated on July 15, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm trying to figure out what would happen if I try to free a pointer "from the middle". For example, look at the following code:

    char *ptr = (char*)malloc(10*sizeof(char));
    
    for (char i=0 ; i<10 ; ++i)
    {
        ptr[i] = i+10;
    }
    ++ptr;
    ++ptr;
    ++ptr;
    ++ptr;
    free(ptr);
    

    I get a crash with an "Unhandled exception" error message. I want to understand why, and how free works, so that I know not only how to use it but can also understand weird errors and exceptions and better debug my code.

    Thanks a lot

  • GManNickG
    GManNickG over 14 years
    A link with no explanation isn't really an answer.
  • PetrosB
    PetrosB over 14 years
    Why!? I've seen many times just a link being the accepted answer!
  • paxdiablo
    paxdiablo over 14 years
    The problems with links, @Petros, and others may disagree with me (good chance seeing that there's 120,000-odd of us), is that they may disappear (yes, even things like Wikipedia). I don't mind links themselves but there should be enough meat in the answer so that, even if the rest of the internet was destroyed, SO could still be useful. What I tend to do is explain enough to answer the question then put in any links for those that want to go further.
  • PetrosB
    PetrosB over 14 years
    Realistically speaking, I don't think that Open Group's site will go anywhere. Also, the answer was edited and a self-explanatory quoted text which could be the answer to the OP's question was added.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    I would not classify this as just an issue of being fast. Without extensive bookkeeping information that could also cost a lot in terms of memory or impose a particular[ly bad] design, finding the start of an allocated block given a random pointer inside it is simply not possible.
  • onmyway133
    onmyway133 over 11 years
    Suppose I have char s[3] = {a,b,c}. Why s == 'a' ??
  • Zeograd
    Zeograd over 11 years
    in this particular case, there isn't any dynamic allocation involved. The compiler is allocating the 3 needed bytes on the stack and not on the heap. You don't have to (and shouldn't !) call free(s)
  • onmyway133
    onmyway133 over 11 years
    you say "the return value of the malloc is a pointer to the first byte of unused data in the 14 allocated", but then you say "lookup 4 bytes backward" !!?? And, is it documented somewhere ?
  • Zeograd
    Zeograd over 11 years
    This information depends on the malloc implementation you use and the documentation is generally only found as comment in the source code. For instance, in the GNU libc implementation, you can find this comment : Minimum overhead per allocated chunk: 4 or 8 bytes Each malloced chunk has a hidden word of overhead holding size and status information.
  • Koray Tugay
    Koray Tugay about 9 years
    @R.. 'finding the start of an allocated block given a random pointer inside it is simply not possible.' I do not think so..
  • Eugene Shatsky
    Eugene Shatsky over 5 years
    @onmyway133, also, s is a pointer to the first array element, it can be equal to 'a' character only by accident.