How do you know how much space to allocate with malloc()?

13,023

Solution 1

That snippet is allocating enough space for a 2-character name.

Generally the string buffer is going to be filled from somewhere, e.g. I/O. If the size of the string isn't known ahead of time (e.g. reading from a file or the keyboard), one of three approaches is generally used:

  • Define a maximum size for any given string, allocate that size + 1 (for the null terminator), read at most that many characters, and error or blindly truncate if too many characters were supplied. Not terribly user friendly.

  • Reallocate in stages (preferably using geometric series, e.g. doubling, to avoid quadratic behaviour), and keep on reading until the end has been reached. Not terribly easy to code.

  • Allocate a fixed size and hope it won't be exceeded, and crash (or be owned) horribly when this assumption fails. Easy to code, easy to break. For example, see gets in the standard C library. (Never use this function.)
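The second approach (reallocating in stages) is the one that tends to trip people up, so here is a minimal sketch of it. The function name read_line and the starting capacity of 16 are illustrative choices, not something from the answer above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer): reads one line of any length
 * from `in`, doubling the buffer whenever it fills up.
 * Returns a malloc'd, NUL-terminated string without the trailing newline,
 * or NULL on allocation failure or if end of input is reached before any
 * character is read. The caller must free() the result. */
char *read_line(FILE *in)
{
    size_t cap = 16;   /* current buffer capacity in bytes */
    size_t len = 0;    /* characters stored so far */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        /* Always keep one byte free for the terminator. */
        if (len + 1 == cap) {
            size_t new_cap = cap * 2;            /* geometric growth */
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);                       /* old block is still valid */
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {                  /* nothing left to read */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    return buf;
}
```

Assigning the result of realloc to a temporary first matters: if realloc fails it returns NULL but leaves the old block allocated, so overwriting buf directly would leak it.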

Solution 2

Well, for a start, sizeof(char) is always 1, so you could just malloc(3).

What you're allocating there is enough space for three characters. But keep in mind you need one for a null terminator for C strings.

What you tend to find is things like:

#define NAME_SZ 30
: : :
char *name = malloc (NAME_SZ+1);

to get enough storage for a name and terminator character. Keep in mind that the string "xyzzy" is stored in memory as:

+---+---+---+---+---+----+
| x | y | z | z | y | \0 |
+---+---+---+---+---+----+
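Combining the NAME_SZ pattern with a bounded read might look like the sketch below. The helper name read_name is made up for illustration, and the choice to silently truncate long input is an assumption; a real program might prefer to report an error.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SZ 30

/* Hypothetical helper: reads a name of at most NAME_SZ characters from `in`.
 * Longer input is truncated. Returns a malloc'd string, or NULL on
 * allocation failure or end of input. The caller must free() it. */
char *read_name(FILE *in)
{
    char *name = malloc(NAME_SZ + 1);      /* +1 for the '\0' terminator */
    if (name == NULL)
        return NULL;

    /* fgets stores at most NAME_SZ characters plus the terminator. */
    if (fgets(name, NAME_SZ + 1, in) == NULL) {
        free(name);
        return NULL;
    }

    name[strcspn(name, "\n")] = '\0';      /* drop the newline, if present */
    return name;
}
```

Note that when the input is longer than NAME_SZ, the rest of the line stays in the stream and will be returned by the next read.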

Sometimes with non-char based arrays, you'll see:

int *intArray = malloc (sizeof (int) * 22);

which will allocate enough space for 22 integers.
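A variant that several commenters below recommend is to take sizeof of the pointed-to object rather than of the type. Here is a small sketch of that idiom with error checking added; make_int_array is a hypothetical name used only for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an array of `count` ints, each set to
 * `value`. Returns NULL if the size would overflow or malloc fails.
 * The caller must free() the result. */
int *make_int_array(size_t count, int value)
{
    int *arr;

    /* Guard against count * sizeof *arr overflowing size_t. */
    if (count > SIZE_MAX / sizeof *arr)
        return NULL;

    /* sizeof *arr follows the pointer's type, so this line stays correct
     * even if arr is later changed to, say, long *. */
    arr = malloc(count * sizeof *arr);
    if (arr == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        arr[i] = value;
    return arr;
}
```

Calling make_int_array(22, 0) would give the same 22-integer block as the line above, but with the failure cases handled.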

Solution 3

malloc() will allocate a block of memory and return a pointer to that memory if successful, and NULL if unsuccessful. The size of the block of memory is specified by malloc's argument, in bytes.

The sizeof operator gives the size of its argument in bytes.

char *someString = malloc(sizeof(char) * 50);

This will allocate 50 bytes, enough space for a 49-character string plus the terminating NUL ('\0') character that every C-style string needs, and point someString at that memory.

It looks like that code in your question should be malloc(sizeof(char) * 2);, as sizeof(char) + 2 doesn't make sense.

Note that sizeof(char) is guaranteed to always equal 1 (byte), but the size of other types (such as long) may vary between compilers and platforms.
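If you want to see what your own compiler does, a quick check along these lines may help. The function name print_sizes is just for illustration, and every value printed except the first depends on the platform.

```c
#include <limits.h>
#include <stdio.h>

/* Prints the sizes of a few types on the current platform.
 * Only sizeof(char) is fixed by the C standard; the rest may vary. */
void print_sizes(void)
{
    printf("bits per byte: %d\n", CHAR_BIT);       /* at least 8 */
    printf("char:   %zu\n", sizeof(char));         /* always 1 */
    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));
    printf("double: %zu\n", sizeof(double));
    printf("void *: %zu\n", sizeof(void *));
}
```

CHAR_BIT is included because a "byte" in the C sense is not required to be exactly 8 bits, which is the point debated in the comments further down.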

The way that you get (un)lucky with dynamically allocated memory is if you try to read/write outside of memory you have allocated.

For example,

char *someString = malloc(10);
strcpy(someString, "Hello there, world!");
printf("%s\n", someString);

The first line allocates enough room for 9 characters and a NUL terminator.
The second line attempts to copy 20 characters (19 + NUL) into that memory space. This overruns the buffer, which is undefined behaviour, and might cause something incredibly witty, such as overwriting adjacent memory, or causing a segfault.

The third line might appear to work: strcpy wrote the whole string, terminator included, into whatever memory sat right beside someString, so printf may well print it. But if that neighbouring memory belongs to something else and gets modified in the meantime, the terminator may be gone, and printf would print your string plus whatever was in the next memory space, wandering off until it hits a zero byte or segfaults.

This example is a pretty simple operation, yet it's so easy to go wrong. C is tricky -- be careful.
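One way to avoid that overrun is to size the allocation from the string itself. The sketch below assumes a helper called copy_string, which is not part of the original answer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of `src`, or NULL if the
 * allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);        /* characters, not counting '\0' */
    char *dst = malloc(len + 1);     /* +1 for the terminator */
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, len + 1);       /* copies the terminator too */
    return dst;
}
```

With this, copy_string("Hello there, world!") allocates exactly 20 bytes, so nothing is written outside the block.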

Solution 4

Your call to malloc will allocate 3 bytes of memory: sizeof(char) is 1 byte, and 2 more bytes are added explicitly. This gives you enough space for a string of length 2 plus the terminating character.

Solution 5

This will allocate three bytes; 1 for sizeof(char), plus two. Just seeing that line out of context, I have no way of knowing why it would be allocated that way or if it is correct (it looks fishy to me).

You need to allocate enough memory to hold whatever you need to put in it. For example, if you're allocating memory to hold a string, you need to allocate enough memory to hold the longest string expected plus one byte for the terminating null. If you're dealing with ASCII strings, that's easy: one byte per character plus one. If you're using Unicode strings, things get more complicated, because one character may no longer fit in one byte.
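To illustrate why Unicode complicates the arithmetic, here is a small sketch that assumes the text is valid UTF-8 (the answer does not say which encoding is meant). The helper name utf8_code_points is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a valid UTF-8 string by
 * skipping continuation bytes, which have the bit pattern 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the UTF-8 string "na\xC3\xAFve" (the word "naïve"), this returns 5, while strlen reports 6 bytes, so a buffer for it needs 7 bytes. The allocation size has to come from the byte count, not the character count.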

Author by

vishwas kumar


Updated on June 04, 2022

Comments

  • vishwas kumar
    vishwas kumar almost 2 years

    I'm a total C newbie; I come from C#. I've been learning about memory management and the malloc() function. I've also come across this code:

    char *a_persons_name = malloc(sizeof(char) + 2);
    

    What I don't understand is how much space this is allocating for a_persons_name. Is it allocating 2 characters (eg. AB) or something else?

    I also know that you can sometimes get "lucky" with malloc and use unallocated space (which can result in data corruption and seg faults). So how do I know how much space I'm allocating and how much I will need?

  • vishwas kumar
    vishwas kumar almost 15 years
    Why do all the ways to allocate enough space suck? IS THERE NO EASY WAY!
  • Barry Kelly
    Barry Kelly almost 15 years
    Strings are the most broken part of C. I recommend coding up a pseudo-OO 'StringBuilder' struct or similar, and creating e.g. StrBufPrintf, StrBufGets, StrBufScanf, etc. to centralize these kinds of operations. The standard C library doesn't help much. C++ is slightly better, because you usually have 10s of different string classes to choose from, one for each distinct framework being used. Yes, I'm being sarcastic.
  • paxdiablo
    paxdiablo almost 15 years
    The easy way is to either (1) use a language where a string is a basic type; (2) use a library which provides string behavior; or (3) learn the language you're using. If you don't want to learn how to use the tools, why are you even trying. Find another language that's more suited to you (I'm not trying to be insulting here, just pragmatic).
  • vishwas kumar
    vishwas kumar almost 15 years
    @Pax: I really wanted to learn the fundamentals of C so that I could help contribute to GEdit (which I know is written in C and uses GTK). I figured it would be best to learn all I could about memory management before doing anything with a real (large) program. That being said, if I'm going to write a program it's going to be done in C#.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 15 years
    It's not really that strings are broken in C. It's that strings are surprisingly hard to do right, and C provides little support above the bare metal. Languages that provide "easy" strings have a lot going on under the hood (every one of them).
  • paxdiablo
    paxdiablo almost 15 years
    @LucasA, you may find that GTK provides abstraction code for strings as well. GLib contains a GString abstraction (and lists and others) which may make your life easier. I'm not advocating not learning how bare-metal C does strings (you should), just stating that it may not be absolutely necessary for the domain you're interested in.
  • paxdiablo
    paxdiablo almost 15 years
    @BarryK, strings are easy to do in C if you know what you're doing. I have C string processing code that I wrote back in '84 that hasn't been updated since '96 and it gives me everything I need. Yes, many things are hard to do in the base C language but that's one of the reasons you have functions, so you can abstract away the difficulties - you only have to do that once, then amortize the cost over your entire career.
  • paxdiablo
    paxdiablo almost 15 years
    And saying C is broken is the same as saying C++ is broken since it can't do 256-bit integers or both are broken due to the limited precision of floating point - they are what they are. Broken means "doesn't match the spec", not "could be done better" (IMNSHO).
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 15 years
    Strings aren't transparent in c because c isn't a high level language: you can still see the bare metal from c, and nothing is hidden. If you want strings that "just work" you have to give that up. Complaints about the standard library I get, but that's the result of history: that library was developed on machines orders of magnitude less powerful than your cell phone.
  • Barry Kelly
    Barry Kelly almost 15 years
    For sure, strings are easy in C as long as you build the appropriate abstractions yourself. But speaking as a language designer and implementor, I believe it's a fact that C has been almost single-handedly responsible for millions, if not billions, of dollars worth of damage through its particularly weak approach to strings.
  • Barry Kelly
    Barry Kelly almost 15 years
    The power of the machine does not excuse the lack of correctness of the implementation.
  • u0b34a0f6ae
    u0b34a0f6ae over 14 years
    (type and convenience) int *intArray = malloc(sizeof(*intArray) * 22);
  • Anon E. Mous
    Anon E. Mous over 14 years
    "Well, for a start, sizeof(char) is always 1" FALSE. C specifies 1 byte as a LOWER BOUND for the size of a char. The actual size is both architecture and compiler dependent. On some more obscure architectures a char is 16 bits.
  • paxdiablo
    paxdiablo over 14 years
    No, actually, sizeof(char) is always 1. From c1x, "6.5.3.4 The sizeof operator", para 3: When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
  • paxdiablo
    paxdiablo over 14 years
    See stackoverflow.com/questions/1535131/… for more detail: the C std defines byte as the addressable unit but it's not necessarily an 8-bit byte (octet).
  • chux - Reinstate Monica
    chux - Reinstate Monica about 8 years
    1) Disagree with "best to use something like this:" p_data = malloc(sizeof(blob) * length_of_array);. Prefer p_data = malloc(sizeof *p_data * length_of_array); as it does not depend on coding the type right and keeping it right as code changes. 2) Example usage: str2 = malloc(sizeof *str2 * (strlen(str1) + 1));