Array of size 0 at the end of struct


Solution 1

Standard C already has a feature for this, described in C11 §6.7.2.1: the flexible array member.

Quoting the standard,

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. [...]

The syntax should be

struct s { int n; double d[]; };

where the last member has an incomplete array type (no array dimension, not even 0).

So your code should look like this

struct array{
    size_t size;
    int data[ ];
};

to be standard-conforming.

As for the 0-sized array in your example, this was a legacy way (the "struct hack") of achieving the same thing. Before C99, GCC supported it as an extension to emulate flexible array member functionality.
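As a minimal sketch of how the standard-conforming struct could be allocated and used: array_new follows the shape of the function in the question, while array_set and the overflow guard are hypothetical additions for illustration, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct array {
    size_t size;
    int data[];   /* flexible array member: must be last, no dimension */
};

/* Allocate a struct array with room for `size` ints.
   Returns NULL on overflow or allocation failure. */
struct array *array_new(size_t size)
{
    /* Assumption: we guard the size computation against overflow. */
    if (size > (SIZE_MAX - sizeof(struct array)) / sizeof(int))
        return NULL;

    struct array *a = malloc(sizeof(struct array) + size * sizeof(int));
    if (a) {
        a->size = size;
        for (size_t i = 0; i < size; i++)
            a->data[i] = 0;   /* the elements live in the extra space */
    }
    return a;
}

/* Hypothetical helper: store a value after checking the index. */
void array_set(struct array *a, size_t i, int value)
{
    assert(a != NULL && i < a->size);
    a->data[i] = value;
}
```

A caller would write something like array_new(3), then array_set(a, 1, 12), and release the whole object with a single free(a), since the header and the elements share one allocation.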

Solution 2

Your professor is confused. They should go read "What happens if I define a zero size array". This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).

Instead, use a standard C flexible array member. Unlike your zero-size array, it will actually work, portably:

struct array{
    size_t size;
    int data[];
};

A flexible array member contributes nothing to sizeof for the struct (apart from possible trailing padding), which lets you do things like:

malloc(sizeof(array) + sizeof(int[size]));
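The sizeof behaviour can be checked at compile time. The sketch below assumes a C11 compiler; array_bytes is a hypothetical helper name. It uses count * sizeof(int) rather than sizeof(int[size]), which yields the same value for a positive count but does not rely on variable length array types (optional since C11).

```c
#include <assert.h>
#include <stddef.h>

struct array {
    size_t size;
    int data[];   /* flexible array member */
};

/* sizeof treats the struct as if data were omitted, except for possible
   trailing padding, so data can never start past the end of the struct. */
static_assert(offsetof(struct array, data) <= sizeof(struct array),
              "flexible array member starts within sizeof(struct array)");

/* Hypothetical helper: bytes needed for the header plus `count` ints.
   Assumption: the caller ensures the computation does not overflow. */
size_t array_bytes(size_t count)
{
    return sizeof(struct array) + count * sizeof(int);
}
```

The result of array_bytes(count) is what would be passed to malloc.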

(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.

Solution 3

Other answers explain that zero-length arrays are a GCC extension and that standard C provides flexible array members instead, but no one addressed your other questions.

from my understanding, structs do not have their elements necessarily in continuous locations.

Yes. The members of a struct are not necessarily stored in contiguous locations.

Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say

array * a = array_new(3);
a->data[1] = 12;

?

Note that one of the restrictions on a zero-length array is that it must be the last member of a structure. From this, the compiler knows that the struct can hold a variable-length object and that more memory will be supplied at run time.
But don't let the following reasoning confuse you: "since the zero-length array is the last member of the structure, the memory allocated for it must be added to the end of the structure; and since structs do not necessarily have their elements in contiguous locations, how can that allocated memory be accessed?"

No, that's not the case. Structure members need not be packed contiguously, because there may be padding between them, but they are still laid out in declaration order inside one object, so the extra allocated memory follows the last member and is accessed through the member data. Padding has no effect here. The rule is §6.7.2.1/15:

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
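The ordering rule can be illustrated with offsetof. This is a sketch that assumes a C11 compiler and uses the flexible array member form of the struct; element_end_offset is a hypothetical helper introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

struct array {
    size_t size;
    int data[];
};

/* Members have increasing addresses in declaration order,
   so data always comes after size, padding or not. */
static_assert(offsetof(struct array, size) < offsetof(struct array, data),
              "data must follow size");

/* Hypothetical helper: offset, from the start of the struct,
   of the first byte past element i of data. */
size_t element_end_offset(size_t i)
{
    return offsetof(struct array, data) + (i + 1) * sizeof(int);
}
```

Because offsetof(struct array, data) is never larger than sizeof(struct array), element i of a block allocated with sizeof(struct array) + n * sizeof(int) bytes lies inside the block whenever i < n, which is why a->data[1] is a valid access after array_new(3).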


I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?

Yes. As other answers already mentioned, zero-length arrays are not part of standard C; they are a GCC extension. C99 introduced the flexible array member. An example from the C standard (6.7.2.1):

After the declaration:

struct s { int n; double d[]; };

the structure struct s has a flexible array member d. A typical way to use this is:

int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:

struct { int n; double d[m]; } *p;

(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).

Solution 4

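A more widely accepted alternative for pre-C99 compilers is to define your array with a data size of 1 (strictly speaking, indexing past the declared bound is still not guaranteed by the standard, even though it works in practice), as in: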

struct array{
    size_t size;
    int data[1]; // <--- will work across compilers
};

Then use the offset of the data member (not the size of the array) in the calculation:

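#include <stddef.h>  /* offsetof */
#include <stdlib.h>  /* malloc */

array *array_new(size_t size){
    /* room for everything up to data, plus size ints */
    array* a = malloc(offsetof(array, data) + size * sizeof(int));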

    if(a){
        a->size = size;
    }

    return a;
}

This is effectively using array.data as a marker for where the extra data might go (depending on size).

Solution 5

The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:

header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);

As for structs in general, you can rely on the fields being laid out in order. Being able to match an imposed layout required by a file format, binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but the fields are in order and in one contiguous block.
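A minimal sketch of the p+1 approach follows. The answer does not show the definition of header, so the type below and the helper names header_new and header_buffer are hypothetical, as are the overflow guard and the zero fill.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header type; the real one is not shown in the answer. */
typedef struct header {
    size_t length;   /* number of payload bytes that follow */
} header;

/* Allocate a header followed directly by `buffersize` payload bytes.
   Returns NULL on overflow or allocation failure. */
header *header_new(size_t buffersize)
{
    if (buffersize > SIZE_MAX - sizeof(header))
        return NULL;

    header *p = malloc(sizeof(header) + buffersize);
    if (p) {
        p->length = buffersize;
        memset(p + 1, 0, buffersize);   /* payload starts right past the header */
    }
    return p;
}

/* The payload begins at the first byte after the header object. */
char *header_buffer(header *p)
{
    assert(p != NULL);
    return (char *)(p + 1);
}
```

This is safe for a char payload because char has no alignment requirement. For other payload types, p + 1 is only guaranteed to be aligned as strictly as header itself, so the payload's alignment would need to be considered separately.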

nbro
Author by

nbro


Updated on June 07, 2022

Question

    The professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:

    struct array{
        size_t size;
        int data[0];
    };
    
    typedef struct array array;
    

    This struct is useful for creating an array whose length is given by a variable, i.e., something like the following:

    array *array_new(size_t size){
        array* a = malloc(sizeof(array) + size * sizeof(int));
    
        if(a){
            a->size = size;
        }
    
        return a;
    }
    

    That is, using malloc(), we also allocate memory for the array of size zero. This is completely new to me, and it seems odd, because, from my understanding, structs do not necessarily have their elements in contiguous locations.

    Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say

    array * a = array_new(3);
    a->data[1] = 12;
    

    ?

    From what he told us, it seems that an array of length zero defined at the end of a struct is guaranteed to come immediately after the previous member of the struct, but this seems strange, because, again, from my understanding, structs could have padding.

    I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?