When would anyone use a union? Is it a remnant from the C-only days?

83,367

Solution 1

Unions are usually used in the company of a discriminator: a variable indicating which of the fields of the union is valid. For example, let's say you want to create your own Variant type:

struct my_variant_t {
    int type;
    union {
        char char_value;
        short short_value;
        int int_value;
        long long_value;
        float float_value;
        double double_value;
        void* ptr_value;
    };
};

Then you would use it like this:

/* construct a new float variant instance */
void init_float(struct my_variant_t* v, float initial_value) {
    v->type = VAR_FLOAT;
    v->float_value = initial_value;
}

/* Increments the value of the variant by the given int */
void inc_variant_by_int(struct my_variant_t* v, int n) {
    switch (v->type) {
    case VAR_FLOAT:
        v->float_value += n;
        break;

    case VAR_INT:
        v->int_value += n;
        break;
    /* ... */
    }
}
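
The fragments above use VAR_FLOAT and VAR_INT without defining them. A minimal self-contained sketch of the same idiom follows, assuming a hypothetical enum for the tag values and trimming the union to two members; init_int is an added helper that mirrors init_float. The anonymous union requires C11 or a compiler extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag values; the original fragments leave them undefined. */
enum { VAR_INT, VAR_FLOAT };

struct my_variant_t {
    int type;             /* discriminator: which union member is valid */
    union {               /* anonymous union, standard since C11 */
        int   int_value;
        float float_value;
    };
};

/* Construct an int variant (added helper, mirrors init_float). */
void init_int(struct my_variant_t *v, int initial_value) {
    assert(v != NULL);
    v->type = VAR_INT;
    v->int_value = initial_value;
}

/* Construct a float variant. */
void init_float(struct my_variant_t *v, float initial_value) {
    assert(v != NULL);
    v->type = VAR_FLOAT;
    v->float_value = initial_value;
}

/* Increment whichever member the discriminator says is valid. */
void inc_variant_by_int(struct my_variant_t *v, int n) {
    assert(v != NULL);
    switch (v->type) {
    case VAR_FLOAT:
        v->float_value += (float)n;
        break;
    case VAR_INT:
        v->int_value += n;
        break;
    default:
        break;            /* unknown tag: leave the value untouched */
    }
}
```

Every reader of the union goes through the type field first, so no member is ever read unless it was the last one written.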

This is actually a pretty common idiom, especially in Visual Basic internals.

For a real example, see SDL's SDL_Event union (actual source code here). There is a type field at the top of the union, and the same field is repeated in every SDL_*Event struct. Then, to handle the correct event, you need to check the value of the type field.

The benefits are simple: there is one single data type to handle all event types without using unnecessary memory.
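
The layout described above, a type field at the top of the union repeated as the first member of every event struct, can be sketched as follows. The names here (EV_KEY, key_event, handle_event and so on) are hypothetical and are not SDL's actual definitions. Because every struct starts with the same int, C allows the shared type field to be inspected through any member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds and structs; not SDL's real definitions. */
enum { EV_KEY = 1, EV_MOUSE = 2 };

struct key_event   { int type; int keycode; };
struct mouse_event { int type; int x; int y; };

union event {
    int type;                  /* overlaps the type field of every struct */
    struct key_event   key;
    struct mouse_event mouse;
};

/* Dispatch on the shared type field; returns a small summary value
   so the behavior is easy to check. */
int handle_event(const union event *e) {
    assert(e != NULL);
    switch (e->type) {
    case EV_KEY:   return e->key.keycode;
    case EV_MOUSE: return e->mouse.x + e->mouse.y;
    default:       return -1;  /* unknown event kind */
    }
}
```

A single union event is as large as its largest member, so one queue or one function signature can carry every kind of event.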

Solution 2

I find C++ unions pretty cool. It seems that people usually only think of the use case where one wants to change the value of a union instance "in place" (which, it seems, serves only to save memory or perform doubtful conversions).

In fact, unions can be of great power as a software engineering tool, even when you never change the value of any union instance.

Use case 1: the chameleon

With unions, you can group a number of arbitrary classes under one name, which is not unlike the relationship between a base class and its derived classes. What changes, however, is what you can and can't do with a given union instance:

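struct Batman { /* ... */ };      // union members must be complete types,
struct BaseballBat { /* ... */ }; // so forward declarations are not enough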

union Bat
{
    Batman brucewayne;
    BaseballBat club;
};

ReturnType1 f(void)
{
    BaseballBat bb = {/* */};
    Bat b;
    b.club = bb;
    // do something with b.club
}

ReturnType2 g(Bat& b)
{
    // do something with b, but how do we know what's inside?
}

Bat returnsBat(void);
ReturnType3 h(void)
{
    Bat b = returnsBat();
    // do something with b, but how do we know what's inside?
}

It appears that the programmer has to be certain of the type held by a given union instance before using it. That is the case in function f above. However, if a function receives a union instance as an argument, as g does above, it has no way of knowing what to do with it. The same applies to functions returning a union instance, see h: how does the caller know what's inside?

If a union instance never gets passed as an argument or as a return value, then it's bound to have a very monotonous life, with spikes of excitement when the programmer chooses to change its content:

Batman bm = {/* */};
BaseballBat bb = {/* */};
Bat b;
b.brucewayne = bm;
// stuff
b.club = bb;

And that's the most (un)popular use case of unions. Another use case is when a union instance comes along with something that tells you its type.

Use case 2: "Nice to meet you, I'm object, from Class"

Suppose a programmer elected to always pair up a union instance with a type descriptor (I'll leave it to the reader's discretion to imagine an implementation for one such object). This defeats the purpose of the union itself if the programmer's goal is to save memory and the size of the type descriptor is not negligible compared to that of the union. But let's suppose it's crucial that the union instance can be passed as an argument or returned as a value without the callee or caller knowing what's inside.

Then the programmer has to write a switch control flow statement to tell Bruce Wayne apart from a wooden stick, or something equivalent. It's not too bad when there are only two types of contents in the union, but obviously this approach doesn't scale.

Use case 3:

As the authors of a recommendation for the ISO C++ Standard put it back in 2008,

Many important problem domains require either large numbers of objects or limited memory resources. In these situations conserving space is very important, and a union is often a perfect way to do that. In fact, a common use case is the situation where a union never changes its active member during its lifetime. It can be constructed, copied, and destructed as if it were a struct containing only one member. A typical application of this would be to create a heterogeneous collection of unrelated types which are not dynamically allocated (perhaps they are in-place constructed in a map, or members of an array).

And now, an example, with a UML class diagram:

[Figure: UML class diagram showing many compositions for class A]

The situation in plain English: an object of class A can have objects of any class among B1, ..., Bn, and at most one of each type, with n being a pretty big number, say at least 10.

We don't want to add fields (data members) to A like so:

private:
    B1 b1;
    .
    .
    .
    Bn bn;

because n might vary (we might want to add Bx classes to the mix), because this would cause a mess with constructors, and because A objects would take up a lot of space.

We could use a wacky container of void* pointers to Bx objects with casts to retrieve them, but that's fugly and so C-style... More importantly, that would leave us with the lifetimes of many dynamically allocated objects to manage.

Instead, what can be done is this:

union Bee
{
    B1 b1;
    .
    .
    .
    Bn bn;
};

enum BeesTypes { TYPE_B1, ..., TYPE_BN };

class A
{
private:
    std::unordered_map<int, Bee> data; // C++11, otherwise use std::map

public:
    Bee get(int); // the implementation is obvious: get from the unordered map
};

Then, to get the content of a union instance from data, you use a.get(TYPE_B2).b2 and the like, where a is a class A instance.

This is all the more powerful since unions are unrestricted in C++11. See the document linked to above or this article for details.

Solution 3

One example is in the embedded realm, where each bit of a register may mean something different. For example, a union of an 8-bit integer and a structure with 8 separate 1-bit bitfields lets you change either a single bit or the entire byte.
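
A minimal sketch of such a register view follows, assuming a hypothetical 8-bit status register with placeholder field names b0 to b7. Two caveats apply: the order in which bit-fields are packed into the byte is implementation-defined, and using uint8_t as a bit-field type is an implementation-defined (though widely supported) choice. Reading one member after writing another is permitted in C but is undefined behavior in C++.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit register. Which bit-field maps to which bit of the
   byte depends on the compiler, so check the layout before relying on it. */
union status_reg {
    uint8_t byte;              /* whole-register access */
    struct {
        uint8_t b0 : 1;        /* single-bit access */
        uint8_t b1 : 1;
        uint8_t b2 : 1;
        uint8_t b3 : 1;
        uint8_t b4 : 1;
        uint8_t b5 : 1;
        uint8_t b6 : 1;
        uint8_t b7 : 1;
    } bits;
};

/* Overwrite the whole register in one store. */
void reg_write(union status_reg *r, uint8_t value) {
    assert(r != NULL);
    r->byte = value;
}

/* Count the set bits in the byte view; used to confirm that touching one
   bit-field changes exactly one bit of the register. */
int count_set_bits(uint8_t value) {
    int n = 0;
    while (value != 0) {
        n += value & 1;
        value >>= 1;
    }
    return n;
}
```

The checks one can make portably are about counts rather than positions: setting one bit-field flips exactly one bit of byte, but which bit it is depends on the compiler.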

Solution 4

Herb Sutter wrote in GOTW about six years ago, with emphasis added:

"But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see... ."

And for a less useful example, see the long but inconclusive question gcc, strict-aliasing, and casting through a union.
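
The small string optimization Sutter describes can be sketched in C as follows. This is a simplified illustration under stated assumptions: the names (sso_string, sso_init, SMALL_CAP) and the 16-byte threshold are hypothetical, and the layout is not that of any particular standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical small-buffer size, including the terminating '\0'. */
enum { SMALL_CAP = 16 };

struct sso_string {
    size_t len;
    int is_small;                 /* discriminator for the union below */
    union {
        char small[SMALL_CAP];    /* short strings live inside the object */
        struct {
            char  *ptr;           /* long strings live on the heap */
            size_t capacity;
        } heap;
    } u;
};

/* Returns 0 on success, -1 if allocation fails. */
int sso_init(struct sso_string *s, const char *text) {
    assert(s != NULL && text != NULL);
    s->len = strlen(text);
    if (s->len < SMALL_CAP) {
        s->is_small = 1;
        memcpy(s->u.small, text, s->len + 1);
        return 0;
    }
    s->is_small = 0;
    s->u.heap.capacity = s->len + 1;
    s->u.heap.ptr = malloc(s->u.heap.capacity);
    if (s->u.heap.ptr == NULL) {
        return -1;
    }
    memcpy(s->u.heap.ptr, text, s->len + 1);
    return 0;
}

/* Returns the characters regardless of where they are stored. */
const char *sso_data(const struct sso_string *s) {
    assert(s != NULL);
    return s->is_small ? s->u.small : s->u.heap.ptr;
}

/* Releases heap storage, if any. */
void sso_free(struct sso_string *s) {
    assert(s != NULL);
    if (!s->is_small) {
        free(s->u.heap.ptr);
    }
}
```

The same bytes serve as either the inline buffer or the pointer and capacity, so short strings cost no allocation and long strings cost no extra space in the object.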

Solution 5

Some uses for unions:

  • Provide a general endianness interface to an unknown external host.
  • Manipulate foreign CPU architecture floating point data, such as accepting VAX G_FLOATS from a network link and converting them to IEEE 754 long reals for processing.
  • Provide straightforward bit twiddling access to a higher-level type.
union {
    unsigned char   byte_v[16];
    long double     ld_v;
};

With this declaration, it is simple to display the hex byte values of a long double, change the exponent's sign, determine if it is a denormal value, or implement long double arithmetic for a CPU which does not support it, etc.

  • Saving storage space when fields are dependent on certain values:

    class person {
        string name;

        char gender;   // M = male, F = female, O = other
        union {
            date  vasectomized;  // for males
            int   pregnancies;   // for females
        } gender_specific_data;
    };
    
  • Grep the include files for use with your compiler. You'll find dozens to hundreds of uses of union:

    [wally@zenetfedora ~]$ cd /usr/include
    [wally@zenetfedora include]$ grep -w union *
    a.out.h:  union
    argp.h:   parsing options, getopt is called with the union of all the argp
    bfd.h:  union
    bfd.h:  union
    bfd.h:union internal_auxent;
    bfd.h:  (bfd *, struct bfd_symbol *, int, union internal_auxent *);
    bfd.h:  union {
    bfd.h:  /* The value of the symbol.  This really should be a union of a
    bfd.h:  union
    bfd.h:  union
    bfdlink.h:  /* A union of information depending upon the type.  */
    bfdlink.h:  union
    bfdlink.h:       this field.  This field is present in all of the union element
    bfdlink.h:       the union; this structure is a major space user in the
    bfdlink.h:  union
    bfdlink.h:  union
    curses.h:    union
    db_cxx.h:// 4201: nameless struct/union
    elf.h:  union
    elf.h:  union
    elf.h:  union
    elf.h:  union
    elf.h:typedef union
    _G_config.h:typedef union
    gcrypt.h:  union
    gcrypt.h:    union
    gcrypt.h:    union
    gmp-i386.h:  union {
    ieee754.h:union ieee754_float
    ieee754.h:union ieee754_double
    ieee754.h:union ieee854_long_double
    ifaddrs.h:  union
    jpeglib.h:  union {
    ldap.h: union mod_vals_u {
    ncurses.h:    union
    newt.h:    union {
    obstack.h:  union
    pi-file.h:  union {
    resolv.h:   union {
    signal.h:extern int sigqueue (__pid_t __pid, int __sig, __const union sigval __val)
    stdlib.h:/* Lots of hair to allow traditional BSD use of `union wait'
    stdlib.h:  (__extension__ (((union { __typeof(status) __in; int __i; }) \
    stdlib.h:/* This is the type of the argument to `wait'.  The funky union
    stdlib.h:   causes redeclarations with either `int *' or `union wait *' to be
    stdlib.h:typedef union
    stdlib.h:    union wait *__uptr;
    stdlib.h:  } __WAIT_STATUS __attribute__ ((__transparent_union__));
    thread_db.h:  union
    thread_db.h:  union
    tiffio.h:   union {
    wchar.h:  union
    xf86drm.h:typedef union _drmVBlank {
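
Returning to the byte_v / ld_v union from the bit-twiddling item earlier in this answer, here is a minimal C sketch of displaying the hex bytes of a long double. The names ld_view, ld_to_hex and ld_round_trip are hypothetical, and the array is sized with sizeof rather than a fixed 16 because the size of long double varies by platform. Byte order and the floating-point format are platform-specific too, so the digits printed will differ between machines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same idea as the earlier declaration, sized with sizeof instead of 16. */
union ld_view {
    unsigned char byte_v[sizeof(long double)];
    long double   ld_v;
};

/* Writes the object representation as hex into out, lowest address first.
   out must have room for 2 * sizeof(long double) + 1 characters.
   Returns the number of hex digits written. */
size_t ld_to_hex(long double value, char *out) {
    union ld_view v;
    size_t i;
    assert(out != NULL);
    memset(&v, 0, sizeof v);
    v.ld_v = value;
    for (i = 0; i < sizeof v.byte_v; i++) {
        sprintf(out + 2 * i, "%02x", (unsigned int)v.byte_v[i]);
    }
    return 2 * sizeof v.byte_v;
}

/* Copies the raw bytes into a second union and reads them back as a value,
   showing that the byte view carries the complete representation. */
long double ld_round_trip(long double value) {
    union ld_view a, b;
    a.ld_v = value;
    memcpy(b.byte_v, a.byte_v, sizeof a.byte_v);
    return b.ld_v;
}
```

Flipping a sign or testing for a denormal then comes down to masking the right byte, but which byte that is depends on the platform's long double format.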
    
Author: Russel

Updated on March 18, 2020

Comments

  • Russel
    Russel over 4 years

    I have learned but don't really get unions. Every C or C++ text I go through introduces them (sometimes in passing), but they tend to give very few practical examples of why or where to use them. When would unions be useful in a modern (or even legacy) case? My only two guesses would be programming microprocessors when you have very limited space to work with, or when you're developing an API (or something similar) and you want to force the end user to have only one instance of several objects/types at one time. Are these two guesses even close to right?

  • thkala
    thkala over 13 years
    This is very common in device drivers as well. A few years back I wrote a lot of code using unions like this for a project. It's not normally recommended, and it can be compiler-specific in some cases, but it works.
  • Admin
    Admin over 13 years
    I thought void* did that ^^
  • Russel
    Russel over 13 years
    Great! In that case, I'm now wondering why the SDL function wasn't just implemented as a class hierarchy. Is that to make it C compatible and not just C++?
  • vz0
    vz0 over 13 years
    @Russel C++ classes cannot be used from a C program, but C structs/unions can be easily accessed from C++ using an 'extern "C"' block.
  • bta
    bta over 13 years
    I wouldn't call that "not recommended". In the embedded space it's often much cleaner and less error-prone than the alternatives, which usually either involve lots of explicit casts and void*s or masks and shifts.
  • Adam Rosenfield
    Adam Rosenfield over 13 years
    This variant pattern is also often used for programming language interpreters, e.g. the definition of struct object in github.com/petermichaux/bootstrap-scheme/blob/v0.21/scheme.c
  • nos
    nos over 13 years
    Keep in mind though, that accessing raw stuff like that isn't standard, and might not work as expected with all compilers.
  • riwalk
    riwalk about 13 years
    Awesome explanation. I always knew what unions were, but never saw a real-world reason of why anyone would be crazy enough to use them :) Thanks for the example.
  • kagali-san
    kagali-san about 13 years
    @Stargazer712, Google's codesearch: google.com/…
  • anxieux
    anxieux about 11 years
    Note that although this solution works on most observable platforms, setting values to _x, _y, _z and accessing _coord is undefined behavior. The main purpose of unions is saving space. You must access exactly the same union element that you previously set.
  • Mooing Duck
    Mooing Duck almost 11 years
    Also, it's very common to see this used in a way that doesn't guarantee the alignment, which is undefined behavior.
  • Viktor Sehr
    Viktor Sehr over 10 years
    This is how I use it too, although I use a std::array for coords, and some static_asserts.
  • wallyk
    wallyk over 10 years
    Tsk tsk! Two downvotes and no explanations. That is disappointing.
  • Alice
    Alice over 10 years
    @user166390 Polymorphism is using the same interface to manipulate multiple types; void* has no interface.
  • Walter
    Walter over 8 years
    This code violates the strict aliasing rules and must not be recommended.
  • Klaus
    Klaus over 8 years
    The example with a person which can hold a man and a woman is very bad design in my eyes. Why not a person base class and a man and woman derived one? Sorry, but manually looking at a variable to determine the type stored in a data field is a bad idea altogether. This is handcrafted C code not seen for years. But no downvote, it is only my point of view :-)
  • Michael
    Michael over 8 years
    heh? Lots of explicit casts? Seems to me simple statements like REG |= MASK and REG &= ~MASK. If that is error prone then put them in a #define SETBITS(reg, mask) and #define CLRBITS(reg, mask). Don't rely on the compiler to get the bits in a certain order (stackoverflow.com/questions/1490092/…)
  • akaltar
    akaltar almost 8 years
    I guess you got the downvotes for the "castrated" or "pregnancies" union. It is a bit sick.
  • Andrew
    Andrew almost 8 years
    This may be useful when making a dynamic language. The problem I think it will solve is modifying a variable of unknown type in mass without implementing that modification N times. Macros are horrendous for this and templating is virtually impossible.
  • Andrew
    Andrew almost 8 years
    This was very helpful, and that second article's series was very informative. Thanks.
  • Andrew
    Andrew almost 8 years
    Is there perhaps a way to improve the union such that it would be reliable to do this?
  • Andrew
    Andrew almost 8 years
    This is no longer true in more recent versions of c++. See jrsala's answer, for instance.
  • Matthieu M.
    Matthieu M. almost 8 years
    @Andrew: I updated the answer to mention that C++11, with unrestricted unions, allowed types with destructors to be stored in union. I still stand by my stance that you really are much better off using tagged unions such as boost::variant than to try to use unions on their own. There's way too much undefined behavior surrounding unions that your chances of getting it right are abysmal.
  • Lundin
    Lundin over 7 years
    In C, polymorphism is commonly implemented through opaque types and/or function pointers. I have no idea how or why you would use a union to achieve that. It sounds like a genuinely bad idea.
  • wallyk
    wallyk about 6 years
    Yeah, I guess it was a dark day.