When would anyone use a union? Is it a remnant from the C-only days?

c++ c unions

83,367

Solution 1

Unions are usually used in the company of a discriminator: a variable indicating which of the fields of the union is valid. For example, let's say you want to create your own Variant type:

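/* Discriminator values used below; the numeric values are arbitrary. */
enum { VAR_INT, VAR_FLOAT };

struct my_variant_t {
    int type;   /* one of the VAR_* values: says which field is valid */
    union {     /* anonymous union: C11, or a common compiler extension */
        char char_value;
        short short_value;
        int int_value;
        long long_value;
        float float_value;
        double double_value;
        void* ptr_value;
    };
};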

Then you would use it like this:

/* construct a new float variant instance */
void init_float(struct my_variant_t* v, float initial_value) {
    v->type = VAR_FLOAT;
    v->float_value = initial_value;
}

/* Increments the value of the variant by the given int */
void inc_variant_by_int(struct my_variant_t* v, int n) {
    switch (v->type) {
    case VAR_FLOAT:
        v->float_value += n;
        break;

    case VAR_INT:
        v->int_value += n;
        break;
    ...
    }
}

This is actually a pretty common idiom, especially in Visual Basic internals.

For a real example see SDL's SDL_Event union. (actual source code here). There is a type field at the top of the union, and the same field is repeated on every SDL_*Event struct. Then, to handle the correct event you need to check the value of the type field.

The benefits are simple: there is one single data type to handle all event types without using unnecessary memory.
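The layout described above can be sketched in C roughly as follows. All names here (`event_type`, `key_event`, `mouse_event`, `event`, `describe`) are made up for illustration and are not SDL's actual declarations; the sketch only assumes what the answer states, namely that the union has a type field and that every event struct repeats it as its first member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event kinds; not SDL's real constants. */
enum event_type { EV_KEY = 1, EV_MOUSE = 2 };

/* Every event struct starts with the same `type` member. */
struct key_event   { int type; int keycode; };
struct mouse_event { int type; int x; int y; };

/* One data type that can hold any event. Each struct begins with
   `int type` at offset 0, so the union's own `type` member reads the
   same value whichever event was stored. */
union event {
    int                type;
    struct key_event   key;
    struct mouse_event mouse;
};

/* Writes a short description of the event into buf and returns buf. */
const char *describe(const union event *e, char *buf, size_t n) {
    assert(e != NULL && buf != NULL);
    switch (e->type) {
    case EV_KEY:
        snprintf(buf, n, "key %d", e->key.keycode);
        break;
    case EV_MOUSE:
        snprintf(buf, n, "mouse %d,%d", e->mouse.x, e->mouse.y);
        break;
    default:
        snprintf(buf, n, "unknown");
        break;
    }
    return buf;
}
```

A caller fills in one member (for example `e.key.type = EV_KEY; e.key.keycode = 65;`) and hands `&e` to `describe`, which switches on `e->type`. The union is only as large as its largest member (plus any padding), not the sum of all event structs.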

Solution 2

I find C++ unions pretty cool. It seems that people usually only think of the use case where one wants to change the value of a union instance "in place" (which, it seems, serves only to save memory or perform doubtful conversions).

In fact, unions can be of great power as a software engineering tool, even when you never change the value of any union instance.

Use case 1: the chameleon

With unions, you can regroup a number of arbitrary classes under one denomination, which isn't without similarities with the case of a base class and its derived classes. What changes, however, is what you can and can't do with a given union instance:

struct Batman { /* */ };      // must be complete types to be union members
struct BaseballBat { /* */ };

union Bat
{
    Batman brucewayne;
    BaseballBat club;
};

ReturnType1 f(void)
{
    BaseballBat bb = {/* */};
    Bat b;
    b.club = bb;
    // do something with b.club
}

ReturnType2 g(Bat& b)
{
    // do something with b, but how do we know what's inside?
}

Bat returnsBat(void);
ReturnType3 h(void)
{
    Bat b = returnsBat();
    // do something with b, but how do we know what's inside?
}

It appears that the programmer has to be certain of the type of the content of a given union instance when he wants to use it. It is the case in function f above. However, if a function were to receive a union instance as a passed argument, as is the case with g above, then it wouldn't know what to do with it. The same applies to functions returning a union instance, see h: how does the caller know what's inside?

If a union instance never gets passed as an argument or as a return value, then it's bound to have a very monotonous life, with spikes of excitement when the programmer chooses to change its content:

Batman bm = {/* */};
BaseballBat bb = {/* */};
Bat b;
b.brucewayne = bm;
// stuff
b.club = bb;

And that's the most (un)popular use case of unions. Another use case is when a union instance comes along with something that tells you its type.

Use case 2: "Nice to meet you, I'm `object`, from `Class`"

Suppose a programmer elected to always pair up a union instance with a type descriptor (I'll leave it to the reader's discretion to imagine an implementation for one such object). This defeats the purpose of the union itself if what the programmer wants is to save memory and the size of the type descriptor is not negligible compared to that of the union. But let's suppose that it's crucial that the union instance can be passed as an argument or as a return value with the callee or caller not knowing what's inside.

Then the programmer has to write a switch control flow statement to tell Bruce Wayne apart from a wooden stick, or something equivalent. It's not too bad when there are only two types of contents in the union but obviously, the union doesn't scale anymore.

Use case 3:

As the authors of a recommendation for the ISO C++ Standard put it back in 2008,

Many important problem domains require either large numbers of objects or limited memory resources. In these situations conserving space is very important, and a union is often a perfect way to do that. In fact, a common use case is the situation where a union never changes its active member during its lifetime. It can be constructed, copied, and destructed as if it were a struct containing only one member. A typical application of this would be to create a heterogeneous collection of unrelated types which are not dynamically allocated (perhaps they are in-place constructed in a map, or members of an array).

And now, an example, with a UML class diagram:

[Figure: UML class diagram showing many compositions for class A]

The situation in plain English: an object of class A can have objects of any class among B1, ..., Bn, and at most one of each type, with n being a pretty big number, say at least 10.

We don't want to add fields (data members) to A like so:

private:
    B1 b1;
    .
    .
    .
    Bn bn;

because n might vary (we might want to add Bx classes to the mix), and because this would cause a mess with constructors and because A objects would take up a lot of space.

We could use a wacky container of void* pointers to Bx objects with casts to retrieve them, but that's fugly and so C-style... but more importantly that would leave us with the lifetimes of many dynamically allocated objects to manage.

Instead, what can be done is this:

union Bee
{
    B1 b1;
    .
    .
    .
    Bn bn;
};

enum BeesTypes { TYPE_B1, ..., TYPE_BN };

class A
{
private:
    std::unordered_map<int, Bee> data; // C++11, otherwise use std::map

public:
    Bee get(int); // the implementation is obvious: get from the unordered map
};

Then, to get the content of a union instance from data, you use a.get(TYPE_B2).b2 and the like, where a is a class A instance.

This is all the more powerful since unions are unrestricted in C++11. See the document linked to above or this article for details.

Solution 3

One example is in the embedded realm, where each bit of a register may mean something different. For example, a union of an 8-bit integer and a structure with 8 separate 1-bit bitfields allows you to either change one bit or the entire byte.
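A minimal sketch of that idea in C, assuming a made-up register layout (`control_reg`, `b0` to `b7` are hypothetical names). Two things are compiler-dependent and labeled as such in the code: whether `uint8_t` is accepted as a bit-field type, and which bit of the byte each field lands on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit register. Using uint8_t as a bit-field type and the
   order of the fields within the byte are implementation-defined, so real
   drivers check the compiler and hardware documentation. */
union control_reg {
    uint8_t raw;          /* the whole byte at once */
    struct {
        uint8_t b0 : 1;
        uint8_t b1 : 1;
        uint8_t b2 : 1;
        uint8_t b3 : 1;
        uint8_t b4 : 1;
        uint8_t b5 : 1;
        uint8_t b6 : 1;
        uint8_t b7 : 1;
    } bits;               /* one field per bit */
};

/* The sketch assumes the eight 1-bit fields pack into a single byte. */
static_assert(sizeof(union control_reg) == 1, "bit-fields must fit in one byte");

/* Returns nonzero if exactly one bit of v is set. */
int exactly_one_bit_set(uint8_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}
```

Writing `r.raw = 0;` clears every field at once, and `r.bits.b3 = 1;` then changes a single bit without any masks or shifts. Which numeric value `r.raw` ends up with depends on the compiler's bit ordering, which is why the sketch checks only that exactly one bit is set.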

Solution 4

Herb Sutter wrote in GOTW about six years ago, with emphasis added:

"But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see... ."
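The overlap Sutter describes can be sketched in C roughly like this. It is a simplified illustration, not how any particular standard library implements it; the type `small_string`, the `ss_*` functions, and the 16-byte inline capacity are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { SMALL_CAP = 16 };   /* assumed inline capacity, including the '\0' */

struct small_string {
    size_t len;                       /* length; also selects the union member */
    union {
        char  inline_buf[SMALL_CAP];  /* valid when len <  SMALL_CAP */
        char *heap_buf;               /* valid when len >= SMALL_CAP; owned */
    } u;
};

/* Returns nonzero if the text is stored inside the object itself. */
int ss_is_inline(const struct small_string *s) {
    return s->len < SMALL_CAP;
}

/* Copies text into s. Returns 0 on success, -1 if allocation fails. */
int ss_init(struct small_string *s, const char *text) {
    assert(s != NULL && text != NULL);
    s->len = strlen(text);
    if (ss_is_inline(s)) {
        memcpy(s->u.inline_buf, text, s->len + 1);   /* no allocation */
        return 0;
    }
    s->u.heap_buf = malloc(s->len + 1);
    if (s->u.heap_buf == NULL)
        return -1;
    memcpy(s->u.heap_buf, text, s->len + 1);
    return 0;
}

/* Returns a pointer to the characters, wherever they live. */
const char *ss_data(const struct small_string *s) {
    return ss_is_inline(s) ? s->u.inline_buf : s->u.heap_buf;
}

/* Releases the heap buffer, if any, and resets s to the empty string. */
void ss_free(struct small_string *s) {
    if (!ss_is_inline(s))
        free(s->u.heap_buf);
    s->len = 0;
    s->u.inline_buf[0] = '\0';
}
```

Here `len` doubles as the discriminator: short strings use the bytes of the object as character storage, long strings use the same bytes to hold a pointer.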

And for a less useful example, see the long but inconclusive question gcc, strict-aliasing, and casting through a union.

Solution 5

Some uses for unions:

Provide a general endianness interface to an unknown external host.
Manipulate foreign CPU architecture floating point data, such as accepting VAX G_FLOATS from a network link and converting them to IEEE 754 long reals for processing.
Provide straightforward bit twiddling access to a higher-level type.

union {
      unsigned char   byte_v[16];   /* assumes sizeof(long double) <= 16 */
      long double     ld_v;
};

With this declaration, it is simple to display the hex byte values of a long double, change the exponent's sign, determine if it is a denormal value, or implement long double arithmetic for a CPU which does not support it, etc.
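The same kind of access can be sketched in C with a `float` instead of a `long double`, because the layout of `long double` varies widely between platforms. The sketch assumes a 32-bit IEEE 754 `float`; the names `float_bits`, `float_to_bits`, `flip_sign`, `is_denormal`, and `is_little_endian` are hypothetical. C allows reading a union member other than the one last written (the bytes are reinterpreted); C++ does not guarantee this.

```c
#include <assert.h>
#include <stdint.h>

/* Three views of the same four bytes. */
union float_bits {
    float         value;
    uint32_t      bits;
    unsigned char bytes[sizeof(float)];
};

/* Assumption: float is 32 bits wide (IEEE 754 binary32 on common hosts). */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Returns the raw bit pattern of f. */
uint32_t float_to_bits(float f) {
    union float_bits u;
    u.value = f;
    return u.bits;
}

/* Returns f with its sign bit flipped, without using arithmetic. */
float flip_sign(float f) {
    union float_bits u;
    u.value = f;
    u.bits ^= UINT32_C(0x80000000);
    return u.value;
}

/* Returns 1 if f is a denormal: exponent field zero, fraction nonzero. */
int is_denormal(float f) {
    uint32_t b = float_to_bits(f);
    return (b & UINT32_C(0x7F800000)) == 0 && (b & UINT32_C(0x007FFFFF)) != 0;
}

/* Returns 1 on a little-endian host: checks which byte of the 32-bit
   pattern is stored first in memory. */
int is_little_endian(void) {
    union float_bits u;
    u.bits = 1;
    return u.bytes[0] == 1;
}
```

Under the IEEE 754 assumption, `float_to_bits(1.0f)` yields `0x3F800000`, and `flip_sign` toggles only the top bit.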

Saving storage space when fields are dependent on certain values:

class person {
    string name;

    char gender;   // M = male, F = female, O = other
    union {
        date  vasectomized;  // for males
        int   pregnancies;   // for females
    } gender_specific_data;
};

Grep the include files for use with your compiler. You'll find dozens to hundreds of uses of union:

[wally@zenetfedora ~]$ cd /usr/include
[wally@zenetfedora include]$ grep -w union *
a.out.h:  union
argp.h:   parsing options, getopt is called with the union of all the argp
bfd.h:  union
bfd.h:  union
bfd.h:union internal_auxent;
bfd.h:  (bfd *, struct bfd_symbol *, int, union internal_auxent *);
bfd.h:  union {
bfd.h:  /* The value of the symbol.  This really should be a union of a
bfd.h:  union
bfd.h:  union
bfdlink.h:  /* A union of information depending upon the type.  */
bfdlink.h:  union
bfdlink.h:       this field.  This field is present in all of the union element
bfdlink.h:       the union; this structure is a major space user in the
bfdlink.h:  union
bfdlink.h:  union
curses.h:    union
db_cxx.h:// 4201: nameless struct/union
elf.h:  union
elf.h:  union
elf.h:  union
elf.h:  union
elf.h:typedef union
_G_config.h:typedef union
gcrypt.h:  union
gcrypt.h:    union
gcrypt.h:    union
gmp-i386.h:  union {
ieee754.h:union ieee754_float
ieee754.h:union ieee754_double
ieee754.h:union ieee854_long_double
ifaddrs.h:  union
jpeglib.h:  union {
ldap.h: union mod_vals_u {
ncurses.h:    union
newt.h:    union {
obstack.h:  union
pi-file.h:  union {
resolv.h:   union {
signal.h:extern int sigqueue (__pid_t __pid, int __sig, __const union sigval __val)
stdlib.h:/* Lots of hair to allow traditional BSD use of `union wait'
stdlib.h:  (__extension__ (((union { __typeof(status) __in; int __i; }) \
stdlib.h:/* This is the type of the argument to `wait'.  The funky union
stdlib.h:   causes redeclarations with either `int *' or `union wait *' to be
stdlib.h:typedef union
stdlib.h:    union wait *__uptr;
stdlib.h:  } __WAIT_STATUS __attribute__ ((__transparent_union__));
thread_db.h:  union
thread_db.h:  union
tiffio.h:   union {
wchar.h:  union
xf86drm.h:typedef union _drmVBlank {


Author by

Russel

Updated on March 18, 2020

Comments

Russel over 4 years

I have learned but don't really get unions. Every C or C++ text I go through introduces them (sometimes in passing), but they tend to give very few practical examples of why or where to use them. When would unions be useful in a modern (or even legacy) case? My only two guesses would be programming microprocessors when you have very limited space to work with, or when you're developing an API (or something similar) and you want to force the end user to have only one instance of several objects/types at one time. Are these two guesses even close to right?
thkala over 13 years

This is very common in device drivers as well. A few years back I wrote a lot of code using unions like this for a project. It's not normally recommended, and it can be compiler-specific in some cases, but it works.
Admin over 13 years

I thought void* did that ^^
Russel over 13 years

Great! In that case, I'm now wondering why the SDL function wasn't just implemented as a class hierarchy. Is that to make it C compatible and not just C++?
vz0 over 13 years

@Russel C++ classes cannot be used from a C program, but C structs/unions can be easily accessed from C++ using an 'extern "C"' block.
bta over 13 years

I wouldn't call that "not recommended". In the embedded space it's often much cleaner and less error-prone than the alternatives, which usually either involve lots of explicit casts and void*s or masks and shifts.
Adam Rosenfield over 13 years

This variant pattern is also often used for programming language interpreters, e.g. the definition of struct object in github.com/petermichaux/bootstrap-scheme/blob/v0.21/scheme.c
nos over 13 years

Keep in mind though, that accessing raw stuff like that isn't standard, and might not work as expected with all compilers.
riwalk about 13 years

Awesome explanation. I always knew what unions were, but never saw a real-world reason of why anyone would be crazy enough to use them :) Thanks for the example.
kagali-san about 13 years

@Stargazer712, Google's codesearch: google.com/…
anxieux about 11 years

Note that although this solution works on most observable platforms, setting values to _x, _y, _z and accessing _coord is undefined behavior. The main purpose of unions is saving space. You must access exactly the same union element that you previously set.
Mooing Duck almost 11 years

Also, it's very common to see this used in a way that doesn't guarantee the alignment, which is undefined behavior.
Viktor Sehr over 10 years

This is how I use it too, although I use a std::array for coords, and some static_asserts.
wallyk over 10 years

Tsk tsk! Two downvotes and no explanations. That is disappointing.
Alice over 10 years

@user166390 Polymorphism is using the same interface to manipulate multiple types; void* has no interface.
Walter over 8 years

This code violates the strict aliasing rules and must not be recommended.
Klaus over 8 years

The example with a person which can hold a man and a woman is very bad design in my eyes. Why not a person base class and a man and woman derived one? Sorry, but manually checking a variable to determine the type stored in a data field is a bad idea. This is handcrafted C code of a kind not seen for years. But no downvote, it is only my point of view :-)
Michael over 8 years

heh? Lots of explicit casts? Seems to me simple statements like REG |= MASK and REG &= ~MASK. If that is error prone then put them in a #define SETBITS(reg, mask) and #define CLRBITS(reg, mask). Don't rely on the compiler to get the bits in a certain order (stackoverflow.com/questions/1490092/…)
akaltar almost 8 years

I guess you got the downvotes for the "castrated" or "pregnancies" union. It is a bit sick.
Andrew almost 8 years

This may be useful when making a dynamic language. The problem I think it will solve is modifying a variable of unknown type in mass without implementing that modification N times. Macros are horrendous for this and templating is virtually impossible.
Andrew almost 8 years

This was very helpful, and that second article's series was very informative. Thanks.
Andrew almost 8 years

Is there perhaps a way to improve the union such that it would be reliable to do this?
Andrew almost 8 years

This is no longer true in more recent versions of c++. See jrsala's answer, for instance.
Matthieu M. almost 8 years

@Andrew: I updated the answer to mention that C++11, with unrestricted unions, allowed types with destructors to be stored in union. I still stand by my stance that you really are much better off using tagged unions such as boost::variant than to try to use unions on their own. There's way too much undefined behavior surrounding unions that your chances of getting it right are abysmal.
Lundin over 7 years

In C, polymorphism is commonly implemented through opaque types and/or function pointers. I have no idea how or why you would use a union to achieve that. It sounds like a genuinely bad idea.
wallyk about 6 years

Yeah, I guess it was a dark day.