Types of endianness

12,787

Solution 1

There are two approaches to endian mapping: address invariance and data invariance.

Address Invariance

In this type of mapping, the address of bytes is always preserved between big and little. This has the side effect of reversing the order of significance (most significant to least significant) of a particular datum (e.g. 2 or 4 byte word) and therefore the interpretation of data. Specifically, in little-endian, the interpretation of data is least-significant to most-significant bytes whilst in big-endian, the interpretation is most-significant to least-significant. In both cases, the set of bytes accessed remains the same.

Example

Address invariance (also known as byte invariance): the byte address is constant but byte significance is reversed.

Addr   Memory
       7    0
       |    |    (LE)   (BE)
       |----|
 +0    | aa |    lsb    msb
       |----|
 +1    | bb |     :      :
       |----|
 +2    | cc |     :      :
       |----|
 +3    | dd |    msb    lsb
       |----|
       |    |
At Addr=0:          Little-endian          Big-endian
Read 1 byte:              0xaa                0xaa   (preserved)
Read 2 bytes:           0xbbaa              0xaabb
Read 4 bytes:       0xddccbbaa          0xaabbccdd

Data Invariance

In this type of mapping, the relative byte significance is preserved for datum of a particular size. There are therefore different types of data invariant endian mappings for different datum sizes. For example, a 32-bit word invariant endian mapping would be used for a datum size of 32. The effect of preserving the value of particular sized datum, is that the byte addresses of bytes within the datum are reversed between big and little endian mappings.

Example

32-bit data invariance (also known as word invariance): The datum is a 32-bit word which always has the value 0xddccbbaa, independent of endianness. However, for accesses smaller than a word, the address of the bytes are reversed between big and little endian mappings.

Addr                Memory
            | +3   +2   +1   +0 |  <- LE
            |-------------------|
+0      msb | dd | cc | bb | aa |  lsb
            |-------------------|
+4      msb | 99 | 88 | 77 | 66 |  lsb
            |-------------------|
     BE ->  | +0   +1   +2   +3 |
At Addr=0:             Little-endian              Big-endian
Read 1 byte:                 0xaa                    0xdd
Read 2 bytes:              0xbbaa                  0xddcc
Read 4 bytes:          0xddccbbaa              0xddccbbaa   (preserved)
Read 8 bytes:  0x99887766ddccbbaa      0x99887766ddccbbaa   (preserved)

Example

16-bit data invariance (also known as half-word invariance): The datum is a 16-bit which always has the value 0xbbaa, independent of endianness. However, for accesses smaller than a half-word, the address of the bytes are reversed between big and little endian mappings.

Addr           Memory
            | +1   +0 |  <- LE
            |---------|
+0      msb | bb | aa |  lsb
            |---------|
+2      msb | dd | cc |  lsb
            |---------|
+4      msb | 77 | 66 |  lsb
            |---------|
+6      msb | 99 | 88 |  lsb
            |---------|
     BE ->  | +0   +1 |
At Addr=0:             Little-endian              Big-endian
Read 1 byte:                 0xaa                    0xbb
Read 2 bytes:              0xbbaa                  0xbbaa   (preserved)
Read 4 bytes:          0xddccbbaa              0xddccbbaa   (preserved)
Read 8 bytes:  0x99887766ddccbbaa      0x99887766ddccbbaa   (preserved)

Example

64-bit data invariance (also known as double-word invariance): The datum is a 64-bit word which always has the value 0x99887766ddccbbaa, independent of endianness. However, for accesses smaller than a double-word, the address of the bytes are reversed between big and little endian mappings.

Addr                         Memory
            | +7   +6   +5   +4   +3   +2   +1   +0 |  <- LE
            |---------------------------------------|
+0      msb | 99 | 88 | 77 | 66 | dd | cc | bb | aa |  lsb
            |---------------------------------------|
     BE ->  | +0   +1   +2   +3   +4   +5   +6   +7 |
At Addr=0:             Little-endian              Big-endian
Read 1 byte:                 0xaa                    0x99
Read 2 bytes:              0xbbaa                  0x9988
Read 4 bytes:          0xddccbbaa              0x99887766
Read 8 bytes:  0x99887766ddccbbaa      0x99887766ddccbbaa   (preserved)

Solution 2

Philibert said,

bits were actually inverted

I doubt any architecture would break byte value invariance. The order of bit-fields may need inversion when mapping structs containing them against data. Such direct mapping relies on compiler specifics which are outside the C99 standard but which may still be common. Direct mapping is faster but does not comply with the C99 standard that does not stipulate packing, alignment and byte order. C99-compliant code should use slow mapping based on values rather than addresses. That is, instead of doing this,

#if LITTLE_ENDIAN
  struct breakdown_t {
    int least_significant_bit: 1;
    int middle_bits: 10;
    int most_significant_bits: 21;
  };
#elif BIG_ENDIAN
  struct breakdown_t {
    int most_significant_bits: 21;
    int middle_bits: 10;
    int least_significant_bit: 1;
  };
#else
  #error Huh
#endif
uint32_t data = ...;
struct breakdown_t *b = (struct breakdown_t *)&data;

one should write this (and this is how the compiler would generate code anyways even for the above "direct mapping"),

uint32_t data = ...;
uint32_t least_significant_bit = data & 0x00000001;
uint32_t middle_bits = (data >> 1) & 0x000003FF;
uint32_t most_significant_bits = (data >> 11) & 0x001fffff;

The reason behind the need to invert the order of bit-fields in each endian-neutral, application-specific data storage unit is that compilers pack bit-fields into bytes of growing addresses.

The "order of bits" in each byte does not matter as the only way to extract them is by applying masks of values and by shifting to the the least-significant-bit or most-significant-bit direction. The "order of bits" issue would only become important in imaginary architectures with the notion of bit addresses. I believe all existing architectures hide this notion in hardware and provide only least vs. most significant bit extraction which is the notion based on the endian-neutral byte values.

Solution 3

There's also middle or mixed endian. See wikipedia for details.

The only time I had to worry about this was when writing some networking code in C. Networking typically uses big-endian IIRC. Most languages either abstract the whole thing or offer libraries to guarantee that you're using the right endian-ness though.

Solution 4

Best article I read about endianness "Understanding Big and Little Endian Byte Order".

Solution 5

Actually, I'd describe the endianness of a machine as the order of bytes inside of a word, and not the order of bits.

By "bytes" up there I mean the "smallest unit of memory the architecture can manage individually". So, if the smallest unit is 16 bits long (what in x86 would be called a word) then a 32 bit "word" representing the value 0xFFFF0000 could be stored like this:

FFFF 0000

or this:

0000 FFFF

in memory, depending on endianness.

So, if you have 8-bit endianness, it means that every word consisting of 16 bits, will be stored as:

FF 00

or:

00 FF

and so on.

Share:
12,787
Author by

Nitin

C/C++/Java developer. Currently working on embedded systems virtualization technology.

Updated on June 05, 2022

Comments

  • Nitin over 1 year

    What is the difference between the following types of endianness?

    • byte (8b) invariant big and little endianness
    • half-word (16b) invariant big and little endianness
    • word (32b) invariant big and little endianness
    • double-word (64b) invariant big and little endianness

    Are there other types/variations?