Define smallest possible datatype in c++ that can hold six values

10,525

Solution 1

A char is the smallest possible type.

If you happen to know that you need several such 3 bit values in a single place you get use a structure with bitfield syntax:

struct foo {
  unsigned int val1:3;
  unsigned int val2:3;
};

and hence get 2 of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.

Solution 2

C++ 0x will contain Strongly typed enumerations where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples are with int, short and long), but they mention the underlying integral type and that would include char as well.

As of today Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolical names for the values.

Solution 3

C++ does not express units of memory smaller than bytes. If you're producing them one at a time, That's the best you can do. Your own example works well. If you need to get just a few, You can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, then you're even worse off. Most archetectures allocate page-size units, 16 bytes being common.

Another choice might be to wrap std::bitset to do your bidding. This will waste very little space, if you need many such values, only about 1 bit for every 8.

If you think about your problem as a number, expressed in base-6, and convert that number to base two, possibly using an Unlimited precision integer (for example GMP), you won't waste any bits at all.

This assumes, of course, that you're values have a uniform, random distribution. If they follow a different distribution, You're best bet will be general compression of the first example, with something like gzip.

Solution 4

You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.

For example:

struct example
{
    unsigned int a : 3; //<Three bits, can be 0 through 7.
            bool b : 1; //<One bit, the stores 0 or 1.
    unsigned int c : 10; //<Ten bits, can be 0 through 1023.
    unsigned int d : 19; //<19 bits, can be 0 through 524287.
}

In most cases, your compiler will round up the total size of your structure to 32 bits on a 32 bit platform. The other problem is, like you pointed out, that your values may not have a power of two range. This will make for wasted space. If you read the entire struct as one number, you will find values that will be impossible to set, if your input ranges aren't all powers of 2.

Another feature you may find interesting is a union. They work like a struct, but share memory. So if you write to one field it overwrites the others.

Now, if you are really tight for space, and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, lets do an example:

//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;

To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:

//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);

See how it looks like we're encoding in base six? Each number is multiplied by it's place like 6^n, where n is the place (starting at 0).

Then to decode:

const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;

Theses schemes are extremely handy when you have to encode some fields in a bar code or in an alpha numeric sequence for human typing. Just saying those few partial bits can make a huge difference. Also, the fields don't all have to have the same range. If one field is from 0 through 7, you'd use 8 instead of 6 in the proper place. There is no requirement that all fields have the same range.

Solution 5

Minimal size what you can use - 1 byte.

But if you use group of enum values ( writing in file or storing in container, ..), you can pack this group - 3 bits per value.

Share:
10,525
Mizipzor
Author by

Mizipzor

A crazy coder in a passionate hunt for greater wisdom. I take great interest in anything involving math and algorithms. Especially path finding, artificial life, cellular automata and emergent behavior.

Updated on June 07, 2022

Comments

  • Mizipzor
    Mizipzor about 2 years

    I want to define my own datatype that can hold a single one of six possible values in order to learn more about memory management in c++. In numbers, I want to be able to hold 0 through 5. Binary, It would suffice with three bits (101=5), although some (6 and 7) wont be used. The datatype should also consume as little memory as possible.

    Im not sure on how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But comparing it to a char (with sizeof) it stated that its 4 times the size of a char, and a char holds 0 through 255 if Im not misstaken.

    #include <iostream>
    
    enum Foo
    {
        a = 0x0, 
        b = 0x1,
        c = 0x2,
        d = 0x3,
        e = 0x4,
        f = 0x5,
    };
    
    int main()
    {
        Foo myfoo = a;
        char mychar = 'a';
    
        std::cout << sizeof(myfoo); // prints 4
        std::cout << sizeof(mychar); // prints 1
    
        return 1;
    }
    

    Ive clearly misunderstood something, but fail to see what, so I turn to SO. :)

    Also, when writing this post I realised that I clearly lack some parts of the vocabulary. Ive made this post a community wiki, please edit it so I can learn the correct words for everything.

  • Mizipzor
    Mizipzor about 15 years
    What does the : do and what is it called? Seems I should do some reading on that. Does it split the int into three equally sized parts that I can access and manipulate individually?
  • Bastien Léonard
    Bastien Léonard about 15 years
  • Alnitak
    Alnitak about 15 years
    See en.wikipedia.org/wiki/Bit_field. In this case, the :3 says "reserve 3 bits for this value", that being sufficient for 0-7
  • Mizipzor
    Mizipzor about 15 years
    Okey. But reserving 3 bits shouldnt make the int smaller? If it does get smaller, it shouldnt become smaller than a char. Or is that just a way of making sure "val1" never has a value higher than 7?
  • Admin
    Admin about 15 years
    there are a number of reasons for not using bitfields - poor performance, lack of portability, and the fact you can't take a bitfield's address being the three main ones
  • Alnitak
    Alnitak about 15 years
    no, an int can never become smaller than its standard size. this is just special syntax, designed for low-level memory structures. see the two other links for more info.
  • Alnitak
    Alnitak about 15 years
    @Neil - sure, but there's no other way to minimize the memory use so efficiently other than packing your data manually, and then you still get the same limitations.
  • Admin
    Admin about 15 years
    A page-size of 16 bytes would be unworkably small - typical page size is 2 or 4 Kb. Perhaps you were thinking of paragraphs?
  • zvrba
    zvrba about 15 years
    No, bitfields are not portable because the layout (whether fields are allocated from most- or least-significant bit first) is implementation-defined.
  • Alnitak
    Alnitak about 15 years
    that doesn't mean that the code isn't portable, but it does matter if you're trying to map your struct to an external chunk of data and vice-versa.
  • Admin
    Admin about 15 years
    Alnitak, yep. I wanted to make it clear that the code is portable because there was potential for confusion.
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    if your code depends on that they are allocated into the same integer unit, your code is non-portable. if it doesn't depend on that (and other things implementation defined), your code is portable.
  • Imbue
    Imbue about 15 years
    Err, bit fields are no less portable than int test = 1234567; That will only work on like hardware (same int size, same endianness) too.
  • SingleNegationElimination
    SingleNegationElimination about 15 years
    The pages returned by the OS are divided for small allocations by most allocators. Even so, few allocators will produce anything as small as a machine word, much less a byte, because if there are many allocations on a page, it becomes difficult to swap it out, free it, and alloc into it.
  • Mizipzor
    Mizipzor about 15 years
    I intend to use this for a bruteforcing rubiks cube solver. Although I doubt the computer to have enough memory for the bruteforce approach, It will help greatly if each state of the cube can be represented with as little memory as possible, even if it is at the expense of performance.