How to specify enum size in GCC?
Solution 1
You've misinterpreted the warning. What it's saying is that character literals are always of type int, never of type long or long long.
You can get away with something like this:
enum foo {
garbage1 = (long long)'1' << 32 | (long long)'2' << 24 | (long long)'3' << 16 | (long long)'4' << 8 | (long long)'5',
garbage2
};
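But you still have to be careful about byte order if you want to use the strncpy solution. On a little-endian target, strncpy puts the first character of the string in the lowest byte of the integer, so make sure your shift amounts match that layout.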
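If you want the enumerator values to equal what strncpy leaves in a uint64_t on a little-endian target such as x86, the first character has to land in the lowest byte, so the shifts run in the opposite direction from the example above. The packing can be wrapped in a macro. Here is a minimal sketch; TAG_LE8 is a hypothetical name, not an existing gcc or libc facility, and enumerators outside the range of int rely on the same gcc extension discussed in the other answers (-pedantic will warn):

```c
#include <stdint.h>

/*
 * Hypothetical helper macro: packs eight characters into a uint64_t with
 * the FIRST character in the LOWEST byte, which is the layout strncpy()
 * produces inside a uint64_t on a little-endian CPU. Unused trailing
 * positions are passed as 0.
 */
#define TAG_LE8(a, b, c, d, e, f, g, h)            \
    ((uint64_t)(unsigned char)(a)          |       \
     (uint64_t)(unsigned char)(b) << 8     |       \
     (uint64_t)(unsigned char)(c) << 16    |       \
     (uint64_t)(unsigned char)(d) << 24    |       \
     (uint64_t)(unsigned char)(e) << 32    |       \
     (uint64_t)(unsigned char)(f) << 40    |       \
     (uint64_t)(unsigned char)(g) << 48    |       \
     (uint64_t)(unsigned char)(h) << 56)

/* gcc widens this enum to 64 bits because sysenter does not fit in int. */
enum insn {
    mov      = TAG_LE8('m', 'o', 'v', 0, 0, 0, 0, 0),
    cmp      = TAG_LE8('c', 'm', 'p', 0, 0, 0, 0, 0),
    sysenter = TAG_LE8('s', 'y', 's', 'e', 'n', 't', 'e', 'r'),
};
```

With this enum, strncpy((char *)&buf, "mov", sizeof buf) on x86 or x86-64 leaves buf equal to mov; on a big-endian target it will not.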
Solution 2
As Matteo Italia's answer says, gcc lets you define a 64-bit enumeration type by specifying a 64-bit value for one of the members. For example:
enum some_enum {
/* ... */
max = 0x7fffffffffffffff
};
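To have the compiler confirm that the trick actually took effect, the sentinel member can be paired with a C11 _Static_assert (gcc accepts it in C11/gnu11 mode, and as an extension in older modes). A small sketch; the member is renamed some_enum_max here, a hypothetical name chosen only to avoid clashing with a common identifier:

```c
#include <stdint.h>

enum some_enum {
    garbage1,
    garbage2,
    /* Sentinel: a value that needs 64 bits makes gcc choose a 64-bit
     * underlying type for the whole enumeration (a gcc extension;
     * -pedantic warns that ISO C limits enumerators to int). */
    some_enum_max = 0x7fffffffffffffff
};

/* Compilation fails here if the enum did not end up 8 bytes wide. */
_Static_assert(sizeof(enum some_enum) == sizeof(int64_t),
               "enum some_enum is not 64 bits wide");
```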
As for your use of 'mov', 'cmp', and so forth, there is no necessary correlation between the representation of a string literal like "mov" and the representation of a multi-character character constant like 'mov'.
The latter is legal (and supported by gcc), but its value is implementation-defined. The standard says that its type is always int, and gcc doesn't seem to have an extension that lets you override that. So if int is 4 bytes, then 'sysenter', if it's accepted at all, won't necessarily have the value you're looking for: gcc seems to keep only the low-order bytes of such a constant, which come from its last four characters. The value of the constant also seems to be the same on big-endian and little-endian systems, which means that it won't consistently match the in-memory representation of a similar string literal.
For example, this program:
#include <stdio.h>
int main(void) {
const char *s1 = "abcd";
const char *s2 = "abcdefgh";
printf("'abcd' = 0x%x\n", (unsigned)'abcd');
printf("'abcdefgh' = 0x%x\n", (unsigned)'abcdefgh');
printf("*(unsigned*)s1 = 0x%x\n", *(unsigned*)s1);
printf("*(unsigned*)s2 = 0x%x\n", *(unsigned*)s2);
return 0;
}
produces this output when compiled with gcc on a little-endian system (x86):
'abcd' = 0x61626364
'abcdefgh' = 0x65666768
*(unsigned*)s1 = 0x64636261
*(unsigned*)s2 = 0x64636261
and this output on a big-endian system (SPARC):
'abcd' = 0x61626364
'abcdefgh' = 0x65666768
*(unsigned*)s1 = 0x61626364
*(unsigned*)s2 = 0x61626364
So I'm afraid your idea of matching character constants like 'mov' against strings like "mov" isn't going to work. (Conceivably you could normalize the string representations to big-endian, but I wouldn't take that approach myself.)
The problem you're trying to solve is quickly mapping strings like "mov" to specific integer values that represent CPU instructions. You're right that a long sequence of strcmp() calls is going to be inefficient (have you actually measured it and found that the speed is unacceptable?), but there are better ways. A hash table of some sort is probably the best. There are tools, such as GNU gperf, that generate perfect hash functions, so that a relatively cheap computation on the value of the string gives you a unique integer value.
You won't be able to write the definitions of your enumeration values quite as conveniently, but once you have the right hash function you can write a program to generate the C source code for the enum type.
That's assuming that an enum is the best approach here; it might not be. If I were doing this, the central data structure would be a collection of structs, where each one contains the string name of the operator and whatever other information is associated with it. The hash function would map strings like "mov" to indices in this collection. (I'm being deliberately vague about what kind of "collection" to use; with the right hash function, it might be a simple array.) With this kind of solution, I don't think the 64-bit enum type is needed.
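A minimal sketch of that layout, assuming a small fixed set of mnemonics: an array of structs plus an open-addressing index built once at startup. This swaps in an ordinary (non-perfect) FNV-1a hash with linear probing; a generator such as gperf could replace it with a perfect hash so no probing is needed. The struct, field, and function names are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one entry per mnemonic. */
struct mnemonic_info {
    const char *name;
    int id;             /* stand-in for "whatever other information" */
};

static const struct mnemonic_info mnemonics[] = {
    { "mov", 1 },
    { "cmp", 2 },
    { "sysenter", 3 },
};
#define N_MNEMONICS (sizeof mnemonics / sizeof mnemonics[0])

/* Power-of-two slot count; each slot holds an index into mnemonics[]
 * or -1 when empty. At least one empty slot ends every probe. */
#define N_SLOTS 16
_Static_assert(N_MNEMONICS < N_SLOTS, "need at least one empty slot");
static int slots[N_SLOTS];

/* 32-bit FNV-1a over the first len bytes of s. */
static uint32_t fnv1a(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Build the index once, resolving collisions by linear probing. */
static void mnemonic_index_init(void)
{
    for (size_t i = 0; i < N_SLOTS; i++)
        slots[i] = -1;
    for (size_t i = 0; i < N_MNEMONICS; i++) {
        const char *name = mnemonics[i].name;
        uint32_t h = fnv1a(name, strlen(name)) & (N_SLOTS - 1);
        while (slots[h] != -1)
            h = (h + 1) & (N_SLOTS - 1);
        slots[h] = (int)i;
    }
}

/* Look up the first len bytes of name; returns NULL if unknown. */
static const struct mnemonic_info *mnemonic_lookup(const char *name, size_t len)
{
    assert(name != NULL);
    uint32_t h = fnv1a(name, len) & (N_SLOTS - 1);
    while (slots[h] != -1) {
        const struct mnemonic_info *m = &mnemonics[slots[h]];
        if (strlen(m->name) == len && memcmp(m->name, name, len) == 0)
            return m;
        h = (h + 1) & (N_SLOTS - 1);
    }
    return NULL;
}
```

Because lookups take an explicit length, a caller can skip trailing padding (for example spaces) by passing the trimmed length instead of copying the string first.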
Solution 3
You could use a union type:
#include <stdint.h>
union some {
enum { garbage1, garbage2 } a;
int64_t dummy;
};
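Note that the union only makes the object 8 bytes wide; the enumeration type itself, and therefore the member a, keeps its usual size. A small compile-time check, assuming gcc's default enum layout on x86/x86-64 (no -fshort-enums); the tag some_tag is a hypothetical name added so the enum's size can be queried:

```c
#include <stdint.h>

union some {
    enum some_tag { garbage1, garbage2 } a;  /* still int-sized */
    int64_t dummy;                           /* widens the union to 8 bytes */
};

/* The union is as large as its largest member... */
_Static_assert(sizeof(union some) == sizeof(int64_t),
               "the union object is 64 bits wide");
/* ...but the enum type inside it is unchanged. */
_Static_assert(sizeof(enum some_tag) == sizeof(int),
               "the enum type itself is still int-sized");
```

So the union helps if the goal is to store enum values in 64-bit slots, but not if the goal is to give the enumerators themselves 64-bit values.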
Solution 4
Although the C99 standard specifies that an enum cannot be based on anything but an int (§6.7.2.2 ¶2; see the note at the end of this answer), it seems that gcc follows the C++ idea that, if a value in an enum is bigger than an int, it can base it on a bigger integer type. I don't have any problem with this code, either on x86 or on x64:
#include <stdio.h>

enum myEnum
{
a=1234567891234567890LL
};
int main(void)
{
enum myEnum e;
/* sizeof yields a size_t, so print it with %zu rather than %u */
printf("%zu %zu\n", sizeof(void *), sizeof(e));
return 0;
}
on x86 I get
4 8
and on x64 (on my machine) I get
8 8
However, when I ask for strict conformance to the standard, I get, as expected:
$ gcc -ansi -pedantic testenum.c
testenum.c:5:7: warning: use of C99 long long integer constant
testenum.c:5: warning: ISO C restricts enumerator values to range of ‘int’
-
Actually, it's a bit more complicated: ¶4 specifies that the implementation is free to choose as "base type" any particular type that is "compatible with char, a signed integer type or an unsigned integer type", as long as it can represent all the elements of the enum. On the other hand, ¶2 specifies that each member of the enum must be representable as an int, so even if the implementation is free to base your enum on a gazillion-bit integer, the constants defined for it cannot be anything that can't be represented by an int. In practice, this means that for code that stays within the standard, the compiler won't base the enum on anything bigger than an int, but it may base it on something smaller if your values don't require the full range of int.
Thanks to @jons34yp for pointing out my initial mistake.
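As for the "something smaller" direction mentioned in the note above, gcc does not shrink enums by default on x86, but it offers two extensions for it: the packed attribute on an enum type, or the -fshort-enums option for a whole translation unit (which, per the gcc manual, makes the generated code binary-incompatible with code built without it). A sketch of the attribute form, assuming gcc or a compiler that accepts its attribute syntax; the type and member names are hypothetical:

```c
/* gcc extension: packed on an enum type selects the smallest integer
 * type that can represent every enumerator (here, 0..200 fits a byte). */
enum __attribute__((packed)) small_enum { small_a, small_b, small_c = 200 };

/* An ordinary enum keeps gcc's default int-sized layout on x86/x86-64. */
enum plain_enum { plain_a, plain_b, plain_c = 200 };

_Static_assert(sizeof(enum small_enum) == 1, "packed enum fits in one byte");
_Static_assert(sizeof(enum plain_enum) == sizeof(int), "default is int-sized");
```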
Solution 5
Per Johansson hit the nail on the head with his answer here. As a concrete example of how to use this technique, I wrote up this program (insn_enum.c):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
enum insn {
/*
* Have the characters backwards because C treats the value as an
* integer (of size 64 bits in this case). There is no need for
* a null terminator since we are treating the values as an integer,
* not a string.
*/
sysenter = (uint64_t)'r' << 56 | (uint64_t)'e' << 48 |
(uint64_t)'t' << 40 | (uint64_t)'n' << 32 |
(uint64_t)'e' << 24 | (uint64_t)'s' << 16 |
(uint64_t)'y' << 8 | (uint64_t)'s',
};
int main(void)
{
enum insn some_insn = sysenter;
const char *insn = "sysenter";
uint64_t val = 0;
/*
* We can optimise this by traversing backwards (little endian) setting
* 0 till a NUL char is found, although I will not bother implementing
* this till I have done some profiling.
*/
strncpy((char *)&val, insn, sizeof(uint64_t));
/* sizeof yields a size_t, so %zu is the matching conversion */
printf("size: %zu\n", sizeof(enum insn));
if(some_insn == val) {
puts("Works");
} else {
puts("Doesn't work");
}
return EXIT_SUCCESS;
}
This can be compiled with the following makefile:
all:
	gcc -std=gnu99 -m32 -Wall insn_enum.c -o insn_enum_32
	gcc -std=gnu99 -m64 -Wall insn_enum.c -o insn_enum_64
clean:
	rm -f insn_enum_32
	rm -f insn_enum_64
Running ./insn_enum_32 && ./insn_enum_64 will print:
size: 8
Works
size: 8
Works
It should be noted that this only shows that we can get this trick working on x86-32 and x86-64 (the only two platforms I intend to target). In fact, this trick is guaranteed not to work on big-endian systems: the enum constant is an integer value with 's' in its lowest byte, whereas strncpy fills in the bytes of val in memory order, which on a big-endian system puts 's' in the most significant byte. Also, I am not sure that we can guarantee that the compiler will necessarily use a 64-bit type for the enum even if we specify it the way we have. Indeed, compiling with -pedantic gives the warning:
gcc -std=gnu99 -m32 -pedantic -Wall insn_enum.c -o insn_enum_32
insn_enum.c:13:13: warning: ISO C restricts enumerator values to range of ‘int’
gcc -std=gnu99 -m64 -pedantic -Wall insn_enum.c -o insn_enum_64
insn_enum.c:13:13: warning: ISO C restricts enumerator values to range of ‘int’
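The byte-order dependence comes from strncpy filling in val's bytes in memory order. One way around it, sketched here under the assumption that mnemonics are at most eight characters long (and not tested on an actual big-endian machine), is to build the runtime value with shifts, which act on values rather than memory. pack_insn is a hypothetical helper, not part of the program above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same encoding as insn_enum.c: first character in the lowest byte. */
enum insn {
    sysenter = (uint64_t)'r' << 56 | (uint64_t)'e' << 48 |
               (uint64_t)'t' << 40 | (uint64_t)'n' << 32 |
               (uint64_t)'e' << 24 | (uint64_t)'s' << 16 |
               (uint64_t)'y' << 8  | (uint64_t)'s',
};

/*
 * Hypothetical helper: pack up to the first 8 characters of s (stopping
 * at the terminating NUL) so that character i occupies bits 8*i..8*i+7.
 * The result does not depend on host byte order. Longer strings are
 * silently truncated, so callers should reject names over 8 characters.
 */
static uint64_t pack_insn(const char *s)
{
    assert(s != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}
```

In main, if (pack_insn(insn) == some_insn) would then replace the strncpy and comparison.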

Mike Kwan
Updated on July 26, 2022
Comments
-
Mike Kwan 5 months
I want to specify an enumeration size of 64 bits for an enumeration. How is this possible through GCC? The code does not need to be 'portable' in that I am only interested in making the code work on GCC compiling for x86-32 and x86-64 Linux. That means any hack which can provide the functionality I want is fine as long as it works for those targets.
Given this code:
#include <stdlib.h>
#include <stdio.h>

enum some_enum {
garbage1,
garbage2
};

int main(void)
{
enum some_enum some_val;
printf("size: %zu\n", sizeof(some_val));
return EXIT_SUCCESS;
}
This currently prints out 4, whereas I want to be able to force the size to be 8. Attempting to specify values in the enum assignment larger than 4 bytes causes a warning. For example,
enum some_enum { garbage1 = '12345', garbage2 };
Would produce:
warning: character constant too long for its type [enabled by default]
An answer to a similar question here doesn't seem to yield any good results. That is, the same warning is produced as a result of:
enum some_enum { garbage1 = 'adfs', garbage2 = 'asdfasdf' };
Note: the multi-character warning can be turned off by compiling with -Wno-multichar.
Rationale
Since people are interested in why I am doing this: I have written a disassembler engine, and I get each part of an instruction as a string. So I want the enumeration to look like this:
enum mnemonic { mov = 'mov', cmp = 'cmp', sysenter = 'sysenter' };
I can then store semantic information easily with some code like this:
enum mnemonic insn;
char * example_insn = "mov";
uint64_t buf = 0;
strncpy((char *)&buf, example_insn, sizeof(uint64_t));
If buf were an enum mnemonic, then we would need to do nothing else. The strncpy is used to pad the bytes after the end of the string with null characters. If I am not able to do this, I would have to do something like this instead:
if(strcmp(example_insn, "mov") == 0) {
insn = mov;
} else if(strcmp(example_insn, "cmp") == 0) {
insn = cmp;
} ...
Since this routine is going to be hit millions of times, this sort of optimisation would make a huge difference. I intend to do the same for operands such as registers too.
-
Mike Kwan almost 11 years
#define is my last resort. I would prefer an enumeration. Your suggestion for enumeration is not quite what I want. I need it to represent a string.
-
Mike Kwan almost 11 years
Despite that warning, is this solution guaranteed to work as expected with my target platforms? Also, is it possible to adapt this to the string/multi-char literal style I want? I guess I could always just generate those numbers, but it's less readable.
-
Matteo Italia almost 11 years
@MikeKwan: that warning pops up only if you explicitly ask for pedantic standard conformance (-ansi -pedantic). As far as gcc is concerned, it has no problem basing enums on types bigger than int if necessary (as you can see, I tested it on both x86 and x64 and it works fine). As for multi-char literals, I'm looking into whether there's some way to make them work.
-
Keith Thompson almost 11 years
If you specifically want 64-bit enums, you can do this: enum some_enum { /* ... */ max = 0x7fffffffffffffff };. Having a member with that value guarantees that the type requires (at least) 64 bits (or that the compiler will reject it).
-
Matteo Italia almost 11 years
@KeithThompson: 0x7fffffffffffffffLL, I think. As for the multicharacter literals problem, I think that there's no solution.
-
Keith Thompson almost 11 years
@MatteoItalia: The LL suffix isn't necessary; a hex constant has a type big enough to hold its value. And I think you're right about there being no solution for the multicharacter literals, other than not using them (see my answer).
-
Mike Kwan almost 11 years
This is a nice solution. I will try it out later. Do you know if this can be converted to a macro somehow?
-
Mike Kwan almost 11 years
Regarding whether this could be done faster with a hashtable, I genuinely believe not. I am using the hashtable from the Linux kernel right now. That takes a key and the size of the key, which means I would actually need to do the padding that I am doing currently anyway. The reason is that I simplified the rationale: the strings generated by libopcodes, which I am using to disassemble, are padded with space characters that need to be stripped. Using a hashtable would use exactly the same amount of computation but also add the cost of calculating the hash and adding to the hashtable.
-
Mike Kwan almost 11 years
This solution works perfectly. I'm going to accept this answer and do a short writeup demonstrating how to use it.
-
Mike Kwan almost 11 years
After some investigation, I agree the actual problem is that a multi-character literal can only be 4 bytes (at least under x86-32 and x86-64). I can't find a citation for this, however. The solution by Per Johansson works nicely instead.
-
Admin over 10 years
@MatteoItalia: your answer is wrong. §6.7.2.2 ¶2 talks only about enumeration constants (the a = 12345 bit), whereas §6.7.2.2 ¶4 says "Each enumerated type <...>. The choice of type is implementation-defined".
-
Matteo Italia over 10 years
@jons34yp: you're right, I'll correct the answer; still, since each value of the enum can't be larger than an int (because of ¶2), it wouldn't make sense to make it any bigger than int.
-
EthanL over 1 year
This is a good suggestion, although it might only work with the GCC compiler.
-
Dmytro Kryvyi about 1 year
Yes, the question was asked for GCC. Attributes are compiler-specific.