How to dump a binary file as a C/C++ string literal?


Solution 1

You can almost do what you want with hexdump, but I can't figure out how to get quotes & single backslashes into the format string. So I do a little post-processing with sed. As a bonus, I've also indented each line by 4 spaces. :)

hexdump -e '16/1 "_x%02X" "\n"' filename | sed 's/_/\\/g; s/.*/    "&"/'

Edit

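As Cengiz Can pointed out, the above command line doesn't cope well with a final line shorter than 16 bytes: hexdump pads the missing bytes with spaces, leaving invalid "\x" entries followed by blanks. So here's a new improved version, whose extra sed expression deletes those entries: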

hexdump -e '16/1 "_x%02X" "\n"' filename | sed 's/_/\\/g; s/\\x  //g; s/.*/    "&"/'

As Malvineous mentions in the comments, we also need to pass the -v (verbose) option to hexdump. Otherwise it replaces runs of identical output lines (for example, long stretches of zero bytes) with a single *.

hexdump -v -e '16/1 "_x%02X" "\n"' filename | sed 's/_/\\/g; s/\\x  //g; s/.*/    "&"/'
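If hexdump's format strings and the sed post-processing feel fragile, the same output can be produced by a small C program. This is a minimal sketch, not part of the original answer; the names dump_as_c_string, format_line, BYTES_PER_LINE and LINE_BUF_SIZE are hypothetical. It writes 16 bytes per line as uppercase \xHH escapes, indents each line by 4 spaces and wraps it in double quotes. A short last line is simply shorter, so there is no padding to strip afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BYTES_PER_LINE 16
/* 4-space indent + opening quote, 4 chars ("\xHH") per byte,
   closing quote + newline, terminating NUL */
#define LINE_BUF_SIZE (5 + 4 * BYTES_PER_LINE + 2 + 1)

/* Format up to BYTES_PER_LINE bytes as one indented, double-quoted
   C string literal line, e.g.     "\x48\x69"   followed by a newline. */
static void format_line(const unsigned char *src, size_t n, char *dst)
{
    assert(n <= BYTES_PER_LINE);
    dst += sprintf(dst, "    \"");
    for (size_t i = 0; i < n; i++)
        dst += sprintf(dst, "\\x%02X", (unsigned)src[i]);
    sprintf(dst, "\"\n");
}

/* Copy all of `in` to `out`, BYTES_PER_LINE bytes per line. fread()
   only returns a short count at end of file (or on error), so only the
   last line can be shorter than BYTES_PER_LINE bytes. */
static void dump_as_c_string(FILE *in, FILE *out)
{
    unsigned char buf[BYTES_PER_LINE];
    char line[LINE_BUF_SIZE];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        format_line(buf, n, line);
        fputs(line, out);
    }
}
```

Calling dump_as_c_string(stdin, stdout) from a main() gives a filter that can be used like the pipeline above (on POSIX systems, where stdin is not subject to text-mode translation). The lines can be pasted one after another into a single initializer: the compiler processes escape sequences before concatenating adjacent string literals, so each line's last \x escape ends at its closing quote.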

Solution 2

xxd has a mode for this. The -i/--include option will:

output in C include file style. A complete static array definition is written (named after the input file), unless xxd reads from stdin.

You can dump that into a file to be #included, and then just access foo like any other character array (or link it in). It also includes a declaration of the length of the array.

The output is wrapped at 12 bytes per line and looks essentially like what you might write by hand:

$ xxd --include foo
unsigned char foo[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64,
  0x21, 0x0a, 0x0a, 0x59, 0x6f, 0x75, 0x27, 0x72, 0x65, 0x20, 0x76, 0x65,
  0x72, 0x79, 0x20, 0x63, 0x75, 0x72, 0x69, 0x6f, 0x75, 0x73, 0x21, 0x20,
  0x57, 0x65, 0x6c, 0x6c, 0x20, 0x64, 0x6f, 0x6e, 0x65, 0x2e, 0x0a
};
unsigned int foo_len = 47;

xxd is, somewhat oddly, part of the vim distribution, so you likely have it already. If not, that's where you get it; you can also build the tool on its own from the vim source.
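A minimal sketch of consuming that output from C, assuming it was saved to a header named foo.h (a hypothetical name). So that the example compiles on its own, the generated array is copied inline where the #include would go; write_foo is likewise a hypothetical helper. The key point is that the array is not NUL-terminated, so the length must always come from foo_len.

```c
#include <assert.h>
#include <stdio.h>

/* Inline copy of the `xxd --include foo` output shown above, standing in
   for #include "foo.h" (foo.h is a hypothetical file name). */
unsigned char foo[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64,
  0x21, 0x0a, 0x0a, 0x59, 0x6f, 0x75, 0x27, 0x72, 0x65, 0x20, 0x76, 0x65,
  0x72, 0x79, 0x20, 0x63, 0x75, 0x72, 0x69, 0x6f, 0x75, 0x73, 0x21, 0x20,
  0x57, 0x65, 0x6c, 0x6c, 0x20, 0x64, 0x6f, 0x6e, 0x65, 0x2e, 0x0a
};
unsigned int foo_len = 47;

/* foo has no terminating NUL, so never pass it to strlen() or
   printf("%s"); write exactly foo_len bytes instead. */
static size_t write_foo(FILE *out)
{
    assert(foo_len <= sizeof foo);  /* catch a stale foo_len */
    return fwrite(foo, 1, foo_len, out);
}
```

Because xxd emits definitions rather than declarations, #including the generated file from more than one .c file causes duplicate-symbol link errors. In that case, compile it once as its own translation unit and declare extern unsigned char foo[]; and extern unsigned int foo_len; wherever you need them (the "link it in" option).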

Solution 3

xxd is good, but its output is highly verbose: each input byte becomes about six characters of C source (e.g. "0x48, "), so the generated file takes a lot of storage space.

You can achieve practically the same thing using objcopy; e.g.

objcopy --input binary \
    --output elf32-i386 \
    --binary-architecture i386 foo foo.o

Then link foo.o to your program and simply use the following symbols (as listed by nm foo.o):

00000550 D _binary_foo_end
00000550 A _binary_foo_size 
00000000 D _binary_foo_start

This is not a string literal, but it's essentially the same thing as what a string literal turns into during compilation (consider that string literals do not in fact exist at run-time; indeed, none of the other answers actually give you a string literal even at compile-time) and can be accessed in largely the same way:

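#include <stdio.h>
extern unsigned char _binary_foo_start[], _binary_foo_end[];
/* _binary_foo_size is an absolute symbol whose *address* is the size; end - start avoids that quirk */
size_t len = _binary_foo_end - _binary_foo_start;
for (size_t i = 0; i < len; i++)
   putchar(_binary_foo_start[i]);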

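The downside is that you need to specify your target architecture to make the object file compatible (the command above targets 32-bit x86; on 64-bit x86 Linux you would use elf64-x86-64 and i386:x86-64 instead), and this may not be trivial in your build system.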

Solution 4

Should be exactly what you asked for:

hexdump -v -e '"\\" "x" 1/1 "%02X"' file.bin ; echo
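As the question notes, the resulting string can contain embedded nulls, so its length has to be supplied separately. A minimal sketch, using a hypothetical 4-byte file containing 01 02 00 04 pasted in as Solution 4 would print it; data, data_len and write_data are illustrative names. Since every byte is escaped, the usual hex-escape pitfall (a \x escape swallowing any hex digits that follow it) does not arise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical example: Solution 4's output for a 4-byte file
   containing 01 02 00 04, pasted into a C string literal. */
static const char data[] = "\x01\x02\x00\x04";

/* sizeof counts the NUL the compiler appends, hence the -1.
   strlen(data) would stop at the embedded \x00 and return 2. */
static const size_t data_len = sizeof data - 1;

/* Write exactly data_len bytes, embedded NULs included. */
static size_t write_data(FILE *out)
{
    return fwrite(data, 1, data_len, out);
}
```

Computing the length with sizeof keeps it in sync automatically if the literal is regenerated, which avoids hard-coding a byte count by hand.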

Author: Malvineous

Updated on September 18, 2022

Comments

  • Malvineous (almost 2 years)

    I have a binary file I would like to include in my C source code (temporarily, for testing purposes) so I would like to obtain the file contents as a C string, something like this:

    \x01\x02\x03\x04
    

    Is this possible, perhaps by using the od or hexdump utilities? While not necessary, if the string can wrap to the next line every 16 input bytes, and include double-quotes at the start and end of each line, that would be even nicer!

    I am aware that the string will have embedded nulls (\x00) so I will need to specify the length of the string in the code, to prevent these bytes from terminating the string early.

  • PM 2Ring (over 9 years)
    Nice! I didn't even know I had xxd. Now I just have to remember it exists next time I need it... or I'll probably just replicate the required functionality in Python. :)
  • Daniel Ignacio Fernández (over 9 years)
    This produces redundant and invalid elements if input is shorter than 16 bytes.
  • PM 2Ring (over 9 years)
    @CengizCan: :oops:! Is that better?
  • Lightness Races in Orbit (over 9 years)
    @WanderNauta: You would access it in pretty much the same way as you'd access foo/foo_len here, and you wouldn't be vastly wasting storage space. I am convinced that the OP would be better off with objcopy and that it suits his or her requirements.
  • not2qubit (over 9 years)
    Your answer would be more useful if you also provided the input and output examples with it.
  • PM 2Ring (over 7 years)
    @Malvineous Good point! I've amended my answer. Thanks for the heads-up (and thanks for accepting my answer).
  • eigenfield (about 4 years)
    xxd is the best!