How to interpret an octal or hex dump of a binary file?

32,264

Solution 1

There are lots of ways of storing numbers: ASCII text (which can have locale-specific variants, such as using ',' either as the decimal separator or for thousands grouping), binary integers (of varying bit widths), floats and doubles (all of which may vary with the endianness of the architecture and with whether the software producing the file formalises the representation), BCD (uncompressed, packed, fixed-point and other variants), bi-quinary coded decimal, and so on.

There is no standard.
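
To make that concrete, here is a minimal Python sketch that writes one value in a few of the representations named above. The value 1234, the field widths, and the helper packed_bcd are my own illustrative choices, not anything taken from the file in the question.

```python
import struct


def packed_bcd(n: int) -> bytes:
    """Encode a non-negative integer as packed BCD (two decimal digits per byte).

    Hypothetical helper for illustration; real BCD formats also differ in
    sign handling and padding.
    """
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


value = 1234  # arbitrary example value

encodings = {
    "ASCII text": str(value).encode("ascii"),
    "16-bit int, little-endian": struct.pack("<H", value),
    "16-bit int, big-endian": struct.pack(">H", value),
    "32-bit float, little-endian": struct.pack("<f", float(value)),
    "packed BCD": packed_bcd(value),
}

for name, raw in encodings.items():
    # Show each encoding as space-separated hex bytes.
    print(f"{name:28} {' '.join(f'{b:02x}' for b in raw)}")
```

Every row holds the same number, yet no two byte sequences match, which is why the bytes alone cannot tell you which encoding was used.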

Solution 2

One of the first things I had to memorise for computer science was Data + Interpretation = Useful Information. A corollary of this is that if you're missing Data or Interpretation, you have nothing. The data itself can't tell you how to interpret it. (You can have metadata which tells you this, but then you need to know how to interpret the metadata too.)

Under the circumstances, I suggest trying this:

file filename

If it comes up with something like:

filename: data

and you have absolutely no idea what the format is, what program it's from, what its use is, or anything about the contents of filename, then you should probably give up.

Octal Dump Output

od (octal dump), run with -c as in the question, produces a hybrid text-and-octal dump. Each byte is shown as a printable character such as o, s, f, etc., as a C-style escape for certain non-printable characters such as \0 (ASCII 0, NUL) or \a (ASCII 7, BEL), or otherwise as a three-digit number in base 8 (e.g. 032 = 26 in decimal). Your file is interpreted as a stream of 8-bit bytes.
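
As a rough illustration of that per-byte rule, the following Python sketch renders bytes the way od -c does for plain ASCII input. It is an approximation under my own assumptions, not od's implementation: the function names are made up, and bytes above 0x7e are always shown in octal here, whereas the dump in the question shows some of them as accented characters.

```python
# C-style escapes that od -c uses for a few control characters.
ESCAPES = {
    0: r"\0", 7: r"\a", 8: r"\b", 9: r"\t",
    10: r"\n", 11: r"\v", 12: r"\f", 13: r"\r",
}


def od_c_cell(byte: int) -> str:
    """Render one byte: escape, printable ASCII, or three-digit octal."""
    if byte in ESCAPES:
        return ESCAPES[byte]
    if 0x20 <= byte <= 0x7E:
        return chr(byte)
    return f"{byte:03o}"


def od_c_line(offset: int, chunk: bytes) -> str:
    """Render one line: octal offset, then each byte in a 4-wide cell."""
    cells = "".join(f"{od_c_cell(b):>4}" for b in chunk)
    return f"{offset:07o}{cells}"


# First 16 bytes of the file, copied from the hexdump -C output in the question.
sample = bytes.fromhex("1e001a04534400000000736571312020")
print(od_c_line(0, sample))
```

The printed line should begin the same way as the first line of the od -c output in the question, with the two trailing space bytes appearing as blank cells.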

Hex Dump Output

hexdump produces a traditional hex dump, with one column listing 8-bit bytes in hexadecimal, the other showing what ASCII characters these bytes correspond to, if any (if the byte value is a non-printable ASCII character, or not an ASCII character at all, . is shown at that position). Again, your file is interpreted as a stream of 8-bit bytes.
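
The two-column layout can be sketched in a few lines of Python. This mimics the hexdump -C format seen in the question for a full 16-byte line; hexdump_c_line is a hypothetical name of mine, and the sketch makes no attempt to reproduce hexdump's other options or its handling of repeated lines.

```python
def hexdump_c_line(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes as one line in the style of `hexdump -C`."""
    hex_cells = [f"{b:02x}" for b in chunk]
    left = " ".join(hex_cells[:8])    # first group of eight bytes
    right = " ".join(hex_cells[8:])   # second group of eight bytes
    # Printable ASCII is shown as-is; everything else becomes '.'.
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
    return f"{offset:08x}  {left:<23}  {right:<23}  |{text}|"


# First 16 bytes of the file, copied from the question.
sample = bytes.fromhex("1e001a04534400000000736571312020")
print(hexdump_c_line(0, sample))
```

Note that nothing here interprets the bytes as numbers wider than 8 bits; the dump only re-displays them.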

Integers

If your file comprises 100% binary integers (i.e. is a headerless, uniform, one-dimensional array of some sort of integer representation), then you have to answer all of these questions for yourself:

  • Are they ‘proper’ binary, or binary-coded decimal (BCD)? (probably binary)
  • How wide are they in bits?
  • If their width isn't a multiple of 8, are they bit-packed like SMS messages or Base64, or byte-aligned?
  • If their width is 8 bits or more, what is the byte order? Is it Big Endian, Little Endian, or one of the other, rarer sorts?
  • Are the integers signed, or unsigned?
  • If they're signed, are they represented in two's complement (more likely), or one's complement, or something rare and weird?

There are probably more I'm forgetting right now.

And this is just for a single dimensional uniform array of integers, coming from a common, modern architecture of computer. If your data has any sort of complexity, things are going to get so hairy it'll quickly become easier to win the lottery than to just guess the format. And you have to guess (an educated guess, but a guess), unless you know the format.
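
To show how much the answers to those questions matter, here is a small Python sketch that reads the same four bytes under several different assumptions using the standard struct module. The bytes e9 03 00 00 are copied from offset 0x44 of the hex dump in the question; treating them as a single field, and every width, byte order and signedness tried below, is a guess on my part rather than knowledge of the format.

```python
import struct

# Four bytes from offset 0x44 of the question's hex dump.
# Assumption: they belong together as one field; the real layout is unknown.
raw = bytes.fromhex("e9030000")

interpretations = {
    "uint32, little-endian": struct.unpack("<I", raw),
    "uint32, big-endian": struct.unpack(">I", raw),
    "int32, big-endian (two's complement)": struct.unpack(">i", raw),
    "two uint16, little-endian": struct.unpack("<2H", raw),
    "four int8": struct.unpack("<4b", raw),
}

for name, values in interpretations.items():
    print(f"{name:38} {values}")
```

The little-endian unsigned reading gives 1001, which looks plausible for a record from an x86 machine, but plausibility is all it is: the big-endian and signed readings are equally valid decodings of the same bytes.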


Admin
Author by

Admin

Updated on September 18, 2022

Comments

  • Admin
    Admin almost 2 years

    The binary file has strings and some numbers. If I do od -c filename or strings filename, I can see the strings properly. But what about the numbers? They are in some weird format.

    The text after doing od -c filename is like this:

    0000000 036  \0 032 004   S   D  \0  \0  \0  \0   s   e   q   1
    0000020          \0  \0  \0  \0  \0  \0  \0  \0  \t  \0   ó 002   3 001
    0000040   &  \0 032  \f   O   2 006  \0  \0  \0   o   s   f   u   s   1
    0000060           ó 002   3 001   ÿ  \r  \0  \0  \t  \0  \0   @   3   ×
    0000100 233   º 004  \0   é 003  \0  \0   &  \0 032  \f   O   2   7  \0
    0000120  \0  \0   o   s   f   e   u   1           ó 002   3 001   é 235
    0000140  \0  \0 035 003  \0   @   3   × 233   º 004  \0   Ñ  \a  \0  \0
    0000160   ä  \0 032  \f   O   r   E  \0  \0  \0   o   s   f   a   p   1
    

    How to decipher this?

    I even tried hexdump -C filename

    The output is like this:

    00000000  1e 00 1a 04 53 44 00 00  00 00 73 65 71 31 20 20  |....SD....seq1  |
    00000010  20 20 00 00 00 00 00 00  00 00 09 00 f3 02 33 01  |  ..........ó.3.|
    00000020  26 00 1a 0c 4f 32 06 00  00 00 6f 73 66 75 73 31  |&...O2....osfus1|
    00000030  20 20 f3 02 33 01 ff 0d  00 00 09 00 00 40 33 d7  |  ó.3.ÿ......@3×|
    00000040  9b ba 04 00 e9 03 00 00  26 00 1a 0c 4f 32 37 00  |.º..é...&...O27.|
    00000050  00 00 6f 73 66 65 75 31  20 20 f3 02 33 01 e9 9d  |..osfeu1  ó.3.é.|
    00000060  00 00 1d 03 00 40 33 d7  9b ba 04 00 d1 07 00 00  |.....@3×.º..Ñ...|
    00000070  e4 00 1a 0c 4f 72 45 00  00 00 6f 73 66 61 70 31  |ä...OrE...osfap1|
    

    To clarify, the main file, which is a regular file, had one attribute that was displaying in some weird format, so we are looking at the raw/binary file.

    Doing octal dump on the regular file, resolved the viewing problem.

    With grep 'id=123' regular_file | head -1 | od -c, I was able to see what number was in there. I was expecting 1, it showed to us as 001.