"grep" offset of ascii string from binary file

30,400

Solution 1

You could use strings for this:

strings -a -t x filename | grep foobar

Tested with GNU binutils.

For example, to find where --help occurs in /bin/ls:

strings -a -t x /bin/ls | grep -- --help

Output:

14938 Try `%s --help' for more information.
162f0       --help     display this help and exit
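Keep in mind that strings reports the offset where the whole printable run begins, not where foobar itself starts. In the output above, 14938 is the (hex) offset of "Try", not of "--help". The sketch below adds each occurrence's position within the run to the run's start offset and prints decimal offsets in the foobar:offset form. It assumes GNU binutils strings prints each line as a right-aligned offset, a single space, then the run. `strings_offsets` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print "pattern:offset" (decimal, 0-based) for every
# occurrence of a fixed string, using GNU strings to find printable runs.
# Assumes each `strings -t d` line is "<right-aligned offset> <run>".
strings_offsets() {
  [ -n "$1" ] || return 1          # an empty pattern would match everywhere
  strings -a -t d -- "$2" |
    PAT="$1" awk '
      match($0, /^ *[0-9]+ /) {
        base = substr($0, 1, RLENGTH - 1) + 0   # file offset where this run starts
        rest = substr($0, RLENGTH + 1)          # the printable run itself
        # Walk every (possibly overlapping) occurrence of PAT inside the run.
        while ((i = index(rest, ENVIRON["PAT"])) > 0) {
          print ENVIRON["PAT"] ":" (base + i - 1)
          base += i                             # file offset of rest after dropping i chars
          rest = substr(rest, i + 1)
        }
      }'
}
```

Usage: strings_offsets foobar filename. Passing the pattern through the environment rather than awk -v avoids awk reinterpreting backslashes in it.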

Solution 2

grep --byte-offset --only-matching --text foobar filename

The --byte-offset option prints the offset of each matching line.

The --only-matching option prints only the matched text, one match per output line, so combined with --byte-offset you get the offset of each match instead of the offset of each matching line.

The --text option makes grep treat the binary file as text; without it, GNU grep only reports that the binary file matches.

You can shorten it to:

grep -oba foobar filename

It works with GNU grep, which comes with Linux by default. It won't work with BSD grep (which comes with Mac by default).
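For the interface the question asks for (tool foobar filename or tool foobar < filename, printing foobar:offset), a thin wrapper around GNU grep may be enough. This is a sketch assuming GNU grep; `find_offsets` is a hypothetical name. It adds -F so the pattern is matched as a fixed string rather than a regex, and runs grep under LC_ALL=C so matching is byte-oriented.

```shell
# Hypothetical helper: print "pattern:offset" (decimal, 0-based) for each match.
# Usage: find_offsets foobar filename   or   find_offsets foobar < filename
find_offsets() {
  # -o: one line per match, -b: byte offset of that match, -a: treat binary as
  # text, -F: fixed string. A file name of "-" makes grep read standard input.
  LC_ALL=C grep -obaF -- "$1" "${2:--}" |
    # grep prints "offset:match"; swap the two parts around the first colon.
    awk '{ i = index($0, ":"); print substr($0, i + 1) ":" substr($0, 1, i - 1) }'
}
```

Splitting on the first colon only keeps the output correct even if the pattern itself contains a colon.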

Solution 3

I wanted to do the same thing. Although strings | grep worked, gsar turned out to be exactly the tool I needed.

http://tjaberg.com/

The output looks like:

>gsar.exe -bic -sfoobar filename.bin
filename.bin: 0x34b5: AAA foobar BBB
filename.bin: 0x56a0: foobar DDD
filename.bin: 2 matches found
Author by

mgilson

I used to be a Fortran and sometimes C programmer, but these days I write mostly Python and JavaScript. I am interested in computational physics and like to write code. I also used to be an avid gnuplot user, and maybe someday I will be again... I am currently a software engineer at Argo AI, working to make the world's cars drive themselves. ~Matt

Updated on December 21, 2020

Comments

  • mgilson over 3 years

    I'm generating binary data files that are simply a series of records concatenated together. Each record consists of a (binary) header followed by binary data. Within the binary header is an ascii string 80 characters long. Somewhere along the way, my process of writing the files got a little messed up and I'm trying to debug this problem by inspecting how long each record actually is.

    This seems extremely related, but I don't understand Perl, so I haven't been able to get the accepted answer there to work. The other answer points to bgrep, which I've compiled, but it wants me to feed it a hex string. I'd rather just have a tool where I can give it the ASCII string and it will find it in the binary data and print the string and the byte offset where it was found.

    In other words, I'm looking for some tool which acts like this:

    tool foobar filename
    

    or

    tool foobar < filename
    

    and its output is something like this:

    foobar:10
    foobar:410
    foobar:810
    foobar:1210
    ...
    

    i.e. the string that matched and the byte offset in the file where the match started. In this example, I can infer that each record is 400 bytes long.

    Other constraints:

    • The ability to search by regex would be nice, but I don't need it for this problem.
    • My binary files are big (3.5 GB), so I'd like to avoid reading the whole file into memory if possible.
  • mgilson over 11 years
    I tried this; all it says is "Binary file filename matches". My system is Ubuntu Linux, and grep --version gives "GNU grep 2.5.2".
  • Hari Menon over 11 years
    Try adding the -a option to treat binary files as text.
  • mgilson over 11 years
    I ended up using strings -a -t d filename | grep foobar to write the output in decimal instead of hex. Otherwise, great answer that seems like it will work with different flavors of grep.
  • Ivan X over 8 years
    It could work with OS X grep if you prefix the grep command with LC_CTYPE=C; however, recent (and maybe not so recent) OS X ships grep 2.5.1, which has a bug that always outputs 0 as the byte offset.
  • Hitechcomputergeek almost 8 years
    I'd suggest using grep -F if you just need to find a known string, as it has a lot less overhead.
  • Luc over 5 years
    grep -oba (see Hari Menon's answer) is much faster, but using strings allows you to do partial matching. Which answer is better depends on your use case!
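The question's end goal, inferring the record length from the spacing between consecutive matches (400 bytes in its example), can be sketched by differencing the offsets that grep -oba reports. This assumes GNU grep and that the string occurs exactly once per record; `record_gaps` is a hypothetical name.

```shell
# Hypothetical helper: print the byte distance between consecutive matches,
# e.g. the record length when the string appears once per record.
record_gaps() {
  LC_ALL=C grep -obaF -- "$1" "${2:--}" |
    # Field 1 of grep's "offset:match" output is the decimal byte offset.
    awk -F: 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

Piping the result through sort -n | uniq -c (record_gaps foobar filename | sort -n | uniq -c) counts how often each gap occurs, so any gap other than the expected record length stands out.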