How to extract text portion of a binary file in linux/bash?

Solution 1

Use the strings utility - that's exactly what it's designed for.
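For anyone unsure how to invoke it, here is a minimal sketch. The sample file is built on the fly purely for demonstration (the variable name `sample` and its contents are my own assumptions); `-n` and `-t` are the standard options for minimum run length and offset format.

```shell
# Build a throwaway sample: text runs separated by non-printable bytes.
# ("sample" and its contents are made up for this demonstration.)
sample="$(mktemp)"
printf 'hello world\000\001\002ab\003config=/etc/app.conf\377\n' > "$sample"

# Default: print every run of at least 4 printable characters.
# The 2-character run "ab" is too short and is dropped.
strings "$sample"

# Require longer runs (12 or more characters) to cut down on noise.
strings -n 12 "$sample"

# Prefix each string with its hexadecimal offset in the file.
strings -t x "$sample"
```

To save the result, redirect it as usual, for example `strings file1.bin > newfile.txt`.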

Solution 2

Here's what I used on a system that didn't have the "strings" utility installed:

cat yourfilename | tr -cd "[:print:]"

This prints the text and strips the unprintable characters in one pass, unlike "cat -v filename", which renders non-printing bytes in ^ and M- notation and so needs post-processing to remove the unwanted parts. Note that some of the binary data is itself printable, so you'll still get some gibberish between the good parts. strings filters out much of that gibberish, since by default it only prints runs of at least four printable characters, so prefer it if it's available.
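Two hedged variations on the same idea, sketched against a made-up sample file (the `sample` variable and its contents are assumptions for illustration). Setting `LC_ALL=C` makes tr and grep treat the input as raw bytes, which usually avoids errors such as "Illegal byte sequence" on systems where tr is locale-aware. The grep line is only a rough stand-in for strings, not a replacement for it.

```shell
# Throwaway sample: text runs separated by non-printable bytes.
sample="$(mktemp)"
printf 'hello world\000\001\002ab\003config=/etc/app.conf\377\n' > "$sample"

# Same tr approach, but in the C locale (byte-wise) and keeping newlines,
# so the output is not collapsed onto one line. No cat is needed.
LC_ALL=C tr -cd '[:print:]\n' < "$sample"

# Rough approximation of strings: print only runs of 4 or more printable
# characters, one per line. -a treats the binary file as text, -o prints
# only the matching parts.
LC_ALL=C grep -a -o '[[:print:]]\{4,\}' "$sample"
```

The tr version still lets short printable fragments (like "ab" here) run together with the real text, while the grep version drops them, which is closer to what strings does.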

Solution 3

If you're on a Debian-based distro, you can probably get radare2 (r2) with just sudo apt install radare2.

After you've installed r2, either with apt, some other installer on some other distro, or by following an online guide, you can use rabin2 to extract just the text part of a binary:

$ rabin2 -z your_binary

This is often "better" than plain strings because it lists only the strings found in the binary's .data section. Strings outside that section are often not very useful.
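Since rabin2 -z relies on the file having a data section, one way to combine the two tools is a small wrapper that only uses rabin2 for executables and falls back to strings otherwise. This is a sketch under stated assumptions: `extract_text` is a hypothetical helper name, and it only recognizes ELF files by their magic bytes, although rabin2 itself understands other executable formats too.

```shell
# extract_text is a hypothetical helper, not part of radare2 or binutils.
extract_text() {
    file="$1"
    # ELF executables start with the bytes 0x7f 'E' 'L' 'F';
    # od -c prints them as "177 E L F", which we squeeze together.
    magic="$(od -An -c -N 4 "$file" | tr -d ' \n')"
    if [ "$magic" = '177ELF' ] && command -v rabin2 >/dev/null 2>&1; then
        # Executable and rabin2 is installed: list data-section strings.
        rabin2 -z "$file"
    else
        # Any other file (image, archive, dump, ...): scan the whole file.
        strings "$file"
    fi
}

# Demonstration on a non-executable sample, so the strings branch is used.
sample="$(mktemp)"
printf 'hello world\000\001\002ab\003config=/etc/app.conf\377\n' > "$sample"
extract_text "$sample"
```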

Author: RonPringadi

Updated on June 29, 2022

Comments

  • RonPringadi
    RonPringadi almost 2 years

    I have a binary file. If I open it with vi, it shows sequences of human-readable text and binary characters. What is the best way to extract the human-readable portion only using bash?

    I was thinking, maybe we can do this over a grep or sed pattern?

    $ cat file1.bin | grep '????'  > newfile.txt
    
  • RonPringadi
    RonPringadi almost 8 years
    I tried that before and it didn't work. But then I realized I had missed the s: it's strings, not string. My bad :-) Thank you!
  • RonPringadi
    RonPringadi about 5 years
    strings ~/Pictures/Pic_A.jpg produces a result (or a better one), whereas $ cat ~/Pictures/Pic_A.jpg | tr -cd "[:print:]" fails with: tr: Illegal byte sequence
  • Cliff
    Cliff over 3 years
    This solution only works for executable files, as the tool is focused on reverse engineering. Not every binary file is an executable, so not every binary file has a .data section.
  • ChocolateOverflow
    ChocolateOverflow over 3 years
    Interesting strings like passwords and paths hard-coded into binaries, as far as I know, are usually in the .data section so using rabin2 -z goes straight to those without printing the gibberish we get when using strings. I do use both though.
  • Cliff
    Cliff over 3 years
    My comment was made to make it clear to readers the use case in which your tool works. Your use case is specific to executables, as you keep mentioning the .data section. The use case that landed me on this question was a nonexecutable binary file having no .data section for rabin2 to operate on. strings is useful on more than just executables. :-)
  • Zimba
    Zimba over 3 years
    so how to use strings?
  • DevSolar
    DevSolar almost 2 years
    @Zimba man strings