Trying to remove non-printable characters (junk values) from a UNIX file

Perhaps you could delete the complement of [:print:], the class that contains all printable characters. Newline is not part of [:print:], so add \n to the set to keep the line breaks:

tr -cd '[:print:]\n' < file > newfile

If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):

sed 's/[^[:print:]]//g' file
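
If neither command fits (for example, no UTF-8 locale is available), a byte-oriented alternative is to delete only the ASCII control bytes and leave every byte of 0x80 and above alone, so UTF-8 accented characters pass through unchanged. This is a minimal sketch rather than the method from the answer above; the function name strip_controls is hypothetical, and the choice to keep tab and newline while dropping carriage returns is an assumption you may want to adjust.

```shell
# Hypothetical helper: read stdin, write stdout.
# Deletes NUL..BS (\000-\010), VT..US (\013-\037, which includes CR) and DEL (\177).
# Keeps tab (\011), newline (\012), and all bytes >= 0x80 (e.g. UTF-8 sequences).
strip_controls() {
    # LC_ALL=C makes tr treat the input as raw bytes, independent of the user's locale.
    LC_ALL=C tr -d '\000-\010\013-\037\177'
}

# Example: the NUL (shown as ^@ in many editors) and \001 are removed, "é" survives.
printf 'caf\303\251\000 ok\001\n' | strip_controls
```

Usage on a file would look like strip_controls < file > newfile. Because tr works on a stream, it does not need to loop over records.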
Author by

Pranav

Updated on July 28, 2022

Comments

  • Pranav
    Pranav almost 2 years

    I am trying to remove non-printable characters (e.g. ^@) from the records in my file. Since the volume of records in the file is very large, looping over the output of cat is not an option because the loop takes too much time. I tried using

    sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
    

    but still the ^@ characters are not removed. Also I tried using

    awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print }' FILENAME > NEWFILE
    

    but it also did not help.

    Can anybody suggest some alternative way to remove non-printable characters?

    I also tried tr -cd, but it removes accented characters, which are required in the file.

  • linuxfan says Reinstate Monica
    linuxfan says Reinstate Monica over 4 years
    This reply is very short and lacks a minimum of explanation, so it is a candidate for deletion. Please try to add some more explanation about the command you suggest.