Trying to remove non-printable characters (junk values) from a UNIX file
Perhaps you could go with the complement of [:print:], which contains all printable characters:
tr -cd '[:print:]' < file > newfile
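One thing to watch with the command above: the newline character is not part of [:print:], so the output ends up as one long line. A minimal sketch of a variant that keeps line breaks follows. The file names sample.txt and cleaned.txt are made up for illustration, and LC_ALL=C is set on the assumption that byte-wise ASCII handling is what you want.

```shell
# Build a small sample: two lines containing a NUL (^@) and a BEL (^G).
printf 'ab\000c\nde\007f\n' > sample.txt

# -c complements the set, -d deletes: everything that is neither
# printable nor a newline is removed, so the line structure survives.
LC_ALL=C tr -cd '[:print:]\n' < sample.txt > cleaned.txt

cat cleaned.txt
```

In the C locale this still drops every byte above 0x7F, so it is only suitable for plain ASCII data.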
If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
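If a UTF-8 locale is not available, a different approach (not the sed one above) is to delete only the ASCII control bytes and leave every other byte alone. The sketch below assumes the file is UTF-8 or another ASCII-compatible encoding. The file names are hypothetical, and the octal ranges are my own choice: they keep tab and newline but also remove carriage returns.

```shell
# Sample line: "café" (é written as its two UTF-8 bytes in octal),
# followed by a NUL (^@) that should be removed.
printf 'caf\303\251\000 ok\n' > sample_utf8.txt

# Delete bytes 0x00-0x08, 0x0B-0x1F and 0x7F (DEL). Tab (\011) and
# newline (\012) are kept. Bytes >= 0x80 are never touched, so
# multi-byte accented characters pass through unchanged.
LC_ALL=C tr -d '\000-\010\013-\037\177' < sample_utf8.txt > cleaned_utf8.txt

cat cleaned_utf8.txt
```

Because this works on raw bytes, it will not catch non-printable characters encoded above 0x7F, such as the C1 control characters in UTF-8.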
Author: Pranav
Updated on July 28, 2022
Comments
-
Pranav almost 2 years
I am trying to remove non-printable characters (e.g. ^@) from the records in my file. Since the volume of records in the file is too big, using cat is not an option, as the loop takes too much time. I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
but the ^@ characters are still not removed. I also tried using
awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print } FILENAME > NEW FILE
but it did not help either.
Can anybody suggest an alternative way to remove non-printable characters?
I used tr -cd, but it removes accented characters, and those are required in the file.
-
linuxfan says Reinstate Monica over 4 years
This reply is very short and lacks a minimum of explanation, so it is a candidate for deletion. Please try to add some more explanation about the command you suggest.