How to remove non-ascii chars using sed
Solution 1
This doesn't seem to work with sed. Perhaps tr will do?
tr -d '\200-\377'
Or with the complement:
tr -cd '\000-\177'
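As a minimal sketch of the two commands above in use: forcing the C locale makes tr treat its input as raw bytes, which is the usual workaround when a locale-aware tr rejects the data with "Illegal byte sequence". The function names are hypothetical, and the sed variant assumes GNU sed, which understands `\xHH` escapes; other sed implementations may not.

```shell
# Keep only bytes 0x00-0x7F. LC_ALL=C makes tr operate on raw bytes
# rather than on multibyte characters of the current locale.
strip_non_ascii_tr() {
  LC_ALL=C tr -cd '\000-\177'
}

# Assumption: GNU sed. In the C locale every byte is a valid character,
# so the range 0x80-0xFF matches exactly the non-ASCII bytes.
strip_non_ascii_sed() {
  LC_ALL=C sed 's/[\x80-\xFF]//g'
}

# Sample input: "café naïve" encoded as UTF-8 (octal escapes keep this portable).
printf 'caf\303\251 na\303\257ve\n' | strip_non_ascii_tr
printf 'caf\303\251 na\303\257ve\n' | strip_non_ascii_sed
```

Both functions read standard input, so they can also be fed a file with `strip_non_ascii_tr < inputfile`.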
Solution 2
Have you tried the following?
cat /bin/mkdir | tr -cd "[:print:]"
I think it solves the problem.
If only the text content interests you, you can also use strings:
cat /bin/mkdir | strings
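One thing to be aware of with the `[:print:]` approach: newlines and tabs are not printable characters, so they are deleted too and the output collapses into a single line. The sketch below (the function name is hypothetical) also keeps the `[:space:]` class and forces the C locale so that only ASCII counts as printable.

```shell
# Keep printable ASCII plus whitespace (space, tab, newline, and so on).
# LC_ALL=C restricts the character classes to their ASCII members.
keep_printable() {
  LC_ALL=C tr -cd '[:print:][:space:]'
}

# With [:print:] alone the line break is lost:
printf 'one\ntwo\n' | LC_ALL=C tr -cd '[:print:]'
echo

# With [:space:] added the line structure survives, while the control
# byte \001 and the UTF-8 bytes \303\251 are still removed:
printf 'one\001\ntw\303\251o\n' | keep_printable
```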
Solution 3
Do you know what encoding the file is currently using? If so, you can use iconv to convert it. It's a utility to convert from one character encoding to another. So if the original file is in UTF-8 and you want to convert to ASCII you can use the following:
iconv -f utf8 -t ascii <inputfile>
The file command on the input file might tell you the current encoding.
Interestingly, there's a command called enca which will do its best to determine the character encoding being used if you know the language of the contents of the file.
This other question might be the answer.
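A plain conversion to ASCII stops with an error at the first character that has no ASCII equivalent. As a hedged sketch, the `-c` option asks iconv to drop such characters instead. The function names are hypothetical, and `guess_encoding` assumes a `file` implementation that supports `--mime-encoding`.

```shell
# Print a best-effort guess of a file's encoding, e.g. "utf-8" or "us-ascii".
# Assumption: the installed file(1) supports -b and --mime-encoding.
guess_encoding() {
  file -b --mime-encoding "$1"
}

# Convert UTF-8 on standard input to ASCII. -c discards characters that
# cannot be represented in the target encoding instead of aborting.
to_ascii() {
  iconv -c -f UTF-8 -t ASCII
}
```

For example, `to_ascii < inputfile > outputfile` writes the stripped text to a new file. Some iconv builds may still return a non-zero exit status when characters were dropped, so check the output rather than relying on the status alone.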
Solution 4
The solutions offered here did not work for me. Maybe my problem was different, but I needed to strip the ANSI color codes and other escape sequences from otherwise pure ASCII text.
The following worked for me, however:
Stripping Escape Codes from ASCII Text
sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g'
In context (BASH):
$ printf "\e[32;1mhello\e[0m\n"
hello
$ printf "\e[32;1mhello\e[0m\n" | cat -vet
^[[32;1mhello^[[0m$
$ printf "\e[32;1mhello\e[0m\n" | sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g' | cat -vet
hello$
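The pattern above matches color sequences with at most two numeric parameters. As a sketch under that observation, the variant below accepts any number of semicolon-separated parameters, including none, so sequences such as 256-color codes are removed as well. The function name is hypothetical. It builds the ESC byte with printf instead of relying on `\x1b`, on the assumption that not every sed understands that escape.

```shell
# Remove ANSI SGR (color/style) sequences of the form ESC [ params m,
# where params is any run of digits and semicolons (possibly empty).
strip_sgr() {
  esc=$(printf '\033')   # literal ESC byte, built portably
  sed -E "s/${esc}\[[0-9;]*m//g"
}

# A 256-color sequence with four parameters, followed by a reset:
printf '\033[1;38;5;82mhello\033[0m\n' | strip_sgr | cat -v
```

Other escape sequences, such as cursor movement, end in letters other than `m` and are left untouched by this pattern.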
user87005
Updated on May 22, 2020
Comments
-
user87005 almost 4 years
I want to remove non-ASCII characters from a file. I have already tried several regexes:
sed -e 's/[\d00-\d128]//g' # not working
cat /bin/mkdir | sed -e 's/[\x00-\x7F]//g' >/tmp/aa
but the resulting file still contains some non-ASCII characters.
[root@asssdsada ~]$ hexdump /tmp/aa |more
          00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF
00000000  45 4C 46 B0 F0 73 38 C0 - C0 BC BC FF FF 61 61 61  ELF..s8......aaa
00000010  A0 A0 50 E5 74 64 50 57 - 50 57 50 57 D4 D4 51 E5  ..P.tdPWPWPW..Q.
00000020  74 64 6C 69 62 36 34 6C - 64 6C 69 6E 75 78 78 38  tdlib64ldlinuxx8
00000030  36 36 34 73 6F 32 47 4E - 55 42 C8 C0 80 70 69 42  664so2GNUB...piB
00000040  44 47 BA E3 92 43 45 D5 - EC 46 E4 DE D8 71 58 B9  DG...CE..F...qX.
00000050  8D F1 EA D3 EF 4B 86 FC - A9 DA 79 ED 63 B5 51 92  .....K....y.c.Q.
00000060  BA 6C FC D1 69 78 30 ED - 74 F1 73 95 CC 85 D2 46  .l..ix0.t.s....F
00000070  A5 B4 6C 67 DA 4A E9 9A - 4B 58 77 A4 37 80 C0 4F  ..lg.J..KXw.7..O
00000080  F3 E9 B2 77 65 97 74 F9 - A2 C0 F2 CC 4A 9C 58 A1  ...we.t.....J.X.
-
user87005 about 11 years
I am working on a Linux system that has a very limited set of commands, and iconv is not available.
-
chooban about 11 years
Looking at the output from hexdump, is this a binary file? (Guessing from ELF at the start.) If so, what's the purpose of removing non-ASCII characters? The binary will be corrupted.
-
user87005 about 11 years
It is just an example, friend.
-
chooban about 11 years
Ah, cool. I've added a link to a related question which might solve your problems.
-
user87005 about 11 years
The same works fine with this Perl command, but I need sed:
cat /bin/mkdir | perl -ne 's/[^[:ascii:]]//g;print $_;'
-
user87005 about 11 years
enca is also not available on my machine:
% enca
enca: Command not found.
-
chooban about 11 years
Rather than cat'ing the file, have you tried just passing it to sed as an argument?
-
Thor about 7 years
@EladTabak: It should work. Can you produce an example where it does not work?
-
codeforester over 5 years
On macOS High Sierra, I get this error: tr: Illegal byte sequence.
-
Thor over 5 years
@codeforester: I tested this with tr from GNU coreutils.
-
ericcurtin over 4 years
tr -cd '\001-\177' removes the NUL character too, which is worth doing: many tools, such as grep, treat input as binary if it contains NULs and report "Binary file (standard input) matches".
-
Uncle Iroh almost 3 years
For me at least, running this on my file removed all lowercase ASCII characters.