Replace accented or special characters using sed or tr unix command using Unicode Code or Hex
Your script
…works fine for me. Every substitution is performed as expected, except for one:
-e "s/\'//g" \
should be
-e "s/'//g" \
(There's no need to escape the single quote, your expression is between double quotes.)
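To see the difference yourself, here is a minimal sketch; the sample input it's is only an illustration. As far as I know, GNU sed treats \' inside a regular expression as an anchor for the end of the pattern space, which would explain why the escaped version leaves the quote in place, but other sed implementations may behave differently.

```shell
# Inside double quotes the shell passes \' to sed unchanged,
# so sed receives the two characters backslash + quote.
escaped=$(printf '%s\n' "it's" | sed -e "s/\'//g")

# Without the backslash, sed receives a plain single quote and removes it.
plain=$(printf '%s\n' "it's" | sed -e "s/'//g")

printf 'escaped: %s\n' "$escaped"
printf 'plain:   %s\n' "$plain"
```

The second line of output should show the input with the quote removed.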
Applied to a file containing
"'$%&@^`|~¡¨´¢£§¬°·¹²³¿ªàáâãäåæ
it outputs:
S E a i c o.123 aaaaaaaae
(Without spaces. I added them to make it easier to compare the original pattern and its substitution.)
Hexa code
To replace a character using its hexadecimal code, use the following syntax:
echo ¢ | sed 's/\xC2\xA2/cent/g'
Why is that? A hexadecimal value XX is passed to sed with the \xXX syntax (see info sed). For your ¢ character, the third column of the table on the webpage you linked gives 0xc2 0xa2.
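If you need to find the bytes yourself, od can print them, and the result can be fed back to sed. This is a rough sketch, not a definitive recipe. It assumes GNU sed for the \x escapes, which are a GNU extension, so other seds may not accept them. The printf variant is my suggestion for a more portable route, since POSIX printf understands octal escapes. LC_ALL=C is also my addition, so that sed compares raw bytes regardless of the current locale.

```shell
# The cent sign written as its two UTF-8 bytes in octal (0xC2 0xA2), so this
# snippet does not depend on the encoding of the script file itself.
cent_sign=$(printf '\302\242')

# Dump the bytes in hex to see what has to go into the sed expression.
bytes=$(printf '%s' "$cent_sign" | od -An -tx1 | tr -d ' \n')
printf 'bytes: %s\n' "$bytes"

# GNU sed: \xHH escapes inside the pattern.
gnu=$(printf '5%s\n' "$cent_sign" | LC_ALL=C sed 's/\xC2\xA2/cent/g')

# Without \x support: let the shell place the raw bytes into the pattern.
portable=$(printf '5%s\n' "$cent_sign" | LC_ALL=C sed "s/$cent_sign/cent/g")

printf '%s\n%s\n' "$gnu" "$portable"
```

Both substitutions should turn the sample input into 5cent. The second form only relies on the shell and printf, so it may be worth trying where \xXX is not recognized.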
Encoding
As you are trying to replace UTF-8 encoded characters, I assume your file uses UTF-8 encoding. If it does not, a quick solution would be to convert it (or a copy of it) to UTF-8 (e.g. with your favorite text editor).
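If you would rather convert on the command line, iconv can do it. This is a small sketch assuming the input is ISO-8859-1 (Latin-1). That source encoding is only my guess, and the encoding names iconv accepts differ between systems, so check what your iconv supports.

```shell
# In ISO-8859-1 the cent sign is the single byte 0xA2 (octal 242).
latin1=$(printf '5\242' | od -An -tx1 | tr -d ' \n')

# After conversion it becomes the two UTF-8 bytes 0xC2 0xA2.
utf8=$(printf '5\242' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

printf 'before: %s\nafter:  %s\n' "$latin1" "$utf8"
```

For a whole file, the same call would look like iconv -f ISO-8859-1 -t UTF-8 input.txt > input.utf8.txt, where both file names are placeholders.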
user2727262
Updated on June 04, 2022
Comments
-
user2727262 almost 2 years
I wonder if I can use character set found in http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=0x&unicodeinhtml=hex to replace accented or special characters using sed or tr.
I have a script that uses sed command. Sometimes it does not work :(
it goes like this:
sed -e "s/\"//g" \
    -e "s/\'//g" \
    -e "s/[$]/S/g" \
    -e "s/%//g" \
    -e "s/&/E/g" \
    -e "s/@/a/g" \
    -e "s/\^//g" \
    -e "s/\`//g" \
    -e "s/|//g" \
    -e "s/~//g" \
    -e "s/¡/i/g" \
    -e "s/¨//g" \
    -e "s/\´//g" \
    -e "s/¢/c/g" \
    -e "s/£//g" \
    -e "s/§//g" \
    -e "s/¬//g" \
    -e "s/°/o/g" \
    -e "s/·/./g" \
    -e "s/¹/1/g" \
    -e "s/²/2/g" \
    -e "s/³/3/g" \
    -e "s/¿//g" \
    -e "s/ª/a/g" \
    -e "s/à/a/g" \
    -e "s/á/a/g" \
    -e "s/â/a/g" \
    -e "s/ã/a/g" \
    -e "s/ä/a/g" \
    -e "s/å/a/g" \
    -e "s/æ/ae/g" \
So, I am thinking that if I used hex or octal Unicode codes in sed, it would work. But I do not know how...
e.g. echo ¢ | sed 's/\x{00A2}/cent/g'
I appreciate your help.
-
Qeole almost 10 years
"Sometimes it does not work" -> If it only fails in some cases, could you detail which ones make it fail?
-
Qeole almost 10 years
For the second question: try echo ¢ | sed 's/\xC2\xA2/cent/g', works for me.
-
user2727262 almost 10 years
Thanks @Qeole, but it did not work for me. I am using AIX, by the way. My sed command does not work if the file I am processing was not created as a UTF-8 without BOM file. At least that is what I have observed.
-
Qeole almost 10 years
That's something you should definitely have specified. Can't you just re-encode your file (or a copy of it) into UTF-8 first?
-
mcepl about 4 years
And if you don't know how to get those hexadecimal values for Unicode characters, then GNU echo can help: echo -ne '\u00A0'|xxd.