Vim shows strange characters <91>,<92>
Solution 1
The content on your source web page was overzealously reformatted. The text was undoubtedly supposed to use (straight) single quotes (ASCII 39/0x27, U+0027) instead of curly single quotes (U+2018 and U+2019, which are 0x91 and 0x92 in CP1252, also known as MS-ANSI and WINDOWS-1252, a common 8-bit encoding on Windows).
Vim is showing you the hex codes because they are not valid in whatever encoding Vim is using (probably UTF-8). If you are editing text that has already been saved in a file, then you can reload the file as CP1252 with :e ++enc=cp1252; this should make the curly quotes visible. But there is no real reason to reload it as CP1252; just delete the 0x91 and 0x92 characters and replace them with single quotes.
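To make the encoding mismatch concrete, here is a minimal Python sketch (Python is used only for illustration, and the sample bytes are made up to resemble the awk snippet from the question). It shows that the bytes 0x91 and 0x92 decode to the curly quotes under CP1252 but are rejected as UTF-8:

```python
# Sample bytes resembling the pasted awk snippet (illustrative data only).
raw = b"awk \x91{print $2}\x92"

# Under CP1252, 0x91 and 0x92 are the curly single quotes U+2018 and U+2019.
as_cp1252 = raw.decode("cp1252")
curly_codepoints = [hex(ord(ch)) for ch in as_cp1252 if ord(ch) > 0x7F]

# The same bytes are not valid UTF-8: 0x91 is a continuation byte
# that appears here without a preceding lead byte.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(curly_codepoints)  # ['0x2018', '0x2019']
print(valid_utf8)        # False
```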
Solution 2
91 and 92 are the hex codes for the open and close curly apostrophe (single quote) in cp1252/Windows-1252 (where cp stands for code page), the MS Windows default encoding for Western European languages. It is often loosely treated as a version of latin1/ISO-8859-1, which it extends with printable characters in the 80-9F range.
These characters are most often inserted by people copying content from Word documents or Outlook emails, as part of the "Smart Quotes" feature. Other problem characters in this code page are hex 93/94 (open and close double quotes), the bullet point (•, hex 95), and the OE ligatures (Œ and œ, hex 8C and 9C). You can see a full list of the "problem characters", the ones that don't map directly into ISO-8859-1 or UTF-8 with the same code, highlighted in green on the Wikipedia page for cp1252.
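As a rough way to list those "problem characters" without the Wikipedia table, the following Python sketch decodes every byte from 0x80 to 0x9F as CP1252. It relies on Python's built-in cp1252 codec, which leaves a few positions in that range unassigned, so it is an approximation of the table rather than an authoritative copy:

```python
import unicodedata

# Decode every byte in 0x80-0x9F as CP1252 and record the resulting character.
# ISO-8859-1 maps this whole range to C1 control codes instead.
cp1252_specials = {}
for byte in range(0x80, 0xA0):
    try:
        cp1252_specials[byte] = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec treats a few positions here as undefined.
        pass

# Print the code point and its Unicode name (ASCII-only output).
for byte, char in cp1252_specials.items():
    print(f"0x{byte:02X} -> U+{ord(char):04X} {unicodedata.name(char)}")
```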
If all you want is to open the file in the correct encoding then use the ++enc=cp1252 option to the :e command:
:e ++enc=cp1252 filename.txt
If you already have the file loaded, you can reload without specifying the file name:
:e ++enc=cp1252
You can replace a particular bad hex code in Vim with the substitute command (:s), using one of the following character-code escapes inside a [] collection:
\d123 decimal number of character
\o40 octal number of character up to 0377
\x20 hexadecimal number of character up to 0xff
\u20AC hex. number of multibyte character up to 0xffff
\U1234 hex. number of multibyte character up to 0xffffffff
To change the hex 91/92 characters to straight single quotes, run:
:%s/[\x91\x92]/'/g
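For comparison, the same replacement can be sketched outside Vim. The Python helper below (`straighten_single_quotes` is a hypothetical name, not part of any tool mentioned here) works on raw bytes, and it assumes the data really is in a single-byte encoding such as CP1252. In a UTF-8 file the bytes 0x91 and 0x92 can occur inside multibyte sequences, so this would corrupt it:

```python
def straighten_single_quotes(data: bytes) -> bytes:
    """Replace CP1252 bytes 0x91 and 0x92 with an ASCII apostrophe (0x27)."""
    table = bytes.maketrans(b"\x91\x92", b"''")
    return data.translate(table)


# Illustrative sample resembling the pasted awk snippet.
sample = b"awk \x91{print $2}\x92"
print(straighten_single_quotes(sample))  # b"awk '{print $2}'"
```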
Solution 3
Use iconv to convert the text file from CP1252 to UTF-8 before opening.
iconv -f cp1252 -t utf8 inputfile.csv > outputfile.csv
On Mac OS use this:
iconv -f cp1252 -t UTF8-MAC inputfile.csv > outputfile.csv
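If iconv is not available, the plain CP1252-to-UTF-8 conversion can be approximated in Python. This is a minimal sketch: `cp1252_to_utf8` is a hypothetical helper, the demo file contents are invented, and it does not reproduce whatever the UTF8-MAC variant does differently:

```python
import tempfile
from pathlib import Path


def cp1252_to_utf8(src, dst):
    """Read src as CP1252 and write the same text to dst as UTF-8.

    Works on raw bytes so that line endings are left untouched.
    """
    text = Path(src).read_bytes().decode("cp1252")
    Path(dst).write_bytes(text.encode("utf-8"))


# Demo on a throwaway file containing a CP1252 curly apostrophe (0x92).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "inputfile.csv"
    dst = Path(tmp) / "outputfile.csv"
    src.write_bytes(b"it\x92s here\n")
    cp1252_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'it\xe2\x80\x99s here\n'
```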
Jeremy S.
Updated on September 17, 2022
Comments
-
Jeremy S. over 1 year
While using Vim over SSH I copied some content from a webpage to my SSH/Vim session and got the following result:
SIZE=`df -h|grep $DISC|awk <91>{print $2}<92>`
Apparently <91> and <92> stand for ' but how can I search and replace this stuff? And what does that 91/92 mean? How is this encoded, because 91/92 in ASCII mean [ and \?
-
lambacck over 13 years
You often get the curly quotes/apostrophe from content copied from MS Word, which auto-inserts the curly quotes/apostrophe as part of the "Smart Quotes" feature. If your font does not support those characters, you will just get an empty space instead of the character.
-
lambacck over 13 years
I can't downvote due to lack of points, but this substitution command is so wrong I don't know where to begin :(
-
Confusion about 13 years
This doesn't work for me: stackoverflow.com/questions/2798398/… gives a solution that does work.
-
Alex about 13 years
@lambacck: I was assuming that the file contains the literal strings "91" and "92", and in that case this command is correct. If these are hex characters, then you're right, you'd need your substitution command or something similar.
-
wfaulk over 11 years
+1 for :e ++enc=cp1252
-
Buttle Butkus about 11 years
It would be great to have a bash command to replace those characters in all files in the directory. I came up with this from a quick Google search:
sed -i "s/[\x91\x92]/\'/g" *.txt
but it didn't work.
-
Buttle Butkus about 11 years
I just found something that seemed to work for the command line. This does find/replace for all .txt files in the current folder. Research perl before using this, though, because I have no idea what the switches do.
perl -p -i -e "s/[\x91\x92]/'/g" *.txt
-
Karoly Horvath over 9 years
sed -i "s/\x92/'/g" worked for me.
-
Leo Simon almost 8 years
@ChrisJohnsen, is there any way to call vi with a flag that accomplishes the same thing as :e ++enc=cp1252? If I want to vi from the command line a file containing MS Word characters, it would be nice to be able to do it in one step, rather than opening vi and then loading the file with the :e command.
-
Chris Johnsen almost 8 years
@LeoSimon:
vim --cmd 'set fileencodings=cp1252' /path/to/file
The command runs before the normal .vimrc and sets the fileencodings option (note the ending s; you can also use the shorter name fencs) so that Vim will only try CP1252 when loading files. This should work for one-off editing of such files, but it may cause complications if you want to use that instance of Vim to edit files with other encodings.
-
Leo Simon almost 8 years
Thanks! To be explicit, I'm now using
vim --cmd "set fencs=cp1252" /path/to/file
-
Pryftan about 4 years
See also blog-en.openalfa.com/… on how to search for these sequences. Obviously you can do the same to replace them with something else. For example, if you want to replace <91> with ' you can do (I might have this wrong; see the page for more information if I messed it up):
:%s/\%x91/'/g
for global replacement. And that might be a so-called 'smart' quote that is included in the command, but anyway that's the idea behind it.
-
daviewales about 2 years
Ah! I was trying to match these characters in vim with \x92, but I see here that it's necessary to use [\x92].