Vim shows strange characters <91>,<92>


Solution 1

The content on your source web page was overzealously reformatted. The text was undoubtedly supposed to use straight single quotes (ASCII 39/0x27, U+0027) instead of curly single quotes (U+2018 and U+2019). In CP1252 (also known as MS-ANSI and WINDOWS-1252, a common 8-bit encoding on Windows), the curly quotes are the bytes 0x91 and 0x92.

Vim is showing you the hex codes because those bytes are not valid in whatever encoding Vim is using (probably UTF-8). If you are editing text that has already been saved in a file, you can reload the file as CP1252 with :e ++enc=cp1252; this should make the curly quotes visible. There is no real need to reload it as CP1252, though: you can simply delete the 0x91 and 0x92 characters and replace them with straight single quotes.
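To see the encoding mismatch outside Vim, here is a minimal shell sketch. It assumes an iconv that accepts the names UTF-8 and CP1252 (GNU and macOS iconv do); the helper name sample and the test word are made up for illustration.

```shell
# Octal 221 and 222 are the bytes 0x91 and 0x92.
sample() { printf '\221quoted\222\n'; }

# Strict UTF-8 decoding rejects a lone 0x91 byte, which is why Vim
# falls back to displaying it as <91>.
if sample | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
    utf8_status="valid"
else
    utf8_status="invalid"
fi
echo "As UTF-8: $utf8_status"

# Decoded as CP1252 and re-encoded as UTF-8, the same bytes become
# U+2018 (e2 80 98) and U+2019 (e2 80 99).
decoded_hex=$(sample | iconv -f CP1252 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "As CP1252, re-encoded to UTF-8: $decoded_hex"
```

The hex dump is used instead of printing the characters so the result does not depend on the terminal's own encoding.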

Solution 2

91 and 92 are the hex codes for the opening and closing curly single quotes (apostrophes) in cp1252/Windows-1252 (where cp stands for code page), the MS Windows default variant of the latin1/ISO-8859-1 encoding.

These characters are most often inserted by people copying content from Word documents or Outlook emails, where they are produced by the "Smart Quotes" feature. Other problem characters in this code page include hex 93/94 (opening and closing double quotes), 95 (bullet point, •), and 8C/9C (the OE ligatures Œ and œ). You can see a full list of the "problem characters", the ones that don't map directly into ISO-8859-1 or UTF-8 with the same code, on the Wikipedia page for cp1252, where they are highlighted in green.
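As a rough illustration, the following shell sketch decodes a handful of these bytes with iconv and prints the character each one stands for. The helper name cp1252_to_utf8 is hypothetical, and the output is only readable in a UTF-8 terminal.

```shell
# Decode one CP1252 byte, given as a three-digit octal code, to UTF-8.
cp1252_to_utf8() { printf "\\$1" | iconv -f CP1252 -t UTF-8; }

# Octal 214 221 222 223 224 225 234 = hex 8C 91 92 93 94 95 9C.
for octal in 214 221 222 223 224 225 234; do
    # The leading 0 makes printf read the number as octal before
    # formatting it as hex.
    printf '0x%X -> %s\n' "0$octal" "$(cp1252_to_utf8 "$octal")"
done
```

Octal escapes are used because not every shell's printf understands \x escapes.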

If all you want is to open the file in the correct encoding then use the ++enc=cp1252 option to the :e command:

:e ++enc=cp1252 filename.txt

If you already have the file loaded, you can reload without specifying the file name:

:e ++enc=cp1252

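You can replace a particular bad hex code in Vim with the substitute command (:s) and one of the following character-code escapes. These forms are valid inside a [] collection; outside brackets, use the \%d, \%o, \%x, \%u and \%U forms instead: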

\d123   decimal number of character
\o40    octal number of character up to 0377
\x20    hexadecimal number of character up to 0xff
\u20AC  hex. number of multibyte character up to 0xffff
\U1234  hex. number of multibyte character up to 0xffffffff

To change the hex 91/92 characters to straight single quotes, run:

:%s/[\x91\x92]/'/g
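Outside Vim, the same byte-level replacement can be sketched with tr. This is a swapped-in shell technique, not part of the Vim answer, and the function name fix_quotes is made up for illustration.

```shell
# Replace the bytes 0x91 and 0x92 (octal 221 and 222) with an ASCII
# apostrophe. LC_ALL=C makes tr treat its input as raw bytes.
fix_quotes() { LC_ALL=C tr '\221\222' "''"; }

# The line from the question, with the two bad bytes in place.
printf 'awk \221{print $2}\222\n' | fix_quotes
```

This only makes sense on a file that is still in CP1252; in a file already converted to UTF-8 the curly quotes are three-byte sequences and tr would not match them.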

Solution 3

Use iconv to convert the text file from CP1252 to UTF-8 before opening.

iconv -f cp1252 -t utf8 inputfile.csv > outputfile.csv

On macOS, use this:

iconv -f cp1252 -t UTF8-MAC inputfile.csv > outputfile.csv
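To convert several files at once, a loop along these lines might work. It is a sketch under stated assumptions: convert_dir is a hypothetical helper, the .txt extension is only an example, and the "already valid UTF-8" check is a heuristic that could wrongly skip a CP1252 file whose bytes happen to form valid UTF-8.

```shell
# Re-encode every .txt file in the given directory from CP1252 to
# UTF-8, in place.
convert_dir() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # Skip files that already validate as UTF-8, so running the
        # function twice does not double-encode them.
        if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            continue
        fi
        # Write to a temporary file first; replace the original only
        # if the conversion succeeded.
        iconv -f CP1252 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Demo on a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
printf 'it\222s\n' > "$demo_dir/a.txt"
convert_dir "$demo_dir"
od -An -tx1 "$demo_dir/a.txt"
```

Back up the directory before trying this on real data, since the conversion overwrites the original files.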

Author: Jeremy S.

Updated on September 17, 2022

Comments

  • Jeremy S.
    Jeremy S. over 1 year

    While using Vim over SSH I copied some content from a webpage to my SSH/Vim session and got the following result:

    SIZE=`df -h|grep $DISC|awk <91>{print $2}<92>`
    

    Apparently <91> and <92> stand for ', but how can I search for and replace them? And what do 91 and 92 mean? How is this encoded, given that 91 and 92 in ASCII (decimal) mean [ and \?

  • lambacck
    lambacck over 13 years
    You often get the curly quotes/apostrophe from content copied from MS Word which auto inserts the curly quotes/apostrophe as part of the "Smart Quotes" feature. If your font does not support those characters, you will just get an empty space instead of the character.
  • lambacck
    lambacck over 13 years
    I can't downvote due to lack of points, but this substitution command is so wrong I don't know where to begin :(
  • Confusion
    Confusion about 13 years
    This doesn't work for me: stackoverflow.com/questions/2798398/… gives a solution that does work.
  • Alex
    Alex about 13 years
    @lambacck: I was assuming that the file contains the literal strings "91" and "92", and in that case this command is correct. If these are hex characters, then you're right, you'd need your substitution command or something similar.
  • wfaulk
    wfaulk over 11 years
    +1 for :e ++enc=cp1252
  • Buttle Butkus
    Buttle Butkus about 11 years
    It would be great to have a bash command to replace those characters in all files in the directory. I came up with this from a quick google search, sed -i "s/[\x91\x92]/\'/g" *.txt but it didn't work.
  • Buttle Butkus
    Buttle Butkus about 11 years
    I just found something that seemed to work for the command line. This does find/replace for all .txt files in the current folder. Research perl before using this, though, because I have no idea what the switches do. perl -p -i -e "s/[\x91\x92]/'/g" *.txt
  • Karoly Horvath
    Karoly Horvath over 9 years
    sed -i "s/\x92/'/g" worked for me.
  • Leo Simon
    Leo Simon almost 8 years
    @ChrisJohnsen, Is there any way to call vi with a flag that accomplishes the same thing as :e ++enc=cp1252? If I want to vi from the command line a file containing MS word characters, it would be nice to be able to do it in one step, rather than opening vi and then loading the file with the :e command
  • Chris Johnsen
    Chris Johnsen almost 8 years
    @LeoSimon: vim --cmd 'set fileencodings=cp1252' /path/to/file — The command runs before the normal .vimrc and sets the fileencodings option (note the ending s; you can also use the shorter name fencs) so that Vim will only try CP1252 when loading files. This should work for one-off editing of such files, but it may cause complications if you want to use that instance of Vim to edit files with other encodings.
  • Leo Simon
    Leo Simon almost 8 years
    Thanks!, to be explicit, I'm now using vim -c"set fencs" /path/to/file
  • Pryftan
    Pryftan about 4 years
    See also blog-en.openalfa.com/… on how to search for these sequences. Obviously you can do the same to replace them with something else. For example if you want to replace <91> with ' you can do (I might have this wrong - see the page for more information if I messed it up): :%s/\%x91/'/g - for global replacement. And that might be a so-called 'smart' quote that is included in the command but anyway that's the idea behind it.
  • daviewales
    daviewales about 2 years
    Ah! I was trying to match these characters in vim with \x92, but I see here that it's necessary to use [\x92].