How can I set VIM's default encoding to UTF-8?

Solution 1

When Vim reads an existing file, it tries to detect the file encoding. When writing out the file, Vim uses the file encoding that it detected (except when you tell it differently). So a file detected as UTF-8 is written as UTF-8, a file detected as Latin-1 is written as Latin-1, and so on.

By default, the detection process is crude. Every file that you open with Vim will be assumed to be Latin-1, unless it detects a Unicode byte-order mark at the top. A UTF-8 file without a byte-order mark will be hard to edit because any multibyte characters will be shown in the buffer as character sequences instead of single characters.

Worse, Vim, by default, uses Latin-1 to represent the text in the buffer. So a UTF-8 file with a byte-order mark will be corrupted by down-conversion to Latin-1.

The solution is to configure Vim to use UTF-8 internally. This is, in fact, recommended in the Vim documentation, and the only reason it is not configured that way out of the box is to avoid creating enormous confusion among users who expect Vim to operate basically as a Latin-1 editor.

In your .vimrc, add set encoding=utf-8 and restart Vim.
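As a minimal sketch, the relevant part of the .vimrc could look like this. The file location and the placement near the top are assumptions; only the set line itself comes from the answer.

```vim
" ~/.vimrc (assumed location; yours may differ)
" Use UTF-8 for Vim's internal representation of buffer text.
" Put this early in the file, before any line containing non-ASCII text.
set encoding=utf-8
```

After restarting Vim, :set encoding? should report utf-8.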

Or instead, set the LANG environment variable to indicate that UTF-8 is your preferred character encoding. This will affect not just Vim but any software which relies on LANG to determine how it should represent text. For example, to indicate that text should appear in English (en), as spoken in the United States (US), encoded as UTF-8 (utf-8), set LANG=en_US.utf-8.

Now Vim will use UTF-8 to represent the text in the buffer. It will also make a more determined effort to detect UTF-8 in a file: besides looking for a byte-order mark, it checks for UTF-8 without a byte-order mark before falling back to Latin-1. So it will no longer corrupt a file encoded in UTF-8, and it should display UTF-8 characters properly during the editing session.
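The detection order described above corresponds to the fileencodings value that the Vim documentation gives as the default once encoding is a Unicode encoding. Spelling it out in .vimrc is optional. The sketch below is only meant to make the order visible:

```vim
" Encodings Vim tries, in order, when reading an existing file:
"   ucs-bom  - a byte-order mark, if present, decides the encoding
"   utf-8    - accepted only if the whole file is valid UTF-8
"   default  - the encoding taken from the environment
"   latin1   - final fallback; any byte sequence is valid Latin-1
set fileencodings=ucs-bom,utf-8,default,latin1
```

Because Latin-1 accepts any byte sequence, it only makes sense as the last entry. Anything listed after it would never be tried.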

For more information on how Vim detects the file encoding, see the fileencodings option in the Vim documentation.

For more information on setting the encoding that Vim uses internally, see the encoding option.

If you need to override the encoding used when writing a file back to disk, see the fileencoding option.
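To check what Vim is actually doing for a given file, such as a .po file that must be saved as UTF-8, the following Ex commands may help. This is a sketch of one possible workflow, typed at Vim's command line, and not the only way to do it:

```vim
" Show the internal encoding and the encoding detected for this buffer.
:set encoding? fileencoding?

" Mark this buffer to be written as UTF-8 (buffer-local), then save.
:setlocal fileencoding=utf-8
:write

" If detection guessed wrong, re-read the file with a forced encoding.
" This discards nothing only if the buffer has no unsaved changes.
:edit ++enc=utf-8
```

An empty fileencoding value means the buffer is written in the same encoding as encoding.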

Solution 2

According to the Vim documentation, Vim tries to detect the file encoding automatically, so if you're editing existing files you should be fine.

You can always force the encoding if you want with :set fileencodings=utf-8. You can find the documentation here.

Author by

Paolo

Updated on September 18, 2022

Comments

  • Paolo, over 1 year

    I'd like to contribute to an open source project providing translated strings. One of their requirements is that contributors must use UTF-8 as the encoding for the PO files.

    I'm using VIM 7.3 on Linux. How can I be sure that VIM's encoding is set to UTF-8, so that I can edit and save the .po file the right way?

  • MetaEd, over 12 years
    fileencodings=utf-8 will cause Vim to recognize the input file as UTF-8 but then perform a lossy conversion to Latin-1. Plus it will cause Vim to fail to recognize UTF-16. The better solution is to set encoding=utf-8 which turns Vim from a native one-byte editor into a native multibyte editor.
  • MetaEd, over 6 years
    @DaveKennedy Vim is able to treat the file as Latin-1 only when the file is unambiguously Latin-1. When the encoding is ambiguous, Vim has to choose. For example, a file containing only 7-bit ASCII codes is valid Latin-1, but it is also valid UTF-8, among others. Such a file will normally be treated as UTF-8. One way to avoid this outcome is to make the file encoding unambiguous. The trick I've seen is to add a string of 0xF7 codes. In UTF-8, 0xF7 is invalid. But in Latin-1, it represents the division sign (÷). Vim will normally conclude that the file is Latin-1.
  • Brōtsyorfuzthrāx, about 3 years
    It's been a lot of years; so, I'm curious--has the answer changed at all since?