Should I use encoding declaration in Python 3?

python python-3.x encoding utf-8

48,110

Solution 1

Because the default is UTF-8, you only need to use that declaration when you deviate from the default, or if you rely on other tools (like your IDE or text editor) to make use of that information.

In other words, as far as Python is concerned, only when you want to use an encoding that differs do you have to use that declaration.
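To see the default in action, here is a minimal sketch using the standard library's tokenize.detect_encoding(), which applies the PEP 263 rules (and the UTF-8 fallback) that Python uses for source files, together with compile() on raw bytes. The helper name detect() and the sample source are illustrative only.

```python
import io
import tokenize


def detect(source_bytes):
    """Return the encoding Python's tokenizer would use for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# UTF-8 source with no declaration: the default applies.
utf8_source = "s = 'caf\u00e9'\n".encode("utf-8")
print(detect(utf8_source))  # utf-8

# The same text saved as Latin-1: byte 0xE9 is not valid UTF-8 here,
# so compiling it under the UTF-8 default fails.
latin1_source = "s = 'caf\u00e9'\n".encode("latin-1")
try:
    compile(latin1_source, "<no-declaration>", "exec")
    latin1_without_cookie_ok = True
except SyntaxError:
    latin1_without_cookie_ok = False
print(latin1_without_cookie_ok)  # False

# Adding a declaration tells Python to decode the file as Latin-1 instead.
declared = b"# -*- coding: latin-1 -*-\n" + latin1_source
print(detect(declared))  # iso-8859-1 (the tokenizer normalises "latin-1")
namespace = {}
exec(compile(declared, "<declared>", "exec"), namespace)
print(namespace["s"])  # café
```

Only the file that deviates from UTF-8 needs the declaration; the UTF-8 file works without one.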

Other tools, such as your editor, can support similar syntax, which is why the PEP 263 specification allows for considerable flexibility in the syntax: it must be a comment on the first or second line of the file, the text coding must be there, followed immediately by either a : or = character and optional whitespace, followed by a recognised codec.
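The flexibility comes from the regular expression that PEP 263 publishes for recognising the declaration. The sketch below applies that expression to the first two lines; the function name declared_encoding() is hypothetical, and it simplifies CPython's real check (for example, it does not look at a byte order mark or validate the codec name).

```python
import re

# Regular expression published in PEP 263 for detecting an encoding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the codec name declared in the first two lines, or None (simplified)."""
    for index, line in enumerate(source_lines[:2]):
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        # Line 2 is only consulted when line 1 is blank or a comment.
        stripped = line.strip()
        if index == 0 and stripped and not stripped.startswith("#"):
            return None
    return None


print(declared_encoding(["# -*- coding: utf-8 -*-"]))                   # utf-8
print(declared_encoding(["#!/usr/bin/env python3", "# coding=cp1252"]))  # cp1252
print(declared_encoding(["import os", "# coding: latin-1"]))             # None
print(declared_encoding(["# coding : utf-8"]))                           # None
```

The last case fails because the pattern requires the : or = to follow "coding" directly, with whitespace allowed only after it.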

Note that it only applies to how Python reads the source code. It doesn't affect what the code does when it runs, so it has no bearing on how printing, opening files, or any other I/O operations translate between bytes and Unicode. For more details on Python, Unicode, and encodings, I strongly urge you to read the Python Unicode HOWTO, or the very thorough Pragmatic Unicode talk by Ned Batchelder.
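As a minimal sketch of that separation: even in a file declared (or defaulting to) UTF-8, open() without an encoding= argument uses a platform- and locale-dependent default (UTF-8 mode changes it), so the encoding has to be chosen at the I/O boundary. The file name and the Latin-1 choice below are purely illustrative.

```python
import locale
import os
import tempfile

# open()'s default encoding comes from the platform/locale (or UTF-8 mode),
# not from this file's coding declaration; the printed value varies by system.
print(locale.getpreferredencoding(False))

text = "caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Choose the file encoding explicitly when writing and reading.
    with open(path, "w", encoding="latin-1") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, encoding="latin-1") as f:
        round_trip = f.read()

print(raw)         # b'caf\xe9'
print(round_trip)  # café
```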

Solution 2

No, if:

your entire project uses only UTF-8, which is the default,
and you're sure your IDE doesn't need that encoding declaration in each file.

Yes, if:

your project relies on a different encoding,
or relies on several encodings.
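To check whether the "No" case really applies, a small sketch like the following can flag .py files that are not plain UTF-8. It assumes your sources end in .py and treats a file as plain UTF-8 when its bytes decode as UTF-8 and it declares no other encoding; the helper name non_utf8_sources() and the sample files are hypothetical.

```python
import tempfile
import tokenize
from pathlib import Path


def non_utf8_sources(root):
    """Yield (path, problem) for .py files under root that are not plain UTF-8."""
    for path in sorted(Path(root).rglob("*.py")):
        data = path.read_bytes()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield path, f"not valid UTF-8 ({exc.reason} at byte {exc.start})"
            continue
        # The bytes are valid UTF-8, but the file may still declare another codec.
        with path.open("rb") as f:
            encoding, _ = tokenize.detect_encoding(f.readline)
        if encoding not in ("utf-8", "utf-8-sig"):
            yield path, f"declares {encoding}"


# Example: a throwaway project with one UTF-8 file and two non-UTF-8 ones.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "modern.py").write_text("name = 'caf\u00e9'\n", encoding="utf-8")
    (root / "legacy.py").write_bytes(b"# coding: latin-1\nname = 'caf\xe9'\n")
    (root / "windows.py").write_bytes(b"# coding: cp1252\nx = 1\n")
    report = {p.name: problem for p, problem in non_utf8_sources(root)}

for name, problem in report.items():
    print(f"{name}: {problem}")
```

An empty report suggests the project fits the "No" case; anything listed means you are in the multi-encoding situation described next.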

For multi-encoding projects:

If some files are encoded in something other than UTF-8, you should add the encoding declaration even to the files that are encoded in UTF-8, because, as the Zen of Python puts it, explicit is better than implicit.

Reference:

PyCharm doesn't need that declaration:

configuring encoding for specific file in pycharm

Vim doesn't need that declaration either, but it does recognise a modeline like this:

# vim: set fileencoding=<encoding name> :
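Python itself accepts this Vim form too, because the "coding=" inside "fileencoding=" matches the PEP 263 pattern. A quick check with the standard library's tokenize.detect_encoding(), using cp1252 as an arbitrary example codec:

```python
import io
import tokenize

# A source file whose first line is a Vim modeline rather than an Emacs one.
source = b"# vim: set fileencoding=cp1252 :\nprint('ok')\n"

encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # cp1252
```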


Author by

Mateusz Jagiełło

Updated on March 08, 2020

Comments

Mateusz Jagiełło about 4 years

Python 3 uses UTF-8 encoding for source-code files by default. Should I still use the encoding declaration at the beginning of every source file, such as # -*- coding: utf-8 -*-?
pepr over 11 years

The # -*- coding: utf-8 -*- may still be useful for some editors to switch to the expected encoding when editing the source file.
endolith almost 7 years

@pepr A Byte Order Mark could do the same, no?
Martijn Pieters almost 7 years

@endolith: the UTF-8 BOM is an abomination on this earth brought forth by Microsoft. See en.wikipedia.org/wiki/Byte_order_mark#UTF-8
endolith almost 7 years

@MartijnPieters Your link doesn't seem to agree with you
Martijn Pieters almost 7 years

@endolith: no, the WP article only summarises the background, it is my own opinion that it is an abomination. The point of a BOM is to record the byte order (hence the name, Byte Order Mark). There is no byte order confusion in UTF-8, it only has that function in UTF-16 and UTF-32. The value is already a re-purposed zero-width no-break space character (handy, as accidental printing then ends up with entirely invisible output), re-using that to be a magic constant is wrong, in my view.
pepr almost 7 years

@endolith: I agree that the UTF-8 BOM is a Microsoft wart. As the wiki page mentioned above also says, it has no meaning there: BOM stands for Byte Order Mark, and in UTF-8 there is no doubt about the byte order. The UTF-8 BOM also causes problems sometimes (try concatenating text files, for example). It can also be read as "this is NOT UTF-16". Anyway, it is completely unrelated to the # -*- coding.... The editor may read the coding declaration and completely ignore the BOM.
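Both points in this exchange can be checked with a short sketch (the file contents are made up for illustration). Python's tokenizer does recognise a leading UTF-8 BOM, reporting utf-8-sig, but a BOM only helps at the very start of a stream: naively joining two BOM-prefixed files leaves a stray U+FEFF in the middle.

```python
import codecs
import io
import tokenize

first = codecs.BOM_UTF8 + "a = 1\n".encode("utf-8")
second = codecs.BOM_UTF8 + "b = 2\n".encode("utf-8")

# A leading UTF-8 BOM is treated as an encoding signal by Python's tokenizer.
encoding, _ = tokenize.detect_encoding(io.BytesIO(first).readline)
print(encoding)  # utf-8-sig

# "utf-8-sig" strips only the first BOM; the second survives as U+FEFF.
combined = (first + second).decode("utf-8-sig")
print(repr(combined))  # 'a = 1\n\ufeffb = 2\n'
```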
mrgloom over 6 years

Can you elaborate on the non-standard cases in which you need to use # -*- coding: utf-8 -*- in Python 3?
Martijn Pieters over 6 years

@mrgloom for Python, there are no non-standard cases. But if your editor is not using UTF-8 by default but does support modelines (such as Vim or Emacs or various other code editors), then you can write your comment such that both Python and your editor can read it, so that both use the same encoding when working with your source file.
Martijn Pieters over 6 years

@mrgloom the specific example you used, with the -*- markers, is an emacs modeline. Emacs, reading that line, will set the file encoding to UTF-8, which is very helpful when editing. Vim uses a different syntax, but Python uses pattern matching to support either format as well as others.