What is the default character encoding?
Solution 1
- File names on the filesystem are encoded as UTF-8.
- Bash works with bytes, not with encoding-aware strings, so it has no default encoding. gnome-terminal's default encoding is UTF-8.
- Python 2's default encoding is ASCII (Python 3's is UTF-8).
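To check these points on your own machine, here is a minimal sketch for Python 3. The helper names `encoding_report` and `roundtrip_name` are made up for this illustration; the calls they wrap come from the standard library. The values reported depend on your interpreter version and locale, so no output is shown.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encodings this interpreter uses by default."""
    return {
        # Encoding used by str.encode() / bytes.decode() with no argument.
        "str default": sys.getdefaultencoding(),
        # Encoding used to convert file names between str and bytes.
        "file names": sys.getfilesystemencoding(),
        # Encoding derived from the locale (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(False),
    }


def roundtrip_name(raw: bytes) -> bytes:
    """Decode a raw file name to str and encode it back to bytes."""
    return os.fsencode(os.fsdecode(raw))


if __name__ == "__main__":
    for label, value in encoding_report().items():
        print(f"{label}: {value}")
    # The first name is "café.txt" in UTF-8; the second contains a byte
    # that is not valid UTF-8 at all.
    for raw in (b"caf\xc3\xa9.txt", b"bad-\xff-name"):
        print(raw, roundtrip_name(raw) == raw)
```

The round trip is there to illustrate the "bytes, not strings" point: on Linux a file name is just a byte sequence, and `os.fsdecode` / `os.fsencode` are designed so that even names that are not valid in the expected encoding survive conversion to `str` and back.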
Solution 2
The default character encoding is UTF-8 (Unicode), though almost all file names (quite possibly all of them on a default install) consist of plain ASCII characters, which are encoded the same way in most encodings.
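The point about ASCII being common to most encodings can be checked with a small sketch. The function name `encodings_agree` and the sample file names are invented for this example, and the three encodings are just a representative selection.

```python
def encodings_agree(name, encodings=("ascii", "latin-1", "utf-8")):
    """Return True if `name` encodes to identical bytes in every encoding."""
    results = set()
    for enc in encodings:
        try:
            results.add(name.encode(enc))
        except UnicodeEncodeError:
            # The name cannot be written in this encoding at all.
            results.add(None)
    return len(results) == 1


if __name__ == "__main__":
    print(encodings_agree("README.txt"))      # plain ASCII name
    print(encodings_agree("caf\u00e9.txt"))   # "café.txt", contains a non-ASCII letter
```

A pure-ASCII name produces the same bytes everywhere, which is why such names never show encoding problems. A name with an accented letter produces different bytes in Latin-1 and UTF-8 and cannot be encoded in ASCII at all.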
I don't know what you mean by "how many strings are represented by a bash or python script". You can use Unicode characters in bash scripts on Ubuntu, but a bash script usually calls other programs, and whether those programs handle Unicode correctly is another matter. It's certainly possible with Python too, though you'll want to familiarize yourself with the relevant packages and settings.
gabkdlly
Updated on September 17, 2022
Comments
-
gabkdlly over 1 year
I don't myself know how deep this question actually goes (for example, for all I know there could be several, depending on my task).
Particularly, I am interested in what kinds of strings are used to name files and folders on the system.
I am also interested in how strings are represented by default for a bash or python script.
-
Admin over 13 years
That's a good question, especially if you came from Windows and contribute source code to some version control system. After switching to Ubuntu you may suddenly see unreadable special characters, because Windows typically doesn't use UTF-8.
-
Broam over 13 years
Python 3 (I think?) is changing to Unicode strings by default.
-
Dennis Kaarsemaker over 13 years
Python 3's str() type is a unicode object in UCS-2 or UCS-4 encoding internally. How data is read or written from e.g. files and stdin is to be determined by the application/library developer, with utf-8 being standard (e.g. print(some_str) will print a utf-8 representation).
-
Ralf over 13 years
Python 3 will go Unicode, like Ruby 1.9. Python 2 and earlier, like Ruby 1.8 and earlier, are ASCII-based and work with all charsets, but their idea of the character count for Unicode strings is wrong (which usually isn't a problem).
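To illustrate the character-count point, here is a small Python 3 sketch. The function name `lengths` and the sample text are made up for this example. A Python 2 byte string reports its length in bytes, which is the second number below; a Python 3 `str` reports code points, which is the first.

```python
def lengths(text):
    """Return (number of code points, number of UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "na\u00efve caf\u00e9"  # "naïve café": 10 characters, 2 of them non-ASCII
    chars, utf8_bytes = lengths(sample)
    print(chars, utf8_bytes)  # prints: 10 12
```

Each of the two accented letters takes two bytes in UTF-8, so the byte count is two higher than the character count. For pure ASCII text the two numbers are equal, which is why the difference usually goes unnoticed.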
-
frabjous over 13 years
gnome-terminal doesn't default to UTF-8; it just uses whatever your locale is set to. (As I discovered the hard way recently.)
-
Robert Siemer over 9 years
@DennisKaarsemaker No, Python 3 does not simply leave the encoding to the developer with a default of UTF-8. stdin and stdout, for example, use the encoding of the environment by default!
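One way to see the environment's influence is a rough sketch like the following, which starts a child Python interpreter with the `PYTHONIOENCODING` environment variable set and asks it what `sys.stdout.encoding` is. The helper names `child_stdout_encoding` and `same_codec` are invented for this example. The exact spelling of the reported name may vary between Python versions, so the sketch compares codecs by their canonical name rather than by string.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter and return the stdout encoding it reports."""
    # Copy the current environment and override only the I/O encoding.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def same_codec(a, b):
    """Return True if two encoding names refer to the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


if __name__ == "__main__":
    print("this process:", sys.stdout.encoding)
    for requested in ("latin-1", "utf-8"):
        reported = child_stdout_encoding(requested)
        print(requested, "->", reported, same_codec(requested, reported))
```

Without `PYTHONIOENCODING`, the value normally follows the locale settings (`LANG`, `LC_ALL` and friends), so the same script can write different bytes for the same string depending on where it is run.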