What is the default character encoding?

Solution 1

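  • File names on the filesystem are encoded as UTF-8 by convention; the kernel itself stores them as raw bytes.
  • Bash thinks in bytes, not in strings with encoding knowledge, so it has no default encoding. gnome-terminal follows your locale, which is UTF-8 on a default install.
  • Python 2's default encoding is ASCII; Python 3's is UTF-8.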
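
The defaults in the list above can be checked from Python itself. This is a minimal sketch assuming Python 3; `report_encodings` is a hypothetical helper name, and the values it prints depend on your locale settings, so treat the output as a description of your machine rather than of Ubuntu in general.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python uses in the current environment."""
    return {
        # Encoding used for implicit str <-> bytes conversions.
        "str default": sys.getdefaultencoding(),
        # Encoding used to translate file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding the locale asks for (used when opening text files).
        "locale preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

Running `locale` in a terminal shows the settings that the last two values are derived from.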

Solution 2

The default character encoding is UTF-8 (Unicode), though almost all file names (quite possibly all on a default install) use only regular ASCII characters, which are common to most encodings.

I don't know what you mean by "how strings are represented by default for a bash or python script". You can use Unicode characters in bash scripts on Ubuntu, but a bash script usually calls other programs, and whether those programs handle them is another matter. It's certainly possible with Python too, though you'll want to familiarize yourself with the relevant packages and settings.
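
To make the file name and Python points concrete, here is a small sketch assuming Python 3. The helper names `describe` and `roundtrip` and the sample name `café.txt` are made up for illustration. It shows that a non-ASCII character takes more than one byte in UTF-8, and that Python can carry a file name's raw bytes through a `str` and back unchanged.

```python
import os


def describe(text):
    """Return (code point count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


def roundtrip(raw_name):
    """Decode a raw file name the way Python does, then encode it back."""
    return os.fsencode(os.fsdecode(raw_name))


if __name__ == "__main__":
    # "caf\u00e9" is "café" written with an explicit escape for the accented letter.
    print(describe("caf\u00e9"))
    # The same name as the UTF-8 bytes a Linux filesystem would store.
    print(roundtrip(b"caf\xc3\xa9.txt"))
```

The ASCII letters take one byte each while the accented letter takes two, which is why an all-ASCII file name looks the same in most encodings.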

Author: gabkdlly

Updated on September 17, 2022

Comments

  • gabkdlly
    gabkdlly over 1 year

    I don't myself know how deep this question actually goes (for example, for all I know there could be several, depending on my task).

    Particularly, I am interested in what kinds of strings are used to name files and folders on the system.

    I am also interested in how strings are represented by default for a bash or python script.

    • Admin
      Admin over 13 years
      That's a good question especially if you converted from Windows and contribute source code to some version control system. After switching to Ubuntu you may suddenly experience unreadable special characters, because Windows typically doesn't use UTF-8.
  • Broam
    Broam over 13 years
    Python 3 (I think?) is changing to unicode strings by default.
  • Dennis Kaarsemaker
    Dennis Kaarsemaker over 13 years
    Python 3's str() type is a unicode object in UCS-2 or UCS-4 encoding internally. How data is read or written from e.g. files and stdin is to be determined by the application/library developer, with utf-8 being standard (e.g. print(some_str) will print a utf-8 representation).
  • Ralf
    Ralf over 13 years
    Python 3 will go Unicode, like Ruby 1.9. Python 2 and earlier, like Ruby 1.8 and earlier, are ASCII-based and work with all charsets, but their idea of the character count for Unicode strings is wrong (which usually isn't a problem).
  • frabjous
    frabjous over 13 years
    gnome-terminal doesn't default to utf-8; it just uses whatever your locale is set to. (As I discovered the hard way recently.)
  • Robert Siemer
    Robert Siemer over 9 years
    @DennisKaarsemaker No, Python 3 neither leaves the encoding solely to the developer nor always defaults to UTF-8. Stdin and stdout, for example, use the encoding of the environment by default!
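
The disagreement in the last comments comes down to which encoding a text stream is given. The sketch below, assuming Python 3, uses a hypothetical helper `encode_like_stream` to show the bytes that one and the same string produces under different stream encodings; `sys.stdout.encoding` reports what your own environment selected.

```python
import io
import sys


def encode_like_stream(text, encoding):
    """Return the bytes a text stream with this encoding would emit."""
    buffer = io.BytesIO()
    # errors="replace" substitutes "?" for characters the encoding lacks.
    wrapper = io.TextIOWrapper(buffer, encoding=encoding, errors="replace")
    wrapper.write(text)
    wrapper.flush()
    return buffer.getvalue()


if __name__ == "__main__":
    # Encoding picked for standard output in this environment.
    print("stdout encoding:", sys.stdout.encoding)
    for enc in ("utf-8", "latin-1", "ascii"):
        print(enc, encode_like_stream("\u00e9", enc))
```

Setting the `PYTHONIOENCODING` environment variable before starting Python overrides the encoding chosen for the standard streams.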