What is the difference between C.UTF-8 and en_US.UTF-8 locales?

Solution 1

In general, C is for computers and en_US is for people in the US who speak English (and for anyone else who wants the same behaviour).

"For computers" means that the strings are more standardized (though still in English), so the output of one program can be read by another program. With en_US, strings and alphabetical ordering may be improved over time (for example, to follow new Chicago Manual of Style rules), which makes the locale more user-friendly but possibly less stable. Note: locales are not just for translating strings. They also cover collation (alphabetical order), number formatting (e.g. the thousands separator), currency (I think it is safe to predict that $ and 2 decimal digits will remain), names of months and days of the week, etc.
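As a rough illustration of the collation point, the sketch below sorts the same words under two locales using Python's standard locale module. The helper name sorted_in_locale and the sample words are made up for illustration. It assumes en_US.UTF-8 may or may not be installed, so it returns None when a locale is unavailable.

```python
import locale

def sorted_in_locale(words, locale_name):
    """Sort words using the collation rules of locale_name (hypothetical helper).

    Returns None if the locale is not installed on this machine.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares per the locale
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)

words = ["apple", "Banana", "cherry"]
print("C:          ", sorted_in_locale(words, "C"))
print("en_US.UTF-8:", sorted_in_locale(words, "en_US.UTF-8"))
```

In the C locale the comparison is by code point, so "Banana" sorts before "apple". With en_US.UTF-8 the order depends on the installed locale data, so it is worth running this on both servers rather than assuming a result.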

In your case, it is just the UTF-8 version of both locales.

In general it should not matter. I usually prefer en_US.UTF-8, and in your case (a server app) it should only change log and error messages (if you use locale.setlocale()). You should handle client locales inside your app. Programs that read the output of other programs should set the C locale before opening the pipe, so it should not really matter.

As you can see, it probably doesn't matter. You may also use the POSIX locale, which is also defined in Debian. You can get the list of installed locales with locale -a.

Note: micro-optimization favours the C/C.UTF-8 locale: no translation lookups (gettext), and simple rules for collation and number formatting. But this should only be visible on the server side.
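The advice about setting C before opening a pipe could look roughly like this in Python. This is a minimal sketch: run_in_c_locale is a hypothetical helper name, and the child command is just a demo that echoes the variable it received.

```python
import os
import subprocess
import sys

def run_in_c_locale(cmd):
    """Run cmd with LC_ALL=C so its output does not vary with the user's locale."""
    # LC_ALL overrides LANG and the other LC_* variables for the child process
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout

# Demo: a child Python process reports the LC_ALL value it was given.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_in_c_locale(child))
```

Only the child's environment is changed here; the parent process keeps whatever locale it already had.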

Solution 2

Here are some reasons why I added LC_TIME=C.UTF-8 in /etc/default/locale, in case it helps someone:

It provides a 24-hour clock instead of AM/PM in Firefox for HTML5 input type=time (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/time) and uses a datepicker in the format DD/MM/YYYY instead of MM/DD/YYYY for HTML5 input type=date (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date).

It allows using the international YYYY-MM-DD date format (ISO 8601) with a 24-hour clock when replying to emails in Thunderbird.

Previously, this was possible with LC_TIME=en_DK.UTF-8 (http://kb.mozillazine.org/Date_display_format), but a bug currently prevents it from working (https://bugzilla.mozilla.org/show_bug.cgi?id=1426907#c155).

Edit: Now even the LC_TIME=C.UTF-8 workaround does not work for Thunderbird: https://bugzilla.mozilla.org/show_bug.cgi?id=1426907#c197

Solution 3

There might be some impact as they differ in sorting orders, upper-lower case relationships, collation orders, thousands separators, default currency symbol and more.
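To check the number-formatting differences on a given machine, something like the following sketch may help. It uses only Python's standard locale module; describe_numeric is a hypothetical helper name, and it returns None when the requested locale is not installed.

```python
import locale

def describe_numeric(locale_name):
    """Report how locale_name formats numbers (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # remember the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None
    try:
        conv = locale.localeconv()
        return {
            "decimal_point": conv["decimal_point"],
            "thousands_sep": conv["thousands_sep"],
            "formatted": locale.format_string("%d", 1234567, grouping=True),
        }
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)

print("C:          ", describe_numeric("C"))
print("en_US.UTF-8:", describe_numeric("en_US.UTF-8"))
```

The C locale has no thousands separator, so the number comes back ungrouped. What en_US.UTF-8 reports depends on the locale data installed on the system.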

C.utf8 = POSIX standards-compliant default locale. Only strict ASCII characters are valid, extended to allow the basic use of UTF-8

en_US.utf8 = American English UTF-8 locale.

I'm not sure about the specific effects you might encounter, but I believe you can set the locale and encoding inside your application if needed.
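One way to make the encoding independent of the server's locale is to pass it explicitly for file I/O instead of relying on the default. A minimal sketch, where roundtrip_utf8 is a made-up name for illustration:

```python
import os
import tempfile

def roundtrip_utf8(text):
    """Write text to a temporary file and read it back, always as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # An explicit encoding avoids depending on the locale's default
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)

sample = "naïve café"
print(roundtrip_utf8(sample) == sample)
```

The same idea applies to the locale itself: calling locale.setlocale() with an explicit name at startup pins the behaviour regardless of what the server defaults to.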

Solution 4

I can confirm that the locale (C.UTF8 vs en_US.UTF8) does have an effect. I recently deployed a Python program to a new server, and it behaved differently. The old and new servers both run Ubuntu 18, and the only difference was the locale (C.UTF8 vs en_US.UTF8). After I set the locale on the new server to C.UTF8, they behaved the same.

It is easy to set the locale for a single application in a Linux environment. You just need to add export LANG=C.UTF8; before your application. Note that export changes LANG for the rest of that shell session, not only for the one command. Assuming you run your application as python myprogram.py, you type:

export LANG=C.UTF8; python myprogram.py
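When comparing two servers like this, it can help to print what the Python process actually sees. The sketch below is one possible diagnostic; locale_report and its dictionary keys are hypothetical names, and it uses only standard-library calls.

```python
import locale
import os
import sys

def locale_report():
    """Collect the locale-related settings this process actually sees."""
    try:
        locale.setlocale(locale.LC_ALL, "")  # adopt the environment (LANG, LC_*)
    except locale.Error:
        pass  # the environment names a locale that is not installed
    names = ("LANG", "LC_ALL", "LC_CTYPE", "LC_COLLATE", "LC_NUMERIC", "LC_TIME")
    return {
        "env": {name: os.environ.get(name) for name in names},
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),  # query only, no change
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }

for key, value in locale_report().items():
    print(key, "=", value)
```

Running it on both machines and diffing the output is a quick way to confirm whether the locale really is the only difference.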

Author by

Marcelo

Updated on July 09, 2022

Comments

  • Marcelo
    Marcelo almost 2 years

    I'm migrating a Python application from an Ubuntu server with locale en_US.UTF-8 to a new Debian server which comes with C.UTF-8 already set by default. I'm trying to understand whether there would be any impact, but I couldn't find good resources on the internet explaining the difference between the two.

  • bbarker
    bbarker over 3 years
    I want to upvote this, but I haven't yet simply because I don't know if it is true... However, I will note that it at least makes sense. It would be great if a reference could be included in this answer.
  • tripleee
    tripleee about 3 years
    What is "basic use of UTF-8"?
  • Marcelo
    Marcelo over 2 years
    Thanks Ben Lin, could you also share what were the differences you noticed?
  • Ben Lin
    Ben Lin over 2 years
    Hi Marcelo, sorry, I can't pinpoint the differences, because my product line is too long. It is related to ocr/python/numpy/opencv and some more.