UTF-8 characters are not displayed correctly in Debian

43,280

Solution 1

You've told bash and other applications that your terminal uses the UTF-8 encoding. That's good only if your terminal actually does use UTF-8. Bash doesn't get to decide what the terminal encoding is; the terminal does.

If you want to use UTF-8, configure your terminal to use UTF-8. Since you're using SSH, you need to configure whatever terminal you're running the SSH client in to use UTF-8. That's the default on most modern systems, but apparently yours isn't set up this way.
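You can test the terminal independently of the locale by sending it known UTF-8 bytes. Here is a minimal sketch, assuming a POSIX sh. The octal escapes are the UTF-8 encodings of ä (U+00E4), € (U+20AC) and ß (U+00DF), and `utf8_sample` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: write the raw UTF-8 bytes for "ä € ß".
# Octal escapes make the output independent of the current locale
# and of the encoding this file was saved in.
#   \303\244     = ä (U+00E4)
#   \342\202\254 = € (U+20AC)
#   \303\237     = ß (U+00DF)
utf8_sample() {
    printf '\303\244 \342\202\254 \303\237\n'
}

utf8_sample
```

A terminal configured for UTF-8 displays this as ä € ß. If you see other glyphs, the terminal (or the SSH client it runs in) is not decoding UTF-8, whatever `locale` reports.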

You should avoid setting LC_CTYPE explicitly in a terminal; ideally, the terminal sets it. However, this doesn't always work, especially over SSH (on many systems, the SSH server forbids the client from setting LC_CTYPE).

If you need to set the environment variable, the right place would be .profile, not .bashrc.
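Below is a minimal `~/.profile` sketch. It assumes en_US.UTF-8 is the locale you generated, so substitute your own. It sets LC_CTYPE only when the session has no locale settings at all, which means a value supplied by the terminal or forwarded over SSH still takes precedence. Note that bash reads `~/.profile` only for login shells, and only when neither `~/.bash_profile` nor `~/.bash_login` exists.

```shell
# ~/.profile (sketch)
# Fall back to a UTF-8 character type only if nothing else set a locale.
# en_US.UTF-8 is an assumption: use a UTF-8 locale you actually generated.
if [ -z "${LC_ALL:-}${LC_CTYPE:-}${LANG:-}" ]; then
    LC_CTYPE=en_US.UTF-8
    export LC_CTYPE
fi
```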

Solution 2

It sounds as if you are using the Linux console (rather than one of the X-based terminal emulators), and that it is not running in UTF-8 mode. I would use this script to turn UTF-8 mode on (and then investigate why it was off):

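#!/bin/sh
# utf8: switch the Linux console's UTF-8 mode on or off.
# ESC % G selects UTF-8; ESC % @ selects the default
# (ISO 646 / ISO 8859-1) mode. In a printf format, %% is a literal %.
if test ".$1" = ".off" ; then
        printf '\033%%@'
else
        printf '\033%%G'
fi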

That is, save the script as utf8 (executable and on your PATH), and type

utf8 on
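To see exactly which bytes the script sends, without letting the terminal interpret them, you can pipe the same sequences through od. This is a minimal sketch. `utf8_seq` is a hypothetical function name that wraps the same logic as the script above:

```shell
#!/bin/sh
# Hypothetical wrapper with the same logic as the utf8 script.
utf8_seq() {
    if test ".$1" = ".off" ; then
        printf '\033%%@'
    else
        printf '\033%%G'
    fi
}

# Dump the bytes as hex instead of sending them to the terminal.
# ESC = 1b, '%' = 25, 'G' = 47 (UTF-8 on), '@' = 40 (UTF-8 off).
on_bytes=$(utf8_seq on | od -An -tx1 | tr -d ' \n')
off_bytes=$(utf8_seq off | od -An -tx1 | tr -d ' \n')
echo "on:  $on_bytes"
echo "off: $off_bytes"
```

Only a terminal that understands these sequences acts on them. Anything else may simply display the trailing character.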

To investigate the error messages, I made a script like this, in two flavors (one in UTF-8, and the other in ISO-8859-1):

#!/bin/bash
printf "ä\n"
echo "ä"
ä

The UTF-8 script says

$ ./foo
ä
ä
./foo: line 4: ä: command not found

and the ISO-8859-1 script says (in a terminal using a locale with UTF-8 encoding):

$ ./foo2
�
�
./foo2: line 5: $'\344': command not found

The point is that bash adapts its error message to the locale. The single ISO-8859-1 byte for ä (octal 344) is not a valid character in a UTF-8 locale, so bash shows it as an octal escape instead.
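What matters, then, is which bytes actually arrive when you type ä. UTF-8 sends c3 a4 (octal 303 244), and ISO-8859-1 sends e4 (octal 344). Here is a minimal sketch that classifies one line of input by its leading bytes. `classify_a_umlaut` is a hypothetical name, and the function only recognizes ä in these two encodings. An ISO-8859-1 copy of a test file can be made with, for example, `iconv -f UTF-8 -t ISO-8859-1 foo > foo2`.

```shell
#!/bin/sh
# Hypothetical helper: report how "ä" at the start of stdin is encoded.
classify_a_umlaut() {
    # Hex dump of all input bytes, with spaces and newlines removed.
    bytes=$(od -An -tx1 | tr -d ' \n')
    case "$bytes" in
        c3a4*) echo "UTF-8" ;;       # UTF-8: \303\244
        e4*)   echo "ISO-8859-1" ;;  # Latin-1: \344
        *)     echo "other: $bytes" ;;
    esac
}

# The same letter in both encodings, written as octal escapes.
printf '\303\244\n' | classify_a_umlaut
printf '\344\n' | classify_a_umlaut
```

To check your own terminal, paste the function into a shell, run `classify_a_umlaut`, type ä, press Enter and then Ctrl-D. A result of ISO-8859-1 matches the `$'\344'` error in the question.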


Author: Steffen

Updated on September 18, 2022

Comments

  • Steffen
    Steffen over 1 year

    Short description of my problem:
    I recently ran into an issue where I cannot get bash/nano/irssi/etc. to display "special" UTF-8 characters such as the German umlauts (äüö), the euro sign (€), and other characters like ß and §.

    What I already tried:

    • ran dpkg-reconfigure locales and generated only en_US.UTF-8
    • set LC_ALL, LANG and LANGUAGE to en_US.UTF-8 in .bashrc for both my user and root
    • reinstalled locales and libx11-data (which seems to contain all the language data)

    Of course I logged in again via SSH after each of these changes, and I even tried restarting the server, although I know that doesn't solve the problem on Linux in 99.9875% of all cases.

    Information on my system:
    OS: Debian stretch -> Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64 GNU/Linux
    locales: v.2.22-7

    Output of locale:

    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=en_US.UTF-8
    

    When I type, for example, ä into the console and press Enter, I get -bash: $'\344': command not found.
    Honestly, I am out of ideas. Can anyone help me out with this?

    • Marius
      Marius almost 8 years
      Stretch is Debian testing, which has bash 4.3-14+b1, and strace shows that it does not open any interesting files.
    • Steffen
      Steffen almost 8 years
      So this is possibly a bug in bash itself, then? I have to admit, shamefully, that I didn't think of checking it with strace. EDIT: I tested it on another machine running Stretch, which seems to have the very same problem (bash 4.3-14+b1).
    • Marius
      Marius almost 8 years
      It behaves as you show in an older version of bash (I have Debian 7 running), and was probably introduced as a feature enhancement rather than a bug fix. I used strace to check whether bash reads any relevant locale files, but found no sign of that.
    • Steffen
      Steffen almost 8 years
      I just realized that it can't be a bug in bash itself, since it behaves the very same way in every other application I tested (nano, irssi, dpkg-reconfigure [the UTF-8 blocks show up as garbage characters there]), so it must be some system-wide "thing" (bug/setting/whatever).
    • Marius
      Marius almost 8 years
      Well... the $'\344' hints that it may not be UTF-8. In Debian 7, the message shows $'\303\244'. If I change the input character to Latin-1 ä, I get the same message that you are seeing. Perhaps whatever "console" you are using is set to non-UTF-8 mode, but the locale still uses UTF-8.
    • thenakulchawla
      thenakulchawla about 7 years
      I am struggling with almost the same issue, and none of the answers below seem to be working for me. What solution did you use?
    • Ken Sharp
      Ken Sharp over 6 years
      Did you ever solve this?
  • Steffen
    Steffen almost 8 years
    Hello, first I'd like to thank you very much for your investigations. I followed your steps exactly, and all I get is $'\344': command not found. I even created the script on another machine and transferred it afterwards to make sure its encoding is set properly. Of course I ran utf8 on first, but it does not seem to do anything other than print a capital G. I've tried both SecureCRT and PuTTY as clients and made sure both use UTF-8 as encoding and "Xterm" as emulation. I also checked that the font contains those characters.
  • Steffen
    Steffen almost 8 years
    I've tried SecureCRT and PuTTY as SSH clients and ensured that both use UTF-8 as encoding and Xterm as emulation; the font has the necessary characters as well. Actually (as I mentioned in a comment above), I can reproduce the very same behaviour on a machine running Debian Stretch, but not on machines running Debian Wheezy or Debian Jessie, with the exact same session options. So it seems to me that something on the system side changed with the upgrade to Stretch, or am I interpreting that wrong?
  • Marius
    Marius almost 8 years
    Yes... it's not due to a difference in bash, but rather in how you are entering the characters.
  • Steffen
    Steffen almost 8 years
    I've entered the characters the exact same way.