Linux not interpreting UTF8 encoded characters


Solution 1

The problem here doesn't seem to be in your browser or your Apache configuration. You need to double-check your system's locale settings.

You need to check whether the locale Apache runs under is UTF-8 enabled. To do so, you may run the command:

$ sudo su -l -c locale www-data

where www-data is the Apache user (on CentOS the user is typically apache). If the returned locale does not look like, for example, es_ES.UTF-8, your locale does not have UTF-8 enabled.
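As a rough sketch of that check, the shell function below tests whether a locale name carries a UTF-8 codeset. The name is_utf8_locale is made up for illustration, and the pattern assumes only the two common spellings (UTF-8 and utf8), optionally followed by a modifier such as @euro.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a locale name carries a UTF-8 codeset,
# e.g. es_ES.UTF-8 or es_ES.utf8 (both spellings appear on Linux systems).
is_utf8_locale() {
    # Lower-case the name so the comparison is case-insensitive.
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.utf-8|*.utf8|*.utf-8@*|*.utf8@*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: inspect the LANG value of the current shell.
if is_utf8_locale "${LANG:-}"; then
    echo "LANG=${LANG} is UTF-8 enabled"
else
    echo "LANG=${LANG:-<unset>} is not UTF-8 enabled"
fi
```

To check the Apache user instead of your own shell, you would pass the LANG value printed by the su command above to the function.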

If this is the case, you may change this configuration, on a CentOS machine, in /etc/sysconfig/i18n, changing the line LANG="es_ES" to LANG="es_ES.UTF-8". Still, for this to work, your system needs the locale file for this language. To check whether it exists, use locale -a to list the available locales.

If your system doesn't have a UTF-8 enabled locale, you may generate one using the command:

$ sudo localedef -i es_ES -f UTF-8 es_ES.utf8 

and set it as your default language.
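The "check, then generate if missing" sequence could be sketched as follows. The helper name has_utf8_variant is hypothetical; it reads a locale list on stdin so it can be fed from locale -a, and it accepts both the utf8 and UTF-8 spellings. The localedef call is left as a comment because it needs root.

```shell
#!/bin/sh
# Hypothetical helper: reads a locale list (one name per line, as printed by
# `locale -a`) on stdin and succeeds if a UTF-8 variant of "$1" is present.
# The list may show the name as es_ES.utf8, so both spellings are accepted.
has_utf8_variant() {
    grep -i -q -E "^$1\.utf-?8$"
}

# Example use on a live system (generation step taken from the answer above):
#   if ! locale -a | has_utf8_variant es_ES; then
#       sudo localedef -i es_ES -f UTF-8 es_ES.utf8
#   fi
```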

Hope this helps!

Solution 2

In addition to fboaventura's answer:

Check which locale Apache is running under:

$ sudo su -l -c locale www-data

To change the i18n configuration in /etc/sysconfig/i18n:

Go to the CentOS system configuration directory

$ cd /etc/sysconfig

Make a backup copy of your language settings file (writing in /etc/sysconfig requires root)

$ sudo cp i18n i18n.backup

Edit the language settings file using nano

$ sudo nano i18n

Edit the file to include your configuration

For example:

LANG="en_US.utf8"
SYSFONT="latarcyrheb-sun16"
SUPPORTED="en_US.utf8:en_US:en:fr_FR.utf8:fr_FR:fr:es_ES.utf8:es_ES:es:de_DE.utf8:de_DE:de:sv_SE.utf8:sv_SE:sv:zh_CN.utf8:zh_CN:zh:zh_TW.utf8:zh_TW:zh:ja_JP.utf8:ja_JP:ja:ko_KR.utf8:ko_KR:ko"

Save the file and restart the system.
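The backup-and-edit steps above can also be scripted. The following is only a sketch: set_lang is a made-up helper name, and the demonstration runs against a scratch file rather than the real /etc/sysconfig/i18n, which you would need root to modify.

```shell
#!/bin/sh
# Hypothetical helper: back up FILE to FILE.backup, then set its LANG= line
# to the given locale (appending the line if the file has none).
set_lang() {
    file=$1
    lang=$2
    cp "$file" "$file.backup" || return 1
    if grep -q '^LANG=' "$file.backup"; then
        # Rewrite from the backup so the original is never half-edited.
        sed "s|^LANG=.*|LANG=\"$lang\"|" "$file.backup" > "$file"
    else
        printf 'LANG="%s"\n' "$lang" >> "$file"
    fi
}

# Demonstration on a scratch copy, not on the real /etc/sysconfig/i18n.
demo=$(mktemp)
printf 'LANG="es_ES"\nSYSFONT="latarcyrheb-sun16"\n' > "$demo"
set_lang "$demo" "es_ES.UTF-8"
cat "$demo"
```

Keeping the untouched copy as i18n.backup mirrors the manual cp step, so you can restore it if the new locale causes problems.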

Author by

w0rldart

Updated on September 18, 2022

Comments

  • w0rldart
    w0rldart over 1 year

    I have a file named Adán-y-Eva-50x50.jpg. When I try to access it, Apache translates the name to Ad\xc3\xa1n-y-Eva-50x50.jpg and can't find the file, even though it exists.

    This happens only for filenames that contain UTF8 characters.

    I already have the following configuration in my /etc/httpd/conf/httpd.conf:

    ...
    AddDefaultCharset UTF-8
    ...
    IndexOptions FancyIndexing VersionSort NameWidth=* HTMLTable +Charset=UTF-8
    ...
    

    I also added this as the first line of my root .htaccess:

    IndexOptions +Charset=UTF-8
    

    None of this has any effect; those files still fail to load. Any suggestions?

    UPDATE

    Just to mention it: I'm running the websites on a CentOS server with the Plesk panel preconfigured.

    • Zubair
      Zubair over 11 years
      The options you list are for hinting about the character type in the content of the pages and not for the URL which is the problem you are describing. Any problems in encoding the URL may be due to the browser and not apache. It worked fine for me with apache 2.2.3 on centos5 with LANG=en_US.UTF8
    • Andrew B
      Andrew B over 11 years
      I'm inclined to agree with mtinberg, but just in case, can you elaborate on what you mean by "apache translates it to"? Is this in the URL bar of the browser, or a log file? Have you tried grabbing the index page from the console (assuming you have a UTF-8 enabled LANG variable and terminal) with wget or curl to verify that that is indeed what the webserver itself is sending?
    • Rosty Koryaha
      Rosty Koryaha over 11 years
      What browser are you using, and what language / character set is the browser configured to use? See: Bug: Apache 2.0 Breaks Non-UTF-8 Encoded URLs on Windows
    • w0rldart
      w0rldart over 11 years
      It happens on Chrome, Firefox and Safari... it's not a browser issue, as the older server had no problem with the files mentioned.
    • Michal S
      Michal S over 7 years
      For those unable to solve a similar problem with the answers below: check the normalization form (NFC or NFD) of your UTF-8 file names. For example, when you transfer files with UTF-8 names from a Mac to Linux, the names may not be in the form the new environment expects. This can be changed with convmv and its --nfc flag.
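To illustrate the normalization point, the sketch below prints the raw bytes of "Adán" in both forms. The helper name hex_bytes is made up for this example, and octal escapes are used because not every printf understands \x. The c3 a1 pair in the NFC output is the same sequence Apache reports as \xc3\xa1 in the question.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as a contiguous lowercase hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# NFC: "á" is the single code point U+00E1, encoded in UTF-8 as c3 a1.
nfc=$(printf 'Ad\303\241n' | hex_bytes)

# NFD: "a" followed by the combining acute accent U+0301 (cc 81).
nfd=$(printf 'Ada\314\201n' | hex_bytes)

echo "NFC: $nfc"
echo "NFD: $nfd"
```

Both strings look identical on screen, but a request for one form will not match a file stored under the other, because the file system compares bytes.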
  • w0rldart
    w0rldart over 11 years
    I tried su -l -c locale apache and su -l -c locale my-user, and neither returns the desired output. I also ran system-config-language and set it to Spanish UTF-8, rebooted, and got the same result... I also set i18n to es_ES.UTF-8.
  • fboaventura
    fboaventura over 11 years
    in order to generate the locale file, and this will be done globally, you have to run locale-gen es_ES.utf8 as root.
  • w0rldart
    w0rldart over 11 years
    I'm running CentOS and don't have locale-gen; I searched, and everything pointed to system-config-language.
  • fboaventura
    fboaventura over 11 years
    I've changed the command to generate the locale file. After generating the file, you may change the i18n file inside sysconfig, restart Apache (or reboot your system) and test it out.
  • w0rldart
    w0rldart over 11 years
    I keep having the same issue... and su -l -c locale site-user still doesn't output anything
  • fboaventura
    fboaventura over 11 years
    Sorry mate! This is as far as I can go without seeing your system. I set up a CentOS machine yesterday just to test and validate the commands above.