$_POST will convert from utf-8 to ä ö ü etc

Solution 1

You are facing several different problems at the same time; let's start with the simplest one.

Problem 1) You say that echo $_POST['field']; will display it correctly? What do you mean by "display"? It can be displayed correctly in two cases:

  • either the field is in UTF-8 and your page has been declared as UTF-8 and the browser is displaying it as UTF-8 or,
  • the field is in Latin-1 and the browser has decided (through the auto-detection heuristics) that your page is in Latin-1.

So, the fact that echo $_POST['field']; is correct tells you nothing.
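A more reliable check is to look at the raw bytes instead of the rendered text. The following is only a minimal sketch, assuming PHP 7 or later; describeBytes is a hypothetical helper name, and the literal byte strings stand in for $_POST['field'] so that the snippet runs on its own.

```php
<?php
// Hypothetical helper: show the raw bytes of a value and whether they are valid UTF-8.
function describeBytes(string $value): string
{
    // With the /u modifier, PCRE rejects subjects that are not valid UTF-8,
    // so preg_match returns false instead of 1 for invalid input.
    $isUtf8 = preg_match('//u', $value) === 1;
    return bin2hex($value) . ($isUtf8 ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

// "München" with ü encoded as UTF-8 (0xC3 0xBC).
echo describeBytes("M\xC3\xBCnchen"), "\n"; // 4dc3bc6e6368656e (valid UTF-8)

// "München" with ü encoded as ISO Latin-1 (0xFC).
echo describeBytes("M\xFCnchen"), "\n";     // 4dfc6e6368656e (not valid UTF-8)
```

Both values can look identical in a browser, depending on how the page is decoded, but the hex dumps differ.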

Problem 2) You are using

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
header('Content-Type:text/html; charset=UTF-8');

Is this PHP code? If it is, it is an error in this position, because header() must be called before any byte of output is sent. Placed after the <meta> tag, the call will not set the Content-Type header, and PHP should generate a "headers already sent" warning.
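A possible ordering is sketched below: the header is set first and the markup is emitted afterwards. This is an illustration only; the $contentType variable, the headers_sent() guard, the page title and the method="post" attribute are assumptions added here and are not part of the original snippet.

```php
<?php
// Set the response header before anything is written to the output.
$contentType = 'text/html; charset=UTF-8';
if (!headers_sent()) {
    header('Content-Type: ' . $contentType);
}

// Only now start emitting the page itself.
echo "<!DOCTYPE html>\n";
echo '<html><head><meta charset="UTF-8"><title>Form</title></head>', "\n";
echo '<body><form action="whatever.php" method="post" accept-charset="UTF-8">', "\n";
echo '<input type="text" name="field"><input type="submit">', "\n";
echo "</form></body></html>\n";
```

Note that whitespace or a byte order mark before the opening PHP tag also counts as output.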

Problem 3) You are using

<form action="whatever.php" accept-charset="UTF-8">

Some browsers (IE, mostly) ignore accept-charset if they can coerce the data to be sent into ASCII or ISO Latin-1. So the data will either be sent as UTF-8 but declared as ISO Latin-1, or be encoded and sent as ISO Latin-1 (this second case is not yours).

Have a look at https://stackoverflow.com/a/8547004/449288 to see how to solve this problem.

Problem 4) Which strings are you comparing? For example, if you have

$city = "München";
$_POST['city'] == $city;

The result of this code will depend on the encoding of the PHP file. If the file is encoded in ISO Latin-1 and the $_POST correctly contains UTF-8 data, the == will compare different bytes and will return false.
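The effect can be reproduced without a form by writing the bytes out explicitly. This is a hedged sketch: the escaped strings below are stand-ins for a source file saved in a given encoding and for $_POST['city'], and the variable names are made up for the illustration.

```php
<?php
// "München" as the literal would be stored in a file saved as ISO Latin-1 (ü = 0xFC).
$city = "M\xFCnchen";

// "München" as a browser would post it in UTF-8 (ü = 0xC3 0xBC).
// This stands in for $_POST['city'].
$posted = "M\xC3\xBCnchen";

// The two strings hold different bytes, so the comparison fails.
var_dump($posted == $city);   // bool(false)
var_dump(strlen($city));      // int(7)
var_dump(strlen($posted));    // int(8)

// If the source file is saved as UTF-8, the literal carries the same bytes as the post.
$cityUtf8 = "M\xC3\xBCnchen";
var_dump($posted == $cityUtf8); // bool(true)
```

In practice this means saving the PHP file in the same encoding the form data arrives in, which here would be UTF-8.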

Solution 2

Another option that may help, if you are using Apache: place the AddDefaultCharset directive in your configuration file (httpd.conf) or in .htaccess. It looks like this:

AddDefaultCharset utf-8

http://httpd.apache.org/docs/2.0/mod/core.html#adddefaultcharset

That will override any other default charsets.

Solution 3

I changed "mbstring.detect_order = pass" in my php.ini file and it worked.

Author by

lungov

Updated on June 01, 2021

Comments

  • lungov
    lungov almost 3 years

    I am new here, so I apologize if I am doing anything wrong.

    I have a form which submits user input onto another page. User is expected to type ä, ö, é, etc... I have placed all of the following in the document:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    header('Content-Type:text/html; charset=UTF-8');
    <form action="whatever.php" accept-charset="UTF-8">
    

    I even tried:

    ini_set('default_charset', 'UTF-8');
    

    When the other page loads, I need to check what the user input with something like:

    if ( $_POST['field'] == $check ) {
      ...
    }
    

    But if he inputs something like 'München', PHP will compare 'MÃ¼nchen' with 'München' and will never trigger TRUE even though it should. Since UTF-8 is specified everywhere, I am guessing that the server is converting to something else (Windows-1252, as I read on another thread) because it does not support or is not configured for UTF-8. I am using Apache on a local server before I load into production; I have not changed (and don't know how to change) any of the default settings. I've been working on Windows 7, editing with Notepad++ and encoding my files in ANSI. If I bin2hex('München') I get '4dc3bc6e6368656e'.

    If I echo $_POST['field']; it displays 'München' correctly.

    I have researched everywhere for an explanation, all I find is that I should include those tags/headings I already have.

    Any help is much appreciated.

  • lungov
    lungov over 12 years
    I also tried with only the meta tag in head, which is on every page. I guess this is what you are saying, right?
  • Mr Lister
    Mr Lister over 12 years
    These days, I would hesitate to recommend ISO-8859-1 as an encoding. There are those who believe that ISO-8859-1 should be used as an alias of windows-1252, which is not true. Explicitly specifying windows-1252 is better, since there can't be confusion then.
  • zrvan
    zrvan over 12 years
    @MrLister are you suggesting I should alter the answer and specify CP1252 instead? I'm not very familiar with Windows, but was under the impression that they used ISO-8859-1 now.
  • Mr Lister
    Mr Lister over 12 years
    Yes, and many people think that, but it's not the case. Anyway, I believe UTF-8 would be best, while trying to not encode or decode things anywhere in the process. 8-bit character sets are on their way out.
  • zrvan
    zrvan over 12 years
    @MrLister I've updated the answer, perhaps you'd care to share a reference?
  • gioele
    gioele about 12 years
    The method you suggest is very error-prone: without the correct header or accept-charset the browser will not know which encoding should be used for the contained forms. And unless the data to be posted contains a character that is not in ISO Latin-1, IE will send the form data as ISO Latin-1, not UTF-8. Also, the Content-Type header of the response has zero influence on the way characters are sent during the POST; it only influences how the response should be decoded and displayed.