Special characters in MySQL database

Solution 1

Character sets and collation

As others have stated, one of your problems could be down to character sets and collation. You need to ensure that the whole chain (input, storage and output) is correctly configured to handle the characters that you are using. UTF-8 is often a good choice, as it can handle every character in the Unicode character set.

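To create a MySQL database or table that uses UTF-8 with a case-insensitive collation, use the utf8mb4 character set. MySQL's older utf8 character set stores at most three bytes per character, so it cannot hold every Unicode character:

CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8mb4
  DEFAULT COLLATE utf8mb4_general_ci;

CREATE TABLE mytable ( ... )
  DEFAULT CHARACTER SET utf8mb4
  DEFAULT COLLATE utf8mb4_general_ci;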
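
Storage is only one link in the chain: the connection and the page need to agree on the encoding too. The following is a minimal sketch of those two links, assuming PDO and MySQL's utf8mb4 character set (its full four-byte UTF-8). The function names mysqlUtf8Dsn and sendUtf8Header, and the host, database and credentials in the usage comment, are made up for illustration.

```php
<?php
// Build a PDO DSN that asks MySQL to use utf8mb4 on the connection.
// The host and database names passed in are placeholders.
function mysqlUtf8Dsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Tell the browser that the page is UTF-8.
// This only works before any output has been sent, hence the guard.
function sendUtf8Header(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Usage (hypothetical credentials):
// $dbh = new PDO(mysqlUtf8Dsn('localhost', 'mydb'), 'user', 'password');
// sendUtf8Header();
```

If a meta tag in the page also declares a character set, it should name the same one as the HTTP header.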

Escaping

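mysql_real_escape_string (I'm assuming that you are using PHP) escapes the characters in your data that have a special meaning inside an SQL string, so that the MySQL parser can tell your data apart from the SQL syntax. Note that the old mysql extension was removed in PHP 7; mysqli_real_escape_string is the mysqli equivalent. Escaping is needed when the whole SQL command is supplied as a single string: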

INSERT INTO mytable VALUES ("this \" is a double quote");

The backslash is required to help MySQL understand that the double quote in the middle of the string is in fact a literal double quote in the middle of the string, and not a closing double quote.

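Escaping done this way only changes the SQL text that you send: the parser consumes the backslashes, so the value that ends up in the table is still the original data. The trouble starts when data is escaped twice, or escaped with a function meant for a different context (such as htmlentities or manually added backslashes) before it is inserted. In that case you are directly altering the data: you are no longer storing the original, and you have to process it again when you retrieve it from the database (to un-escape it).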

Prepared statements

To make things easier for both you and MySQL, you can use prepared statements instead. Prepared statements use placeholders to show MySQL exactly which parts of the SQL statement are your parameters:

$stmt = $dbh->prepare("INSERT INTO mytable VALUES (?)");
$stmt->execute(array('this " is a double quote'));

By using prepared statements, you can insert your data into the database unmodified - no messy escaping is required. This has the added advantage of significantly reducing the possibility of SQL injection. See Bill Karwin's SQL Injection Myths and Fallacies talk and slides for more information on this subject.
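
For a fuller picture, here is a small sketch of a round trip through a prepared statement using PDO. To keep it runnable without a server it connects to an in-memory SQLite database, which is an assumption made purely for the example; with MySQL only the DSN and credentials passed to new PDO would differ. The function name storeAndFetch and the column name body are hypothetical.

```php
<?php
// Insert a value through a placeholder and read it back.
// $dbh is any PDO connection; the table and column names are made up.
function storeAndFetch(PDO $dbh, string $text): string
{
    $dbh->exec('CREATE TABLE IF NOT EXISTS mytable (body TEXT)');

    // The ? marks where the parameter goes; the value is sent separately.
    $insert = $dbh->prepare('INSERT INTO mytable (body) VALUES (?)');
    $insert->execute([$text]);

    $select = $dbh->prepare('SELECT body FROM mytable WHERE body = ?');
    $select->execute([$text]);

    // fetchColumn() returns false when no row matched; cast for a string result.
    return (string) $select->fetchColumn();
}

// In-memory SQLite is used here only so the sketch runs without a server.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$original = 'this " is a double quote, it\'s got a single one too: é ö';
$stored = storeAndFetch($dbh, $original);

// Compare what came back with what went in.
var_dump($stored === $original);
```

Nothing in the sketch escapes or un-escapes the string by hand; quotes and accented characters are passed to execute() exactly as they are.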

Output

Now that your data is stored in its original format, you are free to output it however you wish. If the data should appear on an HTML page as literal text (so that characters such as <, & and " are shown rather than interpreted), then you will need to escape it prior to output, for example with htmlspecialchars. If the data contains HTML that you do want the browser to render, a filter such as HTML Purifier can restrict it to safe markup. Which method you choose depends on the source of your data, and exactly how you want it to be displayed.
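
As a rough illustration of both cases, the sketch below escapes text for literal display and, separately, lets only bare sup tags through. The function names escapeText and buttonLabel are invented for the example, and the allow-list approach is just one simple option, not a replacement for a full filter like HTML Purifier.

```php
<?php
// Escape text so that the browser shows it literally, quotes included.
function escapeText(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Escape everything first, then re-enable bare <sup> and </sup> tags only.
// Tags with attributes, and all other tags, stay escaped.
function buttonLabel(string $text): string
{
    $escaped = escapeText($text);

    return str_replace(
        ['&lt;sup&gt;', '&lt;/sup&gt;'],
        ['<sup>', '</sup>'],
        $escaped
    );
}

// A value attribute holds plain text, so it only needs escaping.
echo '<input type="button" value="' . escapeText('the "zetta" prefix') . '">', "\n";

// A <button> element can contain markup such as <sup>.
echo '<button type="button">' . buttonLabel('10<sup>21</sup>') . '</button>', "\n";
```

An unescaped double quote inside a value attribute ends the attribute early, which is one way that quoted text can disappear from a button. Because a value attribute cannot contain markup, superscripts need a button element instead.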

Solution 2

I suspect the problem is in the character sets in use. Your MySQL character set and collation need to support the characters you are trying to use, and your web pages need to be in a matching character set.

Most likely, your MySQL database is using a collation such as latin1_general_ci, while your webpages are supposedly being displayed in UTF-8. I would suggest you set MySQL to store data in UTF-8, and your web pages should output a header stating they are UTF-8.

Solution 3

If you make sure that both your database encoding and your page encoding are UTF-8, that should solve most of the problem.

Author: Fred

Updated on June 04, 2022

Comments

  • Fred
    Fred almost 2 years

    I have a mysql database with questions and answers that are displayed in HTML paragraphs and buttons. The q&a contains lots of special characters eg é,...,',",ö and also some html tags like sup.

    I have tried mysqli_real_escape_string, htmlentities and adding backslashes, but some characters always show incorrectly on the page. Sometimes it's correct in the paragraphs but incorrect on the buttons.

    What is the correct function to use to make all these special characters display correctly, and when should I use it (when inserting into the database, or when selecting from the database and making it into HTML)?

    Many thanks

  • Fred
    Fred over 12 years
    The HTML says <meta http-equiv="content-type" content="text/xml; charset=utf-8" /> and the database collation is utf8_general_ci. Is this ok? With these settings and without using any php functions for special characters the paragraphs look ok but on the buttons text within double quotes is not shown and HTML tags are displayed as is (not made into user output). Further, to be able to insert into the database I need to remove all single quotes (which are needed in a lot of names).
  • Fred
    Fred over 12 years
    mysql_real_escape_string solves the single quote issue. I can replace all double quotes with single quotes. This leaves only the problem with button values being displayed as HTML code (eg I want to use superscript characters on buttons) but I guess there might not be a solution for this. Many thanks for all help!
  • Mike
    Mike over 12 years
    @Fred: IMO mysql_real_escape_string is a nasty hack which should be avoided when prepared statements are available instead. Can you update your question to include more information about the button values?
  • asc99c
    asc99c over 12 years
    Rather than removing single quotes, what you usually want is to use a prepared statement to insert the data (e.g. the statement should be something like 'insert into comment values( ?, ?, ? )'), and the values will be bound by the database library.
  • asc99c
    asc99c over 12 years
    Sorry, forgot to add: the character sets look fine. I would double check (in something like Firebug) that the browser is really seeing UTF-8. We had an issue in our application at one point, where the Apache headers said ISO-8859-1, whilst our meta tag stated UTF-8.
  • Fred
    Fred over 12 years
    Prepared statements sound like a good idea but I'm pretty new at this and don't understand how to use them. What do I have to add to my mysql insert? Regarding the button values, I have a multiple choice quiz with four alternatives displayed on four buttons. For instance I ask for the factor of the prefix zetta and one of the buttons should show 10 and 21 within the sup tag (but of course I don't want the tag itself to show).
  • Fred
    Fred over 12 years
    Thank you, I think the character set is ok. Prepared statements sounds good but I don't understand how to use it. I only know how to use basic mysql inserts.
  • asc99c
    asc99c over 12 years
    What programming language are you using to insert the data?
  • Fred
    Fred over 12 years
    I am using PHP to insert the data.
  • asc99c
    asc99c over 12 years
    Then just see Mike's answer, and read the linked documentation. Learning to use prepared statements will be well worth the effort.