PHP: Dealing with special characters using iconv

71,563

Solution 1

Did you save your source file in UTF-8 encoding? If not (and I guess you didn't, since a file saved in another encoding would produce the "incomplete multibyte character" error), try that first.
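
A quick way to test that guess is to check whether the bytes PHP actually sees are valid UTF-8. The following is a minimal sketch, assuming the mbstring extension is available. The variable names are illustrative, and the strings use explicit byte escapes so the result does not depend on how this snippet itself is saved.

```php
<?php
// "Löic" as ISO-8859-1 bytes: ö is the single byte 0xF6.
$latin1 = "L\xF6ic";
// "Löic" as UTF-8 bytes: ö is the two-byte sequence 0xC3 0xB6.
$utf8 = "L\xC3\xB6ic";

// mb_check_encoding() returns true only if the bytes are valid in the given encoding.
$latin1IsUtf8 = mb_check_encoding($latin1, 'UTF-8'); // false
$utf8IsUtf8   = mb_check_encoding($utf8, 'UTF-8');   // true

var_dump($latin1IsUtf8, $utf8IsUtf8);
```

If the check fails for a string literal in your script, the file was most likely saved in a single-byte encoding, and iconv() is being told the input is UTF-8 when it is not.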

Solution 2

// utf8_encode() treats $s as ISO-8859-1 and converts it to UTF-8 first (deprecated as of PHP 8.2)
$clean = iconv('UTF-8', 'ASCII//TRANSLIT', utf8_encode($s));
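
To put this in context, here is a hedged sketch of how the two outputs from the question could be produced. The helper name to_slug() is hypothetical, and the code assumes the mbstring extension is available. It uses mb_convert_encoding() in place of utf8_encode(), with the same ISO-8859-1 assumption. What //TRANSLIT produces depends on the iconv implementation and the current locale, so no exact slug is claimed here.

```php
<?php
// Hypothetical helper, not from the original answers: builds a URL slug.
function to_slug(string $s): string
{
    // If the bytes are not valid UTF-8, assume ISO-8859-1 (the same assumption
    // utf8_encode() makes) and convert them first.
    if (!mb_check_encoding($s, 'UTF-8')) {
        $s = mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
    }

    // Transliterate to ASCII; @ silences the notice, and we fall back to the
    // untransliterated string if iconv() fails.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $s);
    if ($ascii === false) {
        $ascii = $s;
    }

    // Keep lowercase letters and digits; collapse everything else into hyphens.
    $slug = (string) preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

// Output 1: slug for the URL column (input given here as ISO-8859-1 bytes).
$slug = to_slug("L\xF6ic & Ren\xE9");

// Output 2: store the UTF-8 text unchanged and escape it only when rendering HTML.
$stored  = "L\xC3\xB6ic & Ren\xC3\xA9"; // "Löic & René" as UTF-8 bytes
$display = htmlentities($stored, ENT_QUOTES, 'UTF-8');

echo $slug, "\n";
echo $display, "\n";
```

Passing the charset to htmlentities() explicitly avoids depending on the default_charset setting. The final regex pass guarantees an ASCII-only slug even where transliteration falls back to placeholder characters.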
Author by

Run

A cross-disciplinary full-stack web developer/designer.

Updated on May 30, 2020

Comments

  • Run
    Run almost 4 years

    I still don't understand how iconv works.

    For instance,

    $string = "Löic & René";
    $output = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string); 
    

    I get,

    Notice: iconv() [function.iconv]: Detected an illegal character in input string in...

    With $string = "Löic"; or $string = "René";

    I get,

    Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in.

    I get nothing with $string = "&";

    I need two different outputs, stored in two separate columns of a database table:

    1. I need to convert Löic & René to Loic & Rene for clean URLs.

    2. I need to keep them as they are (Löic & René stays Löic & René) and only convert them with htmlentities($string, ENT_QUOTES); when displaying them on my HTML page.

    I tried some of the suggestions from php.net below, but they still don't work:

    I had a situation where I needed some characters transliterated, but the others ignored (for weird diacritics like ayn or hamza). Adding //TRANSLIT//IGNORE seemed to do the trick for me. It transliterates everything that is able to be transliterated, but then throws out stuff that can't be.

    So:

    $string = "ʿABBĀSĀBĀD";
    
    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
    // output: [nothing, and you get a notice]
    
    echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
    // output: ABBSBD
    
    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
    // output: ABBASABAD
    // Yay! That's what I wanted!
    

    and another,

    Andries Seutens 07-Nov-2009 07:38
    When doing transliteration, you have to make sure that your LC_CTYPE is properly set, otherwise the default POSIX will be used.
    
    To transform "rené" into "rene" we could use the following code snippet:
    setlocale(LC_CTYPE, 'nl_BE.utf8');
    
    $string = 'rené';
    $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    
    echo $string; // outputs rene
    

    How can I actually get these to work?

    Thanks.

    EDIT:

    This is the source file I use to test the code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" class="no-js">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>
    <?php
    $string = "Löic & René";
    $output = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string); 
    ?>
    </html>