How can I convert "Western (Mac OS Roman)" formatted text to UTF-8 with PHP?


The mb_* functions can't handle "macintosh", which is the IANA-defined name for Mac Roman. You have to use iconv instead.

$line = iconv('macintosh', 'UTF-8', $line);
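As a minimal sketch of how that call might be wrapped with error handling: iconv() returns false when a conversion fails, so it is worth checking the result before storing it. The helper name macRomanToUtf8 is hypothetical (it is not part of PHP or of the original answer), and the sketch assumes the iconv implementation on your system recognises the 'macintosh' charset name, which can vary by platform.

```php
<?php
// Hypothetical helper for illustration only: converts Mac OS Roman bytes to UTF-8.
function macRomanToUtf8(string $line): string
{
    // 'macintosh' is the charset name used in the answer above.
    // Assumption: the underlying iconv library supports it.
    $converted = iconv('macintosh', 'UTF-8', $line);

    // iconv() returns false on failure (unknown charset or unconvertible input).
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input from Mac OS Roman');
    }

    return $converted;
}

// In Mac OS Roman the byte 0x9A is "ö", so these bytes spell "schön".
$utf8 = macRomanToUtf8("sch\x9An");
echo $utf8, PHP_EOL;
```

The same byte 0x9A is "š" in Windows-1252 and "љ" in Windows-1251, which matches the wrong characters reported in the question.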

Author: Angry Dan

Updated on July 25, 2022

Comments

  • Angry Dan (almost 2 years ago)

    I have files exported by Excel for Mac 2011 VBA in the Western (Mac OS Roman) encoding.

    I haven't been successful in getting Excel for Mac VBA to export directly to UTF-8, so I want to convert these files with PHP before I save them to MySQL. I have tried these commands:

    $dataset[$k] = mb_convert_encoding($line, 'ASCII', 'UTF-8'); //not correctly converted
    $dataset[$k] = mb_convert_encoding($line, 'ISO-8859-8', 'UTF-8'); //not correctly converted
    $dataset[$k] = mb_convert_encoding($line, 'macintosh', 'UTF-8'); //unrecognized name
    $dataset[$k] = mb_convert_encoding($line, 'Windows-1251', 'UTF-8'); //changes "schön" to "schљn"
    $dataset[$k] = mb_convert_encoding($line, 'Windows-1252', 'UTF-8'); //changes "schön" to "schšn"
    

    I found this list of valid encoding names from 2008, but none of them seems to represent Western (Mac OS Roman).

    * UCS-4
    * UCS-4BE
    * UCS-4LE
    * UCS-2
    * UCS-2BE
    * UCS-2LE
    * UTF-32
    * UTF-32BE
    * UTF-32LE
    * UTF-16
    * UTF-16BE
    * UTF-16LE
    * UTF-7
    * UTF7-IMAP
    * UTF-8
    * ASCII
    * EUC-JP
    * SJIS
    * eucJP-win
    * SJIS-win
    * ISO-2022-JP
    * JIS
    * ISO-8859-1
    * ISO-8859-2
    * ISO-8859-3
    * ISO-8859-4
    * ISO-8859-5
    * ISO-8859-6
    * ISO-8859-7
    * ISO-8859-8
    * ISO-8859-9
    * ISO-8859-10
    * ISO-8859-13
    * ISO-8859-14
    * ISO-8859-15
    * byte2be
    * byte2le
    * byte4be
    * byte4le
    * BASE64
    * HTML-ENTITIES
    * 7bit
    * 8bit
    * EUC-CN
    * CP936
    * HZ
    * EUC-TW
    * CP950
    * BIG-5
    * EUC-KR
    * UHC (CP949)
    * ISO-2022-KR
    * Windows-1251 (CP1251)
    * Windows-1252 (CP1252)
    * CP866 (IBM866)
    * KOI8-R
    

    What format do I need to use to convert "Western (Mac OS Roman)" to UTF-8?