Unable to retrieve UTF-8 accented characters from Access via PDO_ODBC

17,304

Solution 1

The Problem

When using native PHP ODBC features (PDO_ODBC or the older odbc_ functions) and the Access ODBC driver, text is not UTF-8 encoded, even though it is stored in the Access database as Unicode characters. So, for a sample table named "Teams"

Team
-----------------------
Boston Bruins
Canadiens de Montréal
Федерация хоккея России

the code

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr = 
        'odbc:' .
        'Driver={Microsoft Access Driver (*.mdb)};' .
        'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb;' .
        'Uid=Admin;';
$db = new PDO($connStr);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sql = "SELECT Team FROM Teams";
foreach ($db->query($sql) as $row) {
    $s = $row["Team"];
    echo $s . "<br/>\n";
}
?>
</body>
</html>

displays this in the browser

Boston Bruins
Canadiens de Montr�al
????????? ?????? ??????

The Easy but Incomplete Fixes

The text returned by Access ODBC actually matches the Windows-1252 character encoding for the characters in that character set, so simply changing the line

$s = $row["Team"];

to

$s = utf8_encode($row["Team"]);

will allow the second entry to be displayed correctly

Boston Bruins
Canadiens de Montréal
????????? ?????? ??????

but the utf8_encode() function converts from ISO-8859-1, not Windows-1252, so some characters (notably the Euro symbol '€') will disappear. A better solution would be to use

$s = mb_convert_encoding($row["Team"], "UTF-8", "Windows-1252");

but that still wouldn't solve the problem with the third entry in our sample table.

The Complete Fix

For full UTF-8 support we need to use COM with ADODB Connection and Recordset objects like so

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr = 
        'Driver={Microsoft Access Driver (*.mdb)};' .
        'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb';
$con = new COM("ADODB.Connection", NULL, CP_UTF8);  // specify UTF-8 code page
$con->Open($connStr);

$rst = new COM("ADODB.Recordset");
$sql = "SELECT Team FROM Teams";
$rst->Open($sql, $con, 3, 3);  // adOpenStatic, adLockOptimistic
while (!$rst->EOF) {
    $s = $rst->Fields("Team");
    echo $s . "<br/>\n";
    $rst->MoveNext;
}
$rst->Close();
$con->Close();
?>
</body>
</html>

Solution 2

A bit more easily to manipulate the data. (Matrix array).

function consulta($sql) {
        $db_path =  $_SERVER["DOCUMENT_ROOT"] . '/database/Registros.accdb';
        $conn = new COM('ADODB.Connection', NULL, CP_UTF8) or exit('Falha ao iniciar o ADO (objeto COM).'); 
        $conn->Open("Persist Security Info=False;Provider=Microsoft.ACE.OLEDB.12.0;Jet OLEDB:Database Password=ifpb@10510211298;Data Source=$db_path"); 
        $rs = $conn->Execute($sql);
        $numRegistos = $rs->Fields->Count;
        $index = 0;
        while (!$rs->EOF){      
            for ($n = 0; $n < $numRegistos; $n++) {
                if(is_null($rs->Fields[$n]->Value)) continue;
                $resultados[$index][$rs->Fields[$n]->Name] = $rs->Fields[$n]->Value;
                echo '.';
            }
            echo '<br>';
            $index = $index + 1;
            $rs->MoveNext();
        }
        $conn->Close();
        return $resultados;
    }

    $dados = consulta("select * from campus");

    var_dump($dados);
Share:
17,304

Related videos on Youtube

Chris6657456456
Author by

Chris6657456456

Updated on June 04, 2022

Comments

  • Chris6657456456
    Chris6657456456 almost 2 years

    I am trying to get an Access DB converted into MySQL. Everything works perfectly, expect for one big monkey wrench... If the access db has any non standard characters, it wont work. My query will tell me:

    Incorrect string value: '\xE9d'
    

    If I directly echo out the rows text that has the 'invalid' character I get a question mark in a black square in my browser (so é would turn into that invalid symbal on echo).

    NOTE: That same from will accept, save and display the "é" fine in a textbox that is used to title this db upload. Also if I 'save as' the page and re-open it up the 'é' is displayed correctly....

    Here is how I connect:

    $conn = new PDO("odbc:Driver={Microsoft Access Driver (*.mdb)};Dbq=$fileLocation;SystemDB=$securefilePath;Uid=developer;Pwd=pass;charset=utf;");
    

    I have tried numerous things, including:

    $conn -> exec("set names utf8");
    

    When I try a 'CurrentDb.CollatingOrder' in access it tells me 1033 apparently that is dbSortGeneral for "English, German, French, and Portuguese collating order".

    What is wrong? It is almost like the PDO is sending me a collation my browser and PHP does not fully understand.

    • Gord Thompson
      Gord Thompson about 9 years
      Did you try using the utf8_encode() function on the string you retrieved from the Access database? That might work, provided that the accented characters you are retrieving are represented in the ISO-8859-1 character set (like é is).