php-excel-reader - problem with UTF-8
Solution 1
I hope it's the same problem as I had: In excel_reader2.php on line 1120, replace
$retstr = ($asciiEncoding) ? $retstr : $this->_encodeUTF16($retstr);
with
$retstr = ($asciiEncoding) ? iconv('cp1250', 'utf-8', $retstr) : $this->_encodeUTF16($retstr);
That should fix it. However, I suggest you use a different Excel reader, such as PHPExcel, to avoid problems like these.
Note that you need the iconv extension enabled on the server.
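To see what the iconv call in the replacement line does, here is a minimal sketch outside the library. The helper name cp1250ToUtf8 is hypothetical (it is not part of php-excel-reader), and it assumes the cell bytes really are CP1250, which depends on how the workbook was saved:

```php
<?php
// Hypothetical helper (not part of php-excel-reader): convert a CP1250
// (Windows Central European) byte string to UTF-8 using iconv.
function cp1250ToUtf8(string $bytes): string
{
    if (!extension_loaded('iconv')) {
        throw new RuntimeException('The iconv extension is not enabled.');
    }
    $utf8 = iconv('CP1250', 'UTF-8', $bytes);
    if ($utf8 === false) {
        // iconv returns false when the input cannot be converted.
        throw new RuntimeException('iconv could not convert the input.');
    }
    return $utf8;
}

// "nákup" as stored in CP1250: the single byte 0xE1 is "á".
$raw = "n\xE1kup";
echo bin2hex(cp1250ToUtf8($raw)), "\n"; // 6ec3a16b7570
```

In the output, the single byte e1 has become the two-byte UTF-8 sequence c3 a1. If your file uses a different code page, the first argument to iconv would have to change accordingly.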
Solution 2
I have an answer for this problem that lets you keep using php_excel_reader as usual. Add a function to the Spreadsheet_Excel_Reader class:
function seems_utf8($str) {
    for ($i=0; $i<strlen($str); $i++) {
        if (ord($str[$i]) < 0x80) continue; # 0bbbbbbb
        elseif ((ord($str[$i]) & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif ((ord($str[$i]) & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif ((ord($str[$i]) & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif ((ord($str[$i]) & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif ((ord($str[$i]) & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == strlen($str)) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
Then add this line below line 1120:
$retstr = $this->seems_utf8($retstr) ? $retstr : utf8_encode($retstr);
Done!
Alternatively, you can use the file that I modified. Download here: File excel_reader2.php. Use it the same way as the original excel reader.
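The same "convert only if it is not already UTF-8" idea can be sketched without utf8_encode, which treats its input as ISO-8859-1 and is deprecated as of PHP 8.2. The helper name ensureUtf8 and the CP1250 default are assumptions for illustration, not part of php-excel-reader. Validity is checked with preg_match and the u modifier, which fails on malformed UTF-8:

```php
<?php
// Hypothetical helper: return $text unchanged if it is already valid UTF-8,
// otherwise convert it from $sourceEncoding (assumed here to be CP1250).
function ensureUtf8(string $text, string $sourceEncoding = 'CP1250'): string
{
    // With the "u" modifier, preg_match returns false for invalid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = iconv($sourceEncoding, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('Could not convert from ' . $sourceEncoding);
    }
    return $converted;
}

// Already UTF-8: returned as-is.
echo bin2hex(ensureUtf8("n\xC3\xA1kup")), "\n";
// CP1250 bytes: converted to UTF-8.
echo bin2hex(ensureUtf8("n\xE1kup")), "\n";
```

Both calls should print the same hex string, because the second input is converted to the same UTF-8 bytes as the first. Like Solution 1, this needs the iconv extension.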
Viktor Stískala
Updated on July 17, 2022
I'm using php-excel-reader 2.21 to convert an XLS file to CSV. I wrote a simple script to do that, but I have some problems with Unicode characters: it does not return values from some cells.
For example, it has no problem with the cell content ceník položek, but it does have problems with nákup, VÝROBCE, PÁS, HRUBÝ, NÁKLADNÍ, and some others. For these cells it returns an empty value ("").
Here is the code snippet I use for the conversion:
<?php
set_time_limit(120);
require_once 'excel_reader2.php';

$data = new Spreadsheet_Excel_Reader("cenik.xls", false, 'UTF-8');
$f = fopen('file.csv', 'w');

for ($row = 1; $row <= $data->rowcount(); $row++) {
    $out = '';
    for ($col = 1; $col <= $data->colcount(); $col++) {
        $val = $data->val($row, $col);
        // escape " and \ characters inside the cell
        $escaped = preg_replace(array('#”#u', '#\\\\#u', '#[”"]#u'), array('"', '\\\\\\\\', '\"'), $val);
        if (empty($val))
            $out .= ',';
        else
            $out .= '"' . $escaped . '",';
    }
    // remove last comma (,)
    fwrite($f, substr($out, 0, -1));
    fwrite($f, "\n");
}
fclose($f);
?>
Note that the column and row indexes start at 1. Any suggestions?
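Separately from the encoding issue, the manual quoting in the snippet could be handed to PHP's built-in fputcsv. This is only a sketch: the helper name writeRowsAsCsv is hypothetical, and it takes plain arrays rather than a Spreadsheet_Excel_Reader object so that it runs on its own.

```php
<?php
// Hypothetical helper: write rows (arrays of cell values) as CSV and let
// fputcsv() handle the quoting instead of escaping by hand.
function writeRowsAsCsv($stream, iterable $rows): void
{
    foreach ($rows as $row) {
        // An empty escape string (accepted since PHP 7.4) disables the
        // backslash escape, so embedded quotes are written doubled.
        fputcsv($stream, $row, ',', '"', '');
    }
}

// Write one sample row to an in-memory stream and print the CSV text.
$out = fopen('php://memory', 'w+');
writeRowsAsCsv($out, [['ceník položek', 'say "hi"', 'nákup']]);
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

In the conversion loop, each row would be collected into an array of $data->val($row, $col) values and passed to fputcsv in the same way. Note that empty($val) in the original is also true for the string "0", so a cell containing 0 is written as empty there.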