I encountered a little problem when parsing CSV-Strings that contain german umlauts (-> ä, ö, ü, Ä, Ö, Ü) in PHP.
Assume the following csv input string:
w;x;y;z
48;OSL;Oslo Stock Exchange;B
49;OTB;Österreichische Termin- und Optionenbörse;C
50;VIE;Wiener Börse;D
And the appropriate PHP code used to parse the string and create an array which contains the data from the csv-String:
public static function parseCSV($csvString) {
$rows = str_getcsv($csvString, "\n");
// Remove headers ..
$header = array_shift($rows);
$cols = str_getcsv($header, ';');
if(!$cols || count($cols)!=4) {
return null;
}
// Parse rows ..
$data = array();
foreach($rows as $row) {
$cols = str_getcsv($row, ';');
$data[] = array('w'=>$cols[0], 'x'=>$cols[1], 'y'=>$cols[2], 'z'=>$cols[3]);
}
if(count($data)>0) {
return $data;
}
return null;
}
The result of calling the above function with the given csv-string results in:
Array
(
[0] => Array
(
[w] => 48
[x] => OSL
[y] => Oslo Stock Exchange
[z] => B
)
[1] => Array
(
[w] => 49
[x] => OTB
[y] => sterreichische Termin- und Optionenbörse
[z] => C
)
[2] => Array
(
[w] => 50
[x] => VIE
[y] => Wiener Börse
[z] => D
)
)
Note that the second entry is missing the Ö. This only happens, if the umlaut is placed directly after the column separator character. It also happens, if more than one umlaut is places in sequence, i.e. "ÖÖÖsterreich" -> "sterreich". The csv-string is sent using a HTML-Form, thus the content gets URL-encoded. I use a Linux server, with utf-8 encoding and the csv-string looks correct before parsing.
Any ideas?