I'm reading my music directory to populate a JSON for jPlayer, as follows:

<?php
//tried utf-8, shift_jis, etc. No difference
header('Content-Type: application/json; charset=SHIFT_JIS');

//can't be blank, so I use '.' to make the current file's directory the base
$Directory = new RecursiveDirectoryIterator('.');
$Iterator = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($Iterator, '/^.+\.mp3$/i', RecursiveRegexIterator::GET_MATCH);
//used instead of glob('*/*.mp3') because glob isn't recursive

$filesJson = [];

foreach ($Regex as $key => $value) {
    $whatever = str_ireplace(['.mp3','.\\'], '', $key);
    $filesJson['mp3'][] = [
        'title' => htmlspecialchars($whatever),
        'mp3' => $key
    ];

}
echo json_encode($filesJson);
exit();
?>

The problem lies in files whose names contain non-ASCII characters, such as Latin, Japanese and Korean ones. Examples:

[Screenshots: example filenames in Japanese, Korean, and Latin (pt-br)]

These are converted into ?, or simply become null for Latin names (e.g. Geração).


So, how can I make the filenames/paths be parsed correctly across different languages? The header charset isn't helping.

Info:

XAMPP with Apache2 + PHP 5.4.2 on Win7 x86


Update #1:

Tried @infinity's answer but no changes. Still ? on JP, null on Latin.

<?php
header('Content-Type: application/json; charset=UTF-8');
mb_internal_encoding('UTF-8');

$Directory = new RecursiveDirectoryIterator('.');
$Iterator = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($Iterator, '/^.+\.mp3$/i', RecursiveRegexIterator::GET_MATCH);

$filesJson = [];

foreach ($Regex as $key => $value) {
    $whatever = mb_substr($key, 2, mb_strlen($key)-6, "utf-8"); // 2 to remove .\ and -6 to remove .mp3 (-4 + -2)
    $filesJson['mp3'][] = [
        'title' => $whatever, //tried with and without htmlspecialchars
        'mp3' => $key
    ];

}
echo json_encode($filesJson);
exit();
?>

If I use HTML-ENTITIES instead of utf-8 in mb_substr(), Latin characters work but Asian ones still come out as ?.

RaphaelDDL
  • Do you have control over how the file names are written in the system? – Mike Brant Nov 12 '13 at 19:39
  • @MikeBrant I do but having to rename all filenames of like.... 60GB+ of soundtrack isn't something 'fun'. – RaphaelDDL Nov 12 '13 at 20:24
  • Try adding /u modifier to regex, i.e. '/^.+\.mp3$/ui' – Max Ivanov Nov 14 '13 at 09:50
  • tbh, problems with mixed charsets will drive you crazy and kill your code sooner or later. Better to fix your input (in this case your filenames) and go from there with clean utf-8 (which supports ALL languages). Having to constantly switch and convert in your code WILL break your code and yourself. – ToBe Nov 14 '13 at 16:36
  • @maxivanov with `u`, the latin ones disappear from the listing (not even as `null`) while japanese stays `?`. – RaphaelDDL Nov 14 '13 at 20:03
  • @ToBe Currently I only use it for my own entertainment, but I was hoping to fix it so I could use it on a project I'm working on to manage the soundtracks available for the autoDJ function on centova, shoutcast, etc. (which in my case is an Asian music webradio). That makes it most annoying to have to fix each file's name, since I won't be the one uploading the content most of the time. But yeah, I've worked with mixed charsets in HTML and it was chaos already. The point is that I feel the problem lies more in how RecursiveIterator and the Regexp return values than in the other parts of the code. – RaphaelDDL Nov 14 '13 at 20:08
  • Did you try using the old style directory iteration via dir(), and/or did you try filtering your files not via regex but via a plain old if and a non-complex "compare the last 4 letters to '.mp3'" method? It might be best to check those suspected parts, the regexp and RecursiveIterator, first. – ToBe Nov 15 '13 at 09:29
  • @ToBe Could you post an example as answer? I tried with `dir` but I failed in making it recursive. Because sometimes is `musicfolder\the_album\songfile.ext` but sometimes is `musicfolder\the_band\the_album\songfile.ext` for example. – RaphaelDDL Nov 18 '13 at 12:43
  • You would have to write your own recursive function, I'll check if I got something in a drawer somewhere... – ToBe Nov 18 '13 at 14:05
  • did you find solution to the problem? – Alex Rashkov Mar 03 '14 at 22:55
  • @infinity Nope. I kinda gave up trying after the answers didn't help much, and since I was making it at first for my own entertainment (even if I could use it later for a better autoDJ script), I stopped giving so much time to it. – RaphaelDDL Mar 05 '14 at 17:33

4 Answers

<?php
header('Content-Type: application/json; charset=utf-8');
mb_internal_encoding('utf-8');

foreach ($Regex as $key => $value) {
    // strip the trailing ".mp3" (4 characters); $key is the path yielded by the iterator
    $whatever = mb_substr($key, 0, mb_strlen($key, "utf-8")-4, "utf-8");
    // ... rest of code
}
Alex Rashkov
  • `Call to undefined function mb_substring()` but I just checked xampp's `php.ini`, the line `extension=php_mbstring.dll` is uncommented and the dll exists on `php\ext` folder. Using PHP 5.4.16. Any ideas? – RaphaelDDL Nov 12 '13 at 17:42
  • Sorry my bad, the function is `mb_substr` http://php.net/manual/en/function.mb-substr.php – Alex Rashkov Nov 12 '13 at 18:57
  • No worries, thanks for correcting. Well, I changed to the correct function and changed that `$str` for `$key` but yet the same thing: Japanese is ?, latin becomes null. I've updated question with your code. – RaphaelDDL Nov 12 '13 at 19:25

A short try on a recursive approach using dir():

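myRecursiveScanDir($mypath);

function myRecursiveScanDir($path)
{
    $d = dir($path);
    while (false !== ($entry = $d->read())) {
        // Skip the current and parent directory entries to avoid infinite recursion
        if ($entry === '.' || $entry === '..') {
            continue;
        }

        // Do something, i.e. just echo it
        echo $path."/".$entry."<br/>";

        // Descend into subdirectories
        if (is_dir($path."/".$entry)) {
            myRecursiveScanDir($path."/".$entry);
        }
    }
    $d->close();
}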

Getting the file extension and/or basename could be a bit problematic too. You might have to debug and test how mb_substr, pathinfo and basename react to such filenames.
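
One way to sidestep those functions is the plain "compare the last 4 letters to '.mp3'" check mentioned in the comments. The sketch below is untested against the asker's files; the function names hasMp3Extension and stripMp3Extension are hypothetical. It works on raw bytes, which should be safe for UTF-8 names because ASCII bytes never occur inside a UTF-8 multibyte character.

```php
<?php
// Hypothetical helpers (not from the question): byte-level extension handling.

// True when the name ends with ".mp3", case-insensitively.
function hasMp3Extension($filename)
{
    return strtolower(substr($filename, -4)) === '.mp3';
}

// Remove a trailing ".mp3" if present; otherwise return the name unchanged.
function stripMp3Extension($filename)
{
    return hasMp3Extension($filename) ? substr($filename, 0, -4) : $filename;
}

// "Geração.mp3" written as UTF-8 bytes
$title = stripMp3Extension("Gera\xC3\xA7\xC3\xA3o.mp3");
```

Inside the loop, `hasMp3Extension($entry)` would replace the regex filter and `stripMp3Extension($entry)` would produce the title.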

ToBe
  • `Cannot redeclare scanDir()` :( Seems already [exists](http://php.net/manual/en/function.scandir.php) – RaphaelDDL Nov 19 '13 at 16:15
  • Changed the function name. You should always understand what you copy. The code is untested and only shows how you could do it. It's hardly a perfectly working and fireproofed setup. ;) – ToBe Nov 19 '13 at 16:59
  • I did change the name, as I would in javascript etc. But I started getting warnings about a third parameter, and I had no idea where it was getting called. Will test again. Normally when I don't know enough to actually fix something, I don't try to mess much with it. – RaphaelDDL Nov 22 '13 at 23:52

The operating system you're using may be important in this case:

Please refer to this question: Why does Windows need to `utf8_decode` filenames for `file_get_contents` to work?

I think it may be relevant since the screenshots look very Microsoftish.
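
If the filenames really do reach PHP in the Windows ANSI code page rather than UTF-8, that would explain the null titles: json_encode only accepts UTF-8 strings. A minimal sketch of converting names first, assuming the code page is Windows-1252 (an assumption, not something the question states); toUtf8Filename is a hypothetical helper name:

```php
<?php
// Hypothetical helper: assumes filenames arrive as Windows-1252 bytes.
// The source encoding is a guess about the system code page.
function toUtf8Filename($name, $sourceEncoding = 'Windows-1252')
{
    // Leave strings that are already valid UTF-8 untouched.
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $sourceEncoding);
}

$raw  = "Gera\xE7\xE3o.mp3"; // "Geração.mp3" as Windows-1252 bytes
$json = json_encode(['title' => toUtf8Filename($raw)]);
```

This can only help with characters the code page is able to represent. Names that have already been turned into ? (the Japanese and Korean ones here) have lost their original bytes before PHP sees them, so no conversion can bring them back. The validity check is also a heuristic: a Windows-1252 string that happens to be valid UTF-8 would be passed through unchanged.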

MythThrazz

To match any letters or digits in the regex, use the Unicode property classes:

\p{L}\p{N}
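
A short sketch of how these classes might be used with the /u modifier. The pattern and sample names are illustrative assumptions, not taken from the question. Note that with /u the subject must be valid UTF-8; otherwise preg_match fails outright, which may be why the Latin names vanished from the listing when /u was tried in the comments.

```php
<?php
// Illustrative pattern: letters, digits, space, dot, underscore, hyphen, then ".mp3".
$pattern = '/^[\p{L}\p{N} ._-]+\.mp3$/ui';

$valid   = "Gera\xC3\xA7\xC3\xA3o 01.mp3"; // UTF-8 bytes
$invalid = "Gera\xE7\xE3o 01.mp3";         // Windows-1252 bytes, not valid UTF-8

$matchesValid   = preg_match($pattern, $valid);
$matchesInvalid = preg_match($pattern, $invalid);
// preg_last_error() reports why the second call failed
$lastError = preg_last_error();
```

Here the first call matches, while the second returns false with PREG_BAD_UTF8_ERROR instead of reporting "no match".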