Question (score 27)

I need to clean a string that comes (copy/pasted) from various Microsoft Office suite applications (Excel, Access, and Word), each with its own set of encodings.

I'm using json_encode for debugging purposes, in order to be able to see every single encoded character.

I'm able to clean everything I've found so far (\r, \n) with str_replace, but with \u00a0 I have no luck.

$string = 'mail@mail.com\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0;mail@mail.com'; //this is the output from json_encode

$clean = str_replace("\u00a0", "", $string);

returns:

mail@mail.com\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0;mail@mail.com

That is exactly the same; it completely ignores \u00a0.

Is there a way around this? Also, I feel like I'm reinventing the wheel: is there a function/class that completely strips EVERY possible char of EVERY possible encoding?

____EDIT____

After the first two replies, I need to clarify that my example DOES work, because it's the output from json_encode, not the actual string!

Asked by 0plus1; edited by Peter Mortensen.

9 Answers

Answer (score 58)

By combining ord() with substr() on my string containing \u00a0, I found the following cure to work:

$text = str_replace( chr( 194 ) . chr( 160 ), ' ', $text );
Answered by Arne; edited by Lennart.
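Why the two chr() calls in the answer above work: U+00A0 (NO-BREAK SPACE) is stored in UTF-8 as the two bytes 0xC2 0xA0, which is exactly chr(194) . chr(160). A minimal sketch, assuming the input is UTF-8 and PHP 7.0+ (for the "\u{...}" escape); the sample string is made up:

```php
<?php
// U+00A0 in UTF-8 is the byte pair 0xC2 0xA0, i.e. chr(194) . chr(160).
$nbsp = chr(194) . chr(160);

// Since PHP 7.0 the same two bytes can be written as "\u{00a0}".
var_dump($nbsp === "\u{00a0}"); // bool(true)
echo bin2hex($nbsp), "\n";      // c2a0

// Hypothetical UTF-8 input containing real non-breaking spaces.
$text = "mail@mail.com" . $nbsp . " " . $nbsp . ";mail@mail.com";

// Replace every complete two-byte sequence with an ordinary space.
$clean = str_replace($nbsp, ' ', $text);
echo $clean, "\n"; // mail@mail.com   ;mail@mail.com
```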
Answer (score 9)

Try this (the \u{00a0} escape requires PHP 7.0 or later):

$str = str_replace("\u{00a0}", ' ', $str);
Answer (score 6)

It works for me when I copy and paste your code. Try replacing the double quotes in your str_replace() with single quotes, or escaping the backslash ("\\u00a0").

Answered by Annika Backstrom.
  • In your example it works because you are using the output from json_encode, not the actual string! If I copy and paste my code, it works perfectly even for me. – 0plus1 Apr 07 '10 at 13:01
  • What happens if you replace `\xa0` rather than `\u00a0`? – Annika Backstrom Apr 07 '10 at 13:23
  • This happens. It does delete the instances of \u00a0, and when printed from json_encode it looks OK; however, if I echo the string without json_encode I get a � where before there was the \u00a0. At this point I can't understand what's going on... Please give me an explanation! :-) – 0plus1 Apr 07 '10 at 14:48
  • I found the solution: just assign the json_encode output to a variable and then str_replace like there's no tomorrow. I would still love to understand the gimmick about \xa0, if you don't mind. – 0plus1 Apr 07 '10 at 15:02
  • That may be a null character… The escaped character `\u00a0` says "unicode character with hex value 00a0." My original suggestion would have only stripped out the a0 segment. Try replacing \x00a0 with a blank string. – Annika Backstrom Apr 07 '10 at 17:02
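On the \xa0 question in these comments: a likely explanation, not confirmed in the thread, is that replacing "\xa0" removes only the second byte of the UTF-8 pair C2 A0, leaving a lone 0xC2 byte, which is invalid UTF-8 and is typically rendered as �. Also note that PHP's "\x" escape takes at most two hex digits, so "\x00a0" is a NUL byte followed by the characters "a0", not U+00A0. A small sketch with a made-up sample string; `preg_match('//u', ...)` is used here only as a UTF-8 validity check:

```php
<?php
// A UTF-8 string with one non-breaking space (bytes C2 A0) between the words.
$text = "foo\u{00a0}bar";

// Replacing only "\xa0" removes the second byte and leaves a lone 0xC2.
$broken = str_replace("\xa0", '', $text);
echo bin2hex($broken), "\n"; // 666f6fc2626172

// preg_match() with the /u modifier returns false for invalid UTF-8.
var_dump(preg_match('//u', $broken)); // bool(false)

// "\x00a0" is three bytes: NUL, "a", "0".
var_dump(strlen("\x00a0")); // int(3)

// Replacing the complete two-byte sequence keeps the string valid.
$fixed = str_replace("\xc2\xa0", ' ', $text);
var_dump(preg_match('//u', $fixed)); // int(1)
echo $fixed, "\n";                   // foo bar
```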
Answer (score 5)

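I just had the same problem. Apparently PHP's json_encode fails (returning null on old PHP versions, false since PHP 5.5) for any string containing a non-breaking space encoded as the single byte chr(160), because that byte on its own is not valid UTF-8.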

The solution is to replace it with a regular space:

$str = str_replace(chr(160), ' ', $str);

I hope this helps somebody - it took me an hour to figure out.

Answered by jeremyj11; edited by Peter Mortensen.
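The failure described in this answer can be reproduced with a single-byte (Windows-1252/ISO-8859-1 style) string. A minimal sketch; the input is hypothetical and the error text is what json_last_error_msg() reports for malformed UTF-8:

```php
<?php
// Hypothetical single-byte input where the non-breaking space is chr(160).
$latin1 = "mail@mail.com" . chr(160) . ";mail@mail.com";

// A lone 0xA0 byte is not valid UTF-8, so json_encode() fails.
var_dump(json_encode($latin1));   // bool(false) on PHP >= 5.5
echo json_last_error_msg(), "\n"; // e.g. "Malformed UTF-8 characters, possibly incorrectly encoded"

// After replacing the byte, the string is plain ASCII and encodes fine.
$clean = str_replace(chr(160), ' ', $latin1);
echo json_encode($clean), "\n";   // "mail@mail.com ;mail@mail.com" (quotes included)
```

The chr(160) replacement only suits single-byte input. Applied to UTF-8 text, it removes the second byte of C2 A0 (and of other characters such as "à", which is C3 A0), leaving invalid UTF-8 behind.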
Answer (score 4)

This one also works; I found it somewhere:

$str = trim($str, chr(0xC2).chr(0xA0));
Answered by www.amitpatil.me.
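A caveat on the trim() approach above: trim() treats its second argument as a list of individual bytes and strips them only from the two ends of the string. That removes leading and trailing non-breaking spaces, but it can also eat the last byte of an unrelated UTF-8 character. A sketch with made-up strings; the anchored /u regex is a swapped-in alternative, not part of the original answer:

```php
<?php
$mask = chr(0xC2) . chr(0xA0);

// Leading and trailing NBSPs are removed byte by byte.
$padded = "\u{00a0}mail@mail.com\u{00a0}";
var_dump(trim($padded, $mask)); // string(13) "mail@mail.com"

// Pitfall: "à" is C3 A0 in UTF-8, so its trailing A0 byte is stripped too.
$word = "voilà";
echo bin2hex(trim($word, $mask)), "\n"; // 766f696cc3 (broken UTF-8)

// UTF-8-aware alternative: match whole U+00A0 characters at either end.
$pattern = '/^\x{00A0}+|\x{00A0}+$/u';
var_dump(preg_replace($pattern, '', $padded)); // string(13) "mail@mail.com"
var_dump(preg_replace($pattern, '', $word) === $word); // bool(true)
```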
Answer (score 3)

A minor point: \u00a0 is actually a non-breaking space character; cf. http://www.fileformat.info/info/unicode/char/a0/index.htm

So it might be more correct to replace it with a regular space (" ") rather than remove it.

Answered by Daniel Winterstein.
Answer (score 1)

This did the trick for me:

$str = preg_replace( "~\x{00a0}~siu", " ", $str );
Answered by patrick.
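The regex approach in this answer extends toward the question's "every possible char" wish by swapping the single code point for the Unicode property class \p{Zs} ("space separator"), which covers U+00A0 along with other Unicode spaces and the ordinary ASCII space. A minimal sketch, assuming UTF-8 input; the mixed-space sample string is hypothetical:

```php
<?php
// Hypothetical input mixing several Unicode space characters:
// U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE, an ASCII space,
// and U+202F NARROW NO-BREAK SPACE.
$input = "mail@mail.com\u{00a0}\u{2007} \u{202f};mail@mail.com";

// \p{Zs} matches any "space separator" code point; /u makes PCRE treat
// both the pattern and the subject as UTF-8.
$clean = preg_replace('/\p{Zs}+/u', '', $input);

// preg_replace() returns null when the subject is not valid UTF-8.
if ($clean === null) {
    throw new RuntimeException('Input is not valid UTF-8');
}

echo $clean, "\n"; // mail@mail.com;mail@mail.com
```

Line breaks such as \r and \n are control characters, not space separators, so they are not matched by \p{Zs}; a class like [\p{Zs}\r\n] would cover them too.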
Answer (score 0)

You have to do this with single quotes, like this:

str_replace('\u00a0', "", $string);

Or, if you prefer double quotes, you have to escape the backslash, which would look like this:

str_replace("\\u00a0", "", $string);
Answered by oezi; edited by Peter Mortensen.
  • I ran into this problem as well. Here is the solution that worked for me. I copied a string with the known \u00a0 character into my editor, then copied the 'space' that \u00a0 represents and pasted it into the str_replace function. In the end it looks like this: str_replace(" ","",$string). The space in the first parameter is the non-standard \u00a0. Now run it through json_encode. – Nick Johnson Feb 20 '12 at 16:36
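Why the quoting in the answer above matters: '\u00a0' is six literal characters, so it only matches json_encode() output, where the character has already been escaped to the text \u00a0 (which is also why the asker's debug example "works"). The raw string needs the actual two-byte character. A small sketch with made-up strings, assuming PHP 7.0+ for "\u{...}":

```php
<?php
// In single quotes, '\u00a0' is six literal characters: \ u 0 0 a 0.
var_dump('\u00a0' === "\\u00a0"); // bool(true)
var_dump(strlen('\u00a0'));       // int(6)

// "\u{00a0}" is the actual character: two bytes, C2 A0.
var_dump(strlen("\u{00a0}"));     // int(2)

// The literal text matches json_encode() output...
$json = json_encode("a\u{00a0}b");          // the JSON text "a\u00a0b"
var_dump(str_replace('\u00a0', '', $json)); // string(4) ""ab""

// ...while the raw string needs the real character.
var_dump(str_replace("\u{00a0}", '', "a\u{00a0}b")); // string(2) "ab"
```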
Answer (score 0)

You can use json_encode($string, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);

Answered by Mana S.
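Note that JSON_UNESCAPED_UNICODE (available since PHP 5.4) only changes how json_encode() displays the character: the raw UTF-8 bytes are emitted instead of the \u00a0 escape, so the non-breaking space shows up as an invisible space in the debug output. It does not remove the character from the string. A small sketch with a made-up sample:

```php
<?php
$text = "mail@mail.com\u{00a0};mail@mail.com";

// Default: non-ASCII characters are escaped as \uXXXX in the JSON text.
echo json_encode($text), "\n"; // "mail@mail.com\u00a0;mail@mail.com"

// With JSON_UNESCAPED_UNICODE the raw C2 A0 bytes appear in the output.
$json = json_encode($text, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
var_dump(strpos($json, "\u{00a0}") !== false); // bool(true)

// $text itself is unchanged; the NBSP still has to be replaced separately.
var_dump(strpos($text, "\u{00a0}") !== false); // bool(true)
```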