
The Details


I have a simple textarea: <textarea></textarea>

The value of this textarea is sent through ajax and stored in a database.

The value in this database is viewed on an iPad (or iPad mini, iPhone, etc.).


The Problem


When someone copies text from somewhere (potentially anywhere on the internet), I want to remove any weird characters, such as “windows-1252 quotes”, from the text before storing it in a utf8_unicode_ci column in a database. The column stores these quotes, but they are unknown on certain devices (like the iPad).


The Question


How can I remove these characters in JavaScript or PHP?

I have tried string.replace, following various examples, to remove these characters.

I have also tried htmlentities($sample) to convert these characters, but still no luck.

Any help would be appreciated! Thanks!

brandonscript
Mr. Meeseeks

1 Answer


Regular expressions will do this. PHP's function for this is preg_replace; JavaScript's is simply .replace(). You can find usage snippets everywhere ;)

There are two ways to approach this using regex:

1. Define an allowed character range and strip anything that isn't in that range.

[^\w-=+()!@#$%^*(] will match any character that is NOT in this character class (the ^ at the beginning of the class denotes the negation). You can then replace the matched characters with an empty string.

Working example: http://regex101.com/r/zK2qW6
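As a rough sketch of option #1 in JavaScript: the helper name `stripDisallowed` is made up for illustration, and the allowed set is an assumption loosely based on the class above. It adds `\s` so that spaces and line breaks survive (the class above would strip them, since `\w` does not match whitespace) and moves the hyphen to the end so it is clearly a literal.

```javascript
// Hypothetical helper: remove every character that is NOT in the allowed set.
// The allowed set is an assumption; adjust it to whatever your data needs.
function stripDisallowed(text) {
  // ^ negates the class; \w is [A-Za-z0-9_]; \s keeps whitespace;
  // the trailing - is a literal hyphen, not a range.
  return text.replace(/[^\w\s=+()!@#$%^*-]/g, "");
}

console.log(stripDisallowed("“Hello” world!")); // prints: Hello world!
```

Keep in mind that an allowed set like this also drops ordinary punctuation (commas, periods) and accented letters unless you add them.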

2. Define a non-allowed character range and strip anything that is in that range.

[“”] will match any character in this character class. Again, you can then replace the matched characters with an empty string. You could also use a Unicode range in the regex here.

Working example: http://regex101.com/r/yG4qJ4
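A minimal sketch of option #2 in JavaScript, using a Unicode range instead of listing the characters literally. The helper name `stripSmartQuotes` is hypothetical, and the range U+2018 to U+201F is my assumption about which characters you want gone: it covers the single and double curly quotes and their low and reversed variants.

```javascript
// Hypothetical helper: remove curly ("smart") quotes and nothing else.
function stripSmartQuotes(text) {
  // \u2018-\u201F spans ‘ ’ ‚ ‛ “ ” „ ‟ in the General Punctuation block.
  return text.replace(/[\u2018-\u201F]/g, "");
}

console.log(stripSmartQuotes("She said “hi” and ‘bye’")); // prints: She said hi and bye
```

If you would rather keep the quotation marks in a plain form, the same pattern can be split in two and replaced with `"` and `'` instead of an empty string.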


In the end, you should choose the path that requires the smallest expression. If there's only a handful of characters to replace, use option #2. If you only want to allow a handful of characters, use option #1.

brandonscript