As a follow-up to this question, I need to compare two strings in JavaScript case-insensitively, ignoring all non-alphanumeric characters except the comma and the semicolon. For example,

Times New Roman, Times, Sans-Serif

matches

Times New Roman,Times,SansSerif

Can somebody get me started with the right function or approach? Is there something ready-made for this in JS, or do I have to strip the clutter from both strings and then compare them?

Pekka

5 Answers

Normalize both strings and compare them:

str1.toLowerCase().replace(/[^a-z0-9,;]+/g, "") == str2.toLowerCase().replace(/[^a-z0-9,;]+/g, "")

Here the strings are converted to lowercase and then all characters except alphanumeric characters, the comma and semicolon are removed before comparison.
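To see what the normalization actually produces, here is a minimal sketch applied to the two strings from the question. The helper name `normalize` is only for illustration; the regex is the one shown above.

```javascript
// Hypothetical helper name; the regex is taken from the answer above.
function normalize(str) {
  // Lowercase first, then drop everything except a-z, 0-9, comma and semicolon.
  return str.toLowerCase().replace(/[^a-z0-9,;]+/g, "");
}

const a = normalize("Times New Roman, Times, Sans-Serif");
const b = normalize("Times New Roman,Times,SansSerif");

console.log(a);       // "timesnewroman,times,sansserif"
console.log(a === b); // true
```

Note that this only handles ASCII letters and digits; accented or non-Latin characters would be stripped as well.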

Gumbo

Gumbo's method - but cleaner to read.

function compareStripped(str1, str2) {
  function strip(str) {
    // lowercases, then removes anything but letters, digits, commas and semicolons
    return str.toLowerCase().replace(/[^a-z0-9,;]+/g, '');
  }
  return strip(str1) === strip(str2);
}
gnarf

First, run a search and replace on both strings:

s/[^a-zA-Z0-9,;]+//g

and then compare them case-insensitively.

ennuikiller

Or this:

var s1='Times New Roman, Times, Sans-Serif';
var s2='Times New Roman,Times,SansSerif';


/^(.+){2}$/i.test((s1+s2).replace(/[^\da-zA-Z,;]/g,''));
kennebec
  • Doesn't match the requirements of the OP. Specifically, the RE will not keep semicolons or numbers. Also, the `/^(.+){2}$/i` might be really clever, but it is more computationally expensive than "s1 == s2" – gnarf Dec 25 '09 at 17:53
  • Thanks for pointing out the number and semicolon requirement. The test isn't so clever, but it only needs to make one test, since only one case will satisfy the condition. The replace is where the regexp has to work, and running it once instead of twice, and not converting to lower case, should save something. – kennebec Dec 25 '09 at 19:43
  • You meant `/^(.+)\1$/`. I realized that `(.+){2}` will match any string with 2 characters. With that change added, I did some performance testing. Your regexp gets slightly worse performance than Gumbo's, occasionally beating mine. In the match case you provided (10000 iterations, 10 loops, throwing out variance and rounding): Gumbo 108, me 131, you 132. Quadrupling string size: Gumbo 293, me 311, you 415. Quad string + adding 1 char to start of s2: yours increases to 503, ours stay the same. Also, yours would still match on `var s1='Times New Roman, Times, Times New Roman'; var s2=',Times,';` – gnarf Dec 25 '09 at 21:01
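The points raised in these comments can be checked with a small sketch. The function names `strip`, `looseTest` and `doubledTest` are made up for illustration; the regexes are the ones quoted in the answer and the comments.

```javascript
// Hypothetical helper names; the regexes come from the answer and comments above.
function strip(str) {
  return str.replace(/[^\da-zA-Z,;]/g, "");
}

// Original test: (.+){2} only requires two non-empty chunks, so almost anything passes.
function looseTest(s1, s2) {
  return /^(.+){2}$/i.test(strip(s1 + s2));
}

// Corrected test: \1 requires the second half to repeat the first (case-insensitively).
function doubledTest(s1, s2) {
  return /^(.+)\1$/i.test(strip(s1 + s2));
}

console.log(looseTest("abc", "xyz"));   // true, although the strings differ
console.log(doubledTest("abc", "xyz")); // false

// The false positive from the last comment: the concatenation is
// "TimesNewRoman,Times," twice, even though s1 and s2 are different.
const s1 = "Times New Roman, Times, Times New Roman";
const s2 = ",Times,";
console.log(doubledTest(s1, s2));       // true
console.log(strip(s1) === strip(s2));   // false
```

So even with the backreference fix, testing the concatenated string cannot tell where one input ends and the other begins; normalizing each string separately and comparing the results avoids that.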

The suggested regex can be shortened with built-in character classes. Also, the normalization should be kept separate from the equality test. Here's something that might not pass code review where I work, but it's so short I thought I'd post it.

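String.prototype.normalized = function() {
  // \w already covers digits, so \d is redundant; \w also matches "_", which is stripped explicitly.
  // Lowercasing makes the comparison case-insensitive, as the question requires.
  return this.toLowerCase().replace(/[^\w,;]|_/g, "");
};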

var s1='Times New Roman, Times, Sans-Serif';
var s2='Times New Roman,Times,SansSerif';

if(s1.normalized() == s2.normalized()) document.write("equality!");
gnarf
billw