
I can't match the character "#" at the end of a word with regex

    /\b(C#)\b/i

I'm working on some MongoDB queries. The subject of the search is programming languages on a given text field of my collection.

The regex I'm using, which works in almost every case, is

    /\b(java|php)\b/i

(for a concrete case where I'm looking for Java and PHP).

The word boundaries are needed to match whole words only (javascript must not match java).

The problem is, as said before, when I look for "C#", the regex just fails and returns no results.

The regex works if I remove the trailing boundary, but then the java/javascript example breaks.

I've been stuck on this for a couple of days now; any help would be appreciated.

1 Answer


Per https://stackoverflow.com/a/3241901/2191572:

A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.
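To see how that definition plays out for the patterns in the question, here is a small sketch in plain JavaScript (the sample strings are made up for illustration). Since `#` is not a word character, there is no boundary between `#` and a following space, so the trailing `\b` can only succeed when a word character comes right after the `#`.

```javascript
// The original pattern from the question.
const naive = /\b(C#)\b/i;

// "#" followed by a space: both are non-word characters, so no \b there.
const csharpInSentence = naive.test("I write C# daily");

// "#" followed by "x": non-word then word, so \b matches (not what we want).
const csharpGlued = naive.test("C#x");

// The java/php pattern behaves as intended, because its alternatives
// end in word characters.
const whole = /\b(java|php)\b/i;
const javaInJavascript = whole.test("I like javascript");
const javaAndPhp = whole.test("Java and PHP");

console.log(csharpInSentence, csharpGlued, javaInJavascript, javaAndPhp);
// false true false true
```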

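Since # is not a word character, there is no \b between it and a following space or the end of the string. You need to create your own definition of word boundary for the trailing side: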

    \b(java|php|c#)(?=[^a-z0-9_]|$)

https://regex101.com/r/LoOADH/1
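A quick way to try the pattern outside regex101 is a few lines of JavaScript. This is only a sketch: the sample strings and the `description` field name in the filter are hypothetical, and the `i` flag is assumed so that `[^a-z0-9_]` also excludes uppercase letters.

```javascript
// Leading \b is kept; the trailing check is a lookahead for
// "non-word character or end of string".
const langs = /\b(java|php|c#)(?=[^a-z0-9_]|$)/i;

// Hypothetical sample texts.
const samples = ["I write C# daily", "Senior C#", "javascript only", "Java, PHP"];
const hits = samples.filter((s) => langs.test(s));
console.log(hits);
// [ 'I write C# daily', 'Senior C#', 'Java, PHP' ]

// The same pattern as a MongoDB filter document; "description" is a
// made-up field name standing in for the text field being searched.
const filter = { description: { $regex: langs.source, $options: "i" } };
console.log(filter.description.$regex);
```

Passing the pattern as a string with `$options` keeps the filter serializable; a regex literal in the filter would presumably work as well.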


Note: If you needed to match something like #b because it's the latest programming craze, then you would also have to replace the leading \b. That check concerns the character before the match, so it needs a lookbehind rather than a lookahead, and it would look like:

    (?<![a-z0-9_])(java|php|c#|#b)(?=[^a-z0-9_]|$)
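Because the leading check has to inspect the character before the match, one way to express it is a negative lookbehind. The following is a sketch under the assumption that the engine supports lookbehind (modern JavaScript does, as does PCRE, which MongoDB's `$regex` uses); the sample strings are made up.

```javascript
// Both sides use custom boundaries: no word character immediately
// before the match, and none immediately after it.
const both = /(?<![a-z0-9_])(java|php|c#|#b)(?![a-z0-9_])/i;

const results = {
  hashBInSentence: both.test("Learning #b now"), // space before "#b"
  hashBGlued: both.test("x#b"),                  // "x" directly before "#b"
  csharpAtStart: both.test("C# rocks"),          // start of string counts as a boundary
  javascript: both.test("javascript"),           // "java" followed by "s"
};
console.log(results);
```

The negative forms `(?<!...)` and `(?!...)` succeed at the start and end of the string without needing explicit `^` or `$` alternatives.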
MonkeyZeus
  • Thanks, I just used negative lookahead for the definition, the final regex is `/\b(C#)(?!\w)/i` – Edgar Ramos - d3249 Sep 19 '19 at 19:34
  • @EdgarRamos-d3249 Oh nice, I had a funny feeling I was making it more difficult than it needs to be. As a side-note, `(?!\w)` would fail if you needed to find `#b` but I guess you can cross that bridge when you get to it. – MonkeyZeus Sep 19 '19 at 19:41