-1

Searching for hashtags as part of a singular word (not a portion of a word) in the content like so:

SELECT * FROM `messages` WHERE LOWER(`messages`.`content`) REGEXP '[[:<:]]#anxiety[[:>:]]'

It is not finding any records, however a search for the word "anxiety" works:

SELECT * FROM `messages` WHERE LOWER(`messages`.`content`) REGEXP '[[:<:]]anxiety[[:>:]]'

Looking to find messages like "She doesn't like thunderstorms. #anxiety #nervous."

Not looking to match parts of a word like "abc#anxiety". It should match "#anxiety" as a standalone word with a "#" before it like "I have #anxiety", "#anxiety sucks!", or "This is what #anxiety looks like.".

Kevin J
  • 194
  • 1
  • 11
  • 1
    note that mysql switched regexp libraries in msyql 8 and the :<: syntax is no longer supported. – ysth Jul 22 '22 at 16:35
  • what does `select version();` show? – ysth Jul 22 '22 at 16:40
  • should "abc#anxiety" match? – ysth Jul 22 '22 at 17:17
  • @ysth the version is "5.6.10-log" and "abc#anxiety" should not be a match only "#anxiety" as a stanalone word with a "#" before it like "I have #anxiety", "#anxiety sucks!", "This is what #anxiety looks like." – Kevin J Aug 07 '22 at 18:08
  • note that that version of mysql is no longer supported (as soon will be true of 5.7 also) and so is now considered insecure. upgrade as soon as you can. – ysth Aug 07 '22 at 21:29

1 Answers1

3

I assume that you work on MySQL 5.7 as otherwise your first query would not return matches either. See Regexp compatibility in the MySQL 8 documentation.

Having said that, your second query does not return matches because # is a non-alphanumerical character while the [[:<:]] pattern requires the character that follows to be alphanumerical. So this will never match. In fact, the mere presence of the # prefix already assures that this "anxiety" is not a part of a word match at the left side, so you should just do:

SELECT * FROM `messages` 
WHERE LOWER(`messages`.`content`) REGEXP '#anxiety[[:>:]]'

In a comment you say that abc#anxiety should not match (even though technically # already breaks a word). In that case do:

SELECT * FROM `messages` 
WHERE LOWER(`messages`.`content`) REGEXP '(^|[^a-zA-Z0-9_])#anxiety[[:>:]]'

In the character class [^a-zA-Z0-9_] add any other character that you wouldn't allow to precede #anxiety.

If your purpose is to find the word with an optional # in front of it, then use the previous regex with an additional ?:

SELECT * FROM `messages` 
WHERE LOWER(`messages`.`content`) REGEXP '(^|[^a-zA-Z0-9_])#?anxiety[[:>:]]'
trincot
  • 317,000
  • 35
  • 244
  • 286