Given the input line "Abcd abcd1a 5ever qw3-fne superb5 1234 0"

I am trying to match words made of letters and digits that contain at least one letter, like "Abcd", "abcd1a", "5ever", "superb5", "qw3", "fne". It should not match words made only of digits, like "1234" and "0".

Words are separated by any character that is not a letter or digit.

I tried the regex (?![0-9])([A-Za-z0-9]+), which works for everything else but does not match the word "5ever" in full: it skips the leading digit and matches only "ever".

How do I write this regex so that it also matches the word "5ever" in full?
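
To make the problem concrete, here is a minimal Python sketch of the attempted pattern run against the sample line. Python and the variable names are my own choice for illustration; the question does not say which engine is in use.

```python
import re

line = "Abcd abcd1a 5ever qw3-fne superb5 1234 0"

# The lookahead only forbids a digit at the position where the match starts.
# The engine therefore skips leading digits and starts at the first letter.
attempt = re.compile(r"(?![0-9])([A-Za-z0-9]+)")

matches = attempt.findall(line)
print(matches)
# ['Abcd', 'abcd1a', 'ever', 'qw3', 'fne', 'superb5']
```

The all-digit words are rejected as intended, but "5ever" comes back as "ever".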

2 Answers

Option 1 - Negative lookahead

\b(?!\d+\b)[^\W_]+
\b(?!\d+\b)[A-Za-z\d]+
\b(?!\d+\b)[a-z\d]+         # With case-insensitive flag enabled
  • \b Assert position as a word boundary
  • (?!\d+\b) Negative lookahead ensuring the whole word isn't made up of only digits
  • [^\W_]+ or [A-Za-z\d]+ Matches only letters or digits one or more times
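
As a rough check of Option 1, the sketch below runs the first pattern through Python's re module. The choice of Python and the names `line`, `option1` and `words` are assumptions for illustration only.

```python
import re

line = "Abcd abcd1a 5ever qw3-fne superb5 1234 0"

# \b          start at a word boundary
# (?!\d+\b)   reject the position if only digits follow up to the next boundary
# [^\W_]+     one or more word characters, excluding the underscore
option1 = re.compile(r"\b(?!\d+\b)[^\W_]+")

words = option1.findall(line)
print(words)
# ['Abcd', 'abcd1a', '5ever', 'qw3', 'fne', 'superb5']
```

Note that in Python \d and \W are Unicode-aware for str patterns unless re.ASCII is passed, so [^\W_] also accepts non-ASCII letters and digits there.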

Option 2 - Without lookahead

An alternative that avoids the lookahead (the first form assumes the case-insensitive i flag is enabled):

\b\d*[a-z][a-z\d]*          # With case-insensitive flag enabled
\b\d*[A-Za-z][A-Za-z\d]*
  • \b Assert position as a word boundary
  • \d* Match zero or more digits
  • [a-z] Match exactly one letter (with the i flag enabled this also matches A-Z)
  • [a-z\d]* Match zero or more letters or digits

Matches the following from the string Abcd abcd1a 5ever qw3-fne superb5 1234 0:

Abcd
abcd1a
5ever
qw3
fne
superb5
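
The same list can be reproduced with a short Python sketch. Python itself and the names used here are illustrative assumptions; both spellings of the Option 2 pattern are included to show that they agree on this input.

```python
import re

line = "Abcd abcd1a 5ever qw3-fne superb5 1234 0"

# Case-insensitive form: optional digits, one required letter, then letters/digits.
with_flag = re.compile(r"\b\d*[a-z][a-z\d]*", re.IGNORECASE)
# Equivalent form with both letter cases spelled out, no flag needed.
without_flag = re.compile(r"\b\d*[A-Za-z][A-Za-z\d]*")

found = with_flag.findall(line)
print(found)
# ['Abcd', 'abcd1a', '5ever', 'qw3', 'fne', 'superb5']
print(found == without_flag.findall(line))
# True
```

Requiring one letter in the middle of the pattern is what rules out "1234" and "0": the digits can be consumed, but the mandatory letter never matches.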
ctwheels
  • On further testing your second option, I found that it does not match strings like _abcdefg. This is because \b is a word boundary and the underscore counts as a word character, so there is no boundary between _ and a. The following regex works perfectly: [\d\-']*[A-Za-z][A-Za-z\d\-']* . There is no need for the \b – Saurabh Misra Apr 11 '18 at 20:20
  • @SaurabhMisra from my understanding you only wanted to match full words. In the case of `_abcdefg` are you trying to match `abcdefg` or `_abcdefg`? Also, which regex engine or programming language are you using? – ctwheels Apr 11 '18 at 20:23
  • It also does not match words containing hyphen i.e. `self-employed` – MaxZoom Apr 11 '18 at 20:41
  • @MaxZoom it's not supposed to if you look at the output presented by the OP. – ctwheels Apr 11 '18 at 20:47
  • @ctwheels You are right. Strange that he wants it like that, even though it is one word in a dictionary – MaxZoom Apr 11 '18 at 20:53
  • Agreed, but we don’t know the OP’s use case. The OP could be parsing a string of non-dictionary words – ctwheels Apr 11 '18 at 20:56
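
The behaviour discussed in these comments can be compared side by side. The sketch below is only an illustration in Python (an assumption, since the engine was never stated); `option2` is the second pattern from the answer and `comment_pattern` is the regex proposed in the first comment.

```python
import re

option2 = re.compile(r"\b\d*[A-Za-z][A-Za-z\d]*")
comment_pattern = re.compile(r"[\d\-']*[A-Za-z][A-Za-z\d\-']*")

samples = ["_abcdefg", "self-employed", "qw3-fne 1234"]
results = {s: (option2.findall(s), comment_pattern.findall(s)) for s in samples}

for text, (a, b) in results.items():
    print(text, a, b)
# _abcdefg [] ['abcdefg']
# self-employed ['self', 'employed'] ['self-employed']
# qw3-fne 1234 ['qw3', 'fne'] ['qw3-fne']
```

The underscore is a word character, so \b never fires between _ and a, and Option 2 finds nothing in _abcdefg. The comment's pattern does handle that case, but because it allows hyphens and apostrophes inside a word, it returns qw3-fne as a single match rather than the two words listed in the question.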
I came up with the following regex:

/\d*[a-z_]+\w*/ig
  • \d*        optionally starts with one or more digits
  • [a-z_]+ then one or more letters or underscores
  • \w*        optionally followed by any word characters (letters, digits, underscore)
  • ig          case-insensitive and global flags
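
A rough Python equivalent of this pattern is sketched below. The /ig flags are JavaScript syntax; here re.IGNORECASE stands in for i and findall stands in for g. The variable names and the extra test strings are my own assumptions.

```python
import re

line = "Abcd abcd1a 5ever qw3-fne superb5 1234 0"

# Optional digits, at least one letter or underscore, then any word characters.
pattern = re.compile(r"\d*[a-z_]+\w*", re.IGNORECASE)

tokens = pattern.findall(line)
print(tokens)
# ['Abcd', 'abcd1a', '5ever', 'qw3', 'fne', 'superb5']

# Unlike the patterns in the other answer, underscores count here:
print(pattern.findall("_abcdefg 12_"))
# ['_abcdefg', '12_']
```

On the sample line the result matches the requested output. The difference shows up with underscores: a token such as 12_ is accepted even though it contains no letter.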

MaxZoom