I am looking for a regular expression to extract city names from addresses. I want to use it in WebHarvy, which uses the .NET flavor of regex.

Example address

1234 Savoy Dr Ste 123
New Houston, TX 77036-3320

or

1234 Savoy Dr Ste 510
Texas, TX 77036-3320

So the city name can be one or two words.

The expression I am trying is

(\w|\w\s\w)+(?=,\s\w{2})

When I try this on RegexStorm it seems to work fine, but when I use it in WebHarvy, it captures only the last letter of the city name, for example the 'n' from New Houston.

Where am I going wrong?

Wiktor Stribiżew
blackystrat

1 Answer

In WebHarvy, if a regex contains a capturing group, the contents of that group are returned, so you do not need a lookahead.

Another point is that you need to match one or more word characters, optionally followed by a run of whitespace and one or more further word characters. Your regex contains a repeated capturing group whose contents are overwritten on each iteration, so once a match is found, Group 1 contains only the last captured character, n.
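The overwriting behaviour can be reproduced outside WebHarvy. The following is a minimal sketch using Python's `re` module as a stand-in for the .NET engine (an assumption: the two flavors differ in places, but both report the last iteration of a repeated group as the group's value). The helper name `group_one` is hypothetical.

```python
import re

# The pattern from the question: group 1 sits inside a quantified (+) group.
QUESTION_RE = re.compile(r"(\w|\w\s\w)+(?=,\s\w{2})")

def group_one(text):
    """Return what group 1 holds after a successful search, else None."""
    match = QUESTION_RE.search(text)
    return match.group(1) if match else None

# A repeated group keeps only the text from its final iteration.
print(group_one("New Houston, TX 77036-3320"))  # n
```

Because a tool that returns Group 1 sees only that final iteration, the fix is to put the quantifiers inside the capturing group rather than around it.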

Use

(\w+(?:[^\S\r\n]+\w+)?),\s\w{2}

See the regex demo here
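As a rough check of the pattern against the two sample addresses from the question, here is a small sketch in Python. It assumes Python's `re` treats this particular pattern the same way as .NET, and the names `CITY_RE` and `extract_city` are made up for illustration.

```python
import re

# Pattern from the answer: group 1 holds the whole city name.
CITY_RE = re.compile(r"(\w+(?:[^\S\r\n]+\w+)?),\s\w{2}")

def extract_city(address):
    """Return the city name (group 1), or None if nothing matches."""
    match = CITY_RE.search(address)
    return match.group(1) if match else None

addresses = [
    "1234 Savoy Dr Ste 123\nNew Houston, TX 77036-3320",
    "1234 Savoy Dr Ste 510\nTexas, TX 77036-3320",
]
for address in addresses:
    print(extract_city(address))
# New Houston
# Texas
```

The comma, space, and two-letter state code are matched but sit outside the capturing group, which is why no lookahead is needed when only Group 1 is returned.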

The [^\S\r\n]+ part matches one or more whitespace characters other than CR and LF. Alternatively, you may use [\p{Zs}\t]+ to match one or more horizontal whitespace characters.
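To see why excluding CR and LF matters, the sketch below compares the answer's pattern with a hypothetical looser variant that uses plain `\s+` between the two words. It is written in Python as an approximation of the .NET behaviour. The `[\p{Zs}\t]+` alternative is not shown because Python's built-in `re` module does not support `\p{...}` classes.

```python
import re

ADDRESS = "1234 Savoy Dr Ste 510\nTexas, TX 77036-3320"

# Hypothetical variant for comparison: \s also matches the line break.
LOOSE_RE = re.compile(r"(\w+(?:\s+\w+)?),\s\w{2}")
# Pattern from the answer: [^\S\r\n] is whitespace minus CR and LF.
STRICT_RE = re.compile(r"(\w+(?:[^\S\r\n]+\w+)?),\s\w{2}")

def first_group(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(repr(first_group(LOOSE_RE, ADDRESS)))   # '510\nTexas'
print(repr(first_group(STRICT_RE, ADDRESS)))  # 'Texas'
```

With `\s+`, the suite number at the end of the street line is pulled into the city name across the line break. Restricting the separator to horizontal whitespace keeps the match on a single line.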

Wiktor Stribiżew