I need help coming up with a PCRE regular expression that will find two occurrences of a word without a certain word between them in a multiline file.

The first example below should be matched because "foo" occurs twice without the word "bar" between them, while the second example should not be matched because "foo" occurs twice with the word "bar" between them.

Matching Example:

foo
jumped
over
the
lazy
foo

Nonmatching Example:

foo
jumped
over
the
bar
foo

dave_erie
  • Might be a duplicate of [Regular expressions: Ensuring b doesn't come between a and c](https://stackoverflow.com/questions/37240408) – Ryszard Czech Sep 03 '20 at 20:33

1 Answer

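Your question may have been asked before, though perhaps not exactly in the same way. One approach here is to use a tempered dot when matching between the two occurrences of foo: each character is consumed only after a negative lookahead confirms that the word bar does not start at that position, so the match can never cross over bar along the way. Try running the following regex with dot all mode enabled (the s modifier in PCRE, or an inline (?s)):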

\bfoo\b((?!\bbar\b).)*?\bfoo\b

Here is an explanation of the regex pattern used:

\bfoo\b           match an initial "foo"
((?!\bbar\b).)*?  match any content, including across newlines, provided
                  that we do not pass "bar" along the way
\bfoo\b           match a final "foo"
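
To check the pattern against the two examples from the question, here is a minimal sketch. It assumes Python's re module rather than PCRE itself (the syntax used here is common to both), and the names PATTERN and has_pair are made up for illustration:

```python
import re

# Tempered dot: consume one character at a time, but only at positions
# where the word "bar" does not start. re.DOTALL lets "." match newlines.
PATTERN = re.compile(r"\bfoo\b((?!\bbar\b).)*?\bfoo\b", re.DOTALL)

matching = "foo\njumped\nover\nthe\nlazy\nfoo\n"
nonmatching = "foo\njumped\nover\nthe\nbar\nfoo\n"


def has_pair(text: str) -> bool:
    """Return True if two "foo" words occur with no "bar" word between them."""
    return PATTERN.search(text) is not None


print(has_pair(matching))     # True
print(has_pair(nonmatching))  # False
```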

If you don't have dot all mode available, you may simulate it by replacing the dot with [\s\S], a character class that matches any character, newlines included:

\bfoo\b((?!\bbar\b)[\s\S])*?\bfoo\b
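
The difference between the two variants can be seen with no flags set at all. This is again only a sketch in Python's re module, and the variable names are hypothetical:

```python
import re

text = "foo\njumped\nover\nthe\nlazy\nfoo\n"

# Without dot all mode, "." stops at the first newline, so the
# dot-based pattern cannot reach the second "foo".
dot_no_flag = re.compile(r"\bfoo\b((?!\bbar\b).)*?\bfoo\b")

# [\s\S] matches every character, so no flag is needed.
char_class = re.compile(r"\bfoo\b((?!\bbar\b)[\s\S])*?\bfoo\b")

print(dot_no_flag.search(text) is not None)  # False
print(char_class.search(text) is not None)   # True
```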
Tim Biegeleisen
  • Thanks for the answer. I think I found a similar working expression: "foo(?:(?!bar).)*?foo" but the word boundary assertion is a good addition. – dave_erie Sep 03 '20 at 15:50
  • @dave_erie The main difference between what you pasted above and my answer is that your version is not using word boundaries. There could be sample data for which your pattern might have a false positive. – Tim Biegeleisen Sep 03 '20 at 15:52
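
To illustrate the point about word boundaries from the comments, here is a small sketch comparing the two patterns. It assumes Python's re module with dot all mode on, and the sample strings are invented for this comparison, not taken from the question:

```python
import re

# Pattern from the comment (no word boundaries) and from the answer.
loose = re.compile(r"foo(?:(?!bar).)*?foo", re.DOTALL)
strict = re.compile(r"\bfoo\b((?!\bbar\b).)*?\bfoo\b", re.DOTALL)

# False positive: "foo" never appears as a whole word here.
substrings_only = "food\nis\nfoolish\n"
print(loose.search(substrings_only) is not None)   # True
print(strict.search(substrings_only) is not None)  # False

# The reverse also happens: "barn" is not the word "bar", yet the
# loose pattern refuses to cross it.
barn_between = "foo\nbarn\nfoo\n"
print(loose.search(barn_between) is not None)      # False
print(strict.search(barn_between) is not None)     # True
```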