I am trying to match the following using regex in Python (the re module):
"...milk..." => matched ['milk']
"...almondmilk..." => no match
"...almond milk..." => no match
"...almond word(s) milk..." => matched ['milk']
"...almondword(s)milk..." => matched ['milk']
"...soymilk..." => no match
"...soy milk..." => no match
"...soy word(s) milk..." => matched ['milk']
"...soyword(s)milk..." => matched ['milk']
My other requirement is to find all matches within a given string, so I am using re.findall().
I used the answer to this question (and reviewed a number of other SO pages) to construct my regex:
regx = '^(?!.*(soy|almond))(?=$|.*(milk)).*'
but when I test it with a simple example, I get incorrect behavior:
>>> food = "is combined with creamy soy and milk. a fruity and refreshing sip of spring, "
>>> re.findall(regx, food)
[]
>>> food = "is combined with creamy milk. a fruity and refreshing sip of spring, "
>>> re.findall(regx, food)
[('', 'milk')]
Both of these are supposed to return just ['milk'].
Also, if I have multiple instances of milk, I only get one result instead of two:
>>> food = "is combined with creamy milk. a fruity and refreshing sip of milk, "
>>> re.findall(regx, food)
[('', 'milk')]
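For convenience, the three tests above can be run as one self-contained script. The names foods and results are only for illustration; the pattern is the same one as above, written as a raw string.

```python
import re

# Pattern from above. Note that it contains two capture groups,
# (soy|almond) and (milk), and is anchored with ^ at the start of the string.
regx = r'^(?!.*(soy|almond))(?=$|.*(milk)).*'

foods = [
    "is combined with creamy soy and milk. a fruity and refreshing sip of spring, ",
    "is combined with creamy milk. a fruity and refreshing sip of spring, ",
    "is combined with creamy milk. a fruity and refreshing sip of milk, ",
]

# One findall() result per test string, in the same order as foods.
results = [re.findall(regx, food) for food in foods]

for food, result in zip(foods, results):
    print(repr(food), '->', result)
```

As far as I can tell from the re documentation, findall() returns a list of tuples when the pattern has more than one capture group, which would explain the ('', 'milk') shape. I also suspect the ^ anchor limits the pattern to a single match starting at the beginning of the string, which would explain why I never get two results.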
What am I doing wrong in my regex, and how should I adjust it to solve this problem?
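One direction I am considering, as a rough sketch only: drop the anchors and match milk directly, rejecting it with negative lookbehinds when soy or almond comes immediately before it. This assumes the excluded word is either attached to milk or separated from it by exactly one space, and that matching is case-sensitive. The names MILK_RE and samples are mine.

```python
import re

# Assumption: "soy" / "almond" only disqualify "milk" when they appear
# immediately before it, with no separator or a single space.
# The re module requires fixed-width lookbehinds, so each variant
# (with and without the trailing space) is its own assertion.
MILK_RE = re.compile(r'(?<!soy)(?<!soy )(?<!almond)(?<!almond )milk')

samples = [
    "a glass of milk",
    "almondmilk latte",
    "almond milk latte",
    "almond and oat milk",
    "soymilk",
    "soy milk",
    "is combined with creamy soy and milk. a fruity and refreshing sip of spring, ",
    "is combined with creamy milk. a fruity and refreshing sip of milk, ",
]

for s in samples:
    # No capture groups in the pattern, so findall() returns plain strings.
    print(repr(s), '->', MILK_RE.findall(s))
```

I am not sure this covers every case I care about (for example, several spaces or a tab between the words), so corrections are welcome.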