Is it possible to find the longest substring that is at the same time a prefix and a suffix (overlapping is allowed) using only one regular expression? For example:
aab
is the longest one in aabaaabaaaaab
:
aabaaabaaaaab
aabaaabaaaaab
aabaaabaaaaab
Another example is babab
being the longest in babababab
:
babababab
babababab
babababab
I was trying to accomplish that using the regular expression suggested by @nhahtdh in Lookahead regex failing to find the same overlapping matches but it fails to find the first example.
I wrote a code that is working properly but it is way too slow because so many uses of text.find
. The reason why I have to resort to all that extra logic is because the r'(?=((\w+).*\2.*\2))'
regular expression is not tuned to find only suffix and prefixes. I have tried to use ^
and $
but I was not able to make it work properly.
Here is the current correct (but slow) implementation:
from re import findall
text = "xyzxyzxyzxyzxyz"
matches = findall(r'(?=((\w+).*\2.*\2))', text)
s = ""
result = ""
for (s1, s2) in matches:
if text.startswith(s1) and text.endswith(s1) and text.startswith(s2) and text.endswith(s2):
if len(s1) == len(text):
s = s2
elif len(s2) == len(text):
s = s1
else:
s = s1 if len(s1) > len(s2) else s2
mid = text.find(s, 1)
if not (mid == -1 or mid >= len(text) - len(s)):
result = s if len(s) > len(result) else result
print(result) # xyzxyzxyz