Trying to optimize some code that reuses a matched group, I was wondering whether accessing Match.group() is expensive.
I tried to dig into re.py's source, but the code was a bit cryptic.
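One likely reason the Python source is hard to follow: in CPython the Match type comes from the C extension module _sre, so Match.group is not defined in re.py at all. re.py and sre_compile.py mostly handle compiling the pattern. A quick check, assuming CPython:

```python
import inspect
import re
import types

m = re.search('X+', f'__{"X"*10000}__')

# Accessed on the class, a C-implemented method shows up as a method descriptor.
is_c_descriptor = isinstance(re.Match.group, types.MethodDescriptorType)

# Bound to an instance, it is a built-in method, not a Python function.
is_builtin_method = isinstance(m.group, types.BuiltinMethodType)

# inspect has no Python source to show for a C-level method.
try:
    inspect.getsource(re.Match.group)
    has_python_source = True
except (TypeError, OSError):
    has_python_source = False

print(is_c_descriptor, is_builtin_method, has_python_source)
```

On CPython this should print True True False; to read the actual implementation you would look at the _sre C source rather than re.py.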
A few tests seem to indicate that it might be better to store the output of Match.group() in a variable, but I would like to understand what exactly happens when Match.group() is called, and whether there is another (internal) way to access the content of the group directly.
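A small experiment on what group() returns, using only the public Match API (span(), string). It suggests that, in CPython and for a match that is not the whole string, every call builds a new str from the searched string, so calling it twice pays for the copy twice. Rebuilding the group by slicing m.string also allocates a new string, so it is not a faster back door; keeping the result in a variable is the straightforward way to avoid repeating the work.

```python
import re

text = f'__{"X"*10000}__'
m = re.search('X+', text)

# Two calls return equal strings...
first = m.group()
second = m.group()
same_content = first == second

# ...but (in CPython) two distinct objects: each call creates a new str.
same_object = first is second

# The match object keeps the positions and a reference to the searched string.
start, end = m.span()
rebuilt = m.string[start:end]  # slicing also allocates a new str

print(same_content, same_object, (start, end), rebuilt == first)
```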
Some example code to illustrate a potential use:
```python
import re

m = re.search('X+', f'__{"X"*10000}__')
# do something
# m.group()
# do something else
# m.group()
```
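Timings (IPython):

Direct access:

```python
%%timeit
len(m.group())
# 220 ns ± 1.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Intermediate variable (the assignment runs in its own cell, since %%timeit must be the first line of a cell):

```python
X = m.group()
```

```python
%%timeit
len(X)
# 51 ns ± 0.172 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```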
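To reproduce the two measurements outside IPython, here is a sketch with the standard timeit module. The repeat and loop counts are arbitrary choices, and absolute numbers will vary between machines and Python versions.

```python
import re
import timeit

m = re.search('X+', f'__{"X"*10000}__')
X = m.group()  # stored once, outside the timed statement

statements = ["len(m.group())", "len(X)"]
number = 100_000

results = {}
for stmt in statements:
    # Best of 5 repeats, converted to nanoseconds per execution.
    best = min(timeit.repeat(stmt, globals=globals(), repeat=5, number=number))
    results[stmt] = best / number * 1e9
    print(f"{stmt:16} {results[stmt]:8.1f} ns per loop")
```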
References:
- current re.py code (Python 3.10)
- current sre_compile.py code (Python 3.10)
Additional timing, binding the method to a name to remove the effect of attribute access (doesn't change much):

```python
G = m.group
```

```python
%%timeit
len(G())
# 230 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
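Since binding m.group to G barely changes the time, the attribute lookup is not where the cost lies. One way to probe the remaining cost is to vary the length of the match: if group() copies the matched text, the time per m.group() call should grow roughly with n, while len(X) on an already stored string should stay flat. A sketch; the sizes and loop counts are arbitrary choices:

```python
import re
import timeit

number = 1_000
results = {}
for n in (10, 1_000, 100_000):
    m = re.search('X+', f'__{"X" * n}__')
    X = m.group()
    namespace = {"m": m, "X": X}
    # Best of 3 repeats, in seconds per call.
    t_group = min(timeit.repeat("m.group()", globals=namespace, repeat=3, number=number)) / number
    t_var = min(timeit.repeat("len(X)", globals=namespace, repeat=3, number=number)) / number
    results[n] = (t_group, t_var)
    print(f"n={n:>7,}  m.group(): {t_group * 1e9:9.1f} ns   len(X): {t_var * 1e9:6.1f} ns")
```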