I'm trying to analyze a large column of 12-digit numbers, stored as strings, as shown below:
0 802112134267
1 300949934377
2 300999934377
3 222589009836
4 950279219923
Name: number, dtype: object
I want to grab any number in which the same digit appears 3 or more times in a row. Row 2 contains four consecutive '9's and row 3 contains three consecutive '2's. I would want to return:
0 None
1 None
2 300999934377
3 222589009836
4 None
Name: number, dtype: object
Alternatively, a filtered dataframe/series containing only the matching rows would suffice.
The regex that I think solves this is: '(\d)\1{2,}'
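As a sanity check outside pandas, here is a minimal sketch of what I expect the pattern to match, assuming it is written as a raw string. The `values` list and the `matches` name are only for illustration; the values are copied from the sample above.

```python
import re

# Raw string, so \1 reaches the regex engine as a backreference to group 1.
pattern = re.compile(r'(\d)\1{2,}')

# Sample values copied from the series shown above.
values = [
    '802112134267',
    '300949934377',
    '300999934377',
    '222589009836',
    '950279219923',
]

# Keep only the values where some digit occurs 3 or more times in a row.
matches = [v for v in values if pattern.search(v)]
print(matches)
```

With the sample data this should keep only rows 2 and 3.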
However, I haven't been able to successfully apply this regex to the series.
regex = re.compile('(\d)\1{2,}')
s.apply(lambda x: np.nan if regex.search(x) == None else x)
returns all NaN.
s.str.extract('(\d)\1{2,}', expand=True)
returns all NaN.
s.str.contains('(\d)\1{2,}')
returns all False.
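One likely cause, as far as I can tell: none of the patterns above is a raw string, so Python turns '\1' into the control character \x01 before the regex engine ever sees it, and the backreference is lost. Below is a minimal sketch of the same attempts with a raw string, assuming the series holds strings. The names `mask`, `filtered` and `masked` are only for illustration.

```python
import pandas as pd

# Sample data copied from above, stored as strings.
s = pd.Series(
    ['802112134267', '300949934377', '300999934377',
     '222589009836', '950279219923'],
    name='number',
)

# In a normal string literal '\1' is an octal escape (chr(1));
# in a raw string it stays as a backslash followed by '1'.
assert '\1' == '\x01'
assert r'\1' == '\\' + '1'

pattern = r'(\d)\1{2,}'

# Boolean mask: True where some digit repeats 3 or more times in a row.
# pandas may emit a UserWarning about match groups here; the group is
# needed for the backreference, so the warning can be ignored in this case.
mask = s.str.contains(pattern)

filtered = s[mask]      # only the matching rows
masked = s.where(mask)  # same length, NaN where there is no match

print(filtered)
print(masked)
```

`s[mask]` gives the filtered series, while `s.where(mask)` keeps the original index and leaves NaN in the non-matching rows, which is close to the desired output shown earlier.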
Any help would be appreciated. I've tried searching the forum and haven't found any good examples that have worked.
Thanks