
I am using pandas for the first time. I have a regex that extracts the month (mm) and year (yy) from strings such as 0122, 012022, or 01/22:

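import re

def extractMMYY(value):
    # Month (01-12 or a single digit), optional separators (. / \ -),
    # then a 4-digit or 2-digit year
    pattern = r"^((?:0[1-9]|1[0-2])|[0-9])[.\/\\-]*([0-9]{3}[0-9]|[0-9]{2})$"
    match = re.search(pattern, value)

    # Always return two values so the result can be split into two columns
    if match:
        return match.group(1), match.group(2)
    return None, None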

I tried to apply this function to a specific column to get a new DataFrame:

df_filtered = df[df['column_name'].apply(extractMMYY)];

As a result, I need to create two additional columns, MM and YY, with the values returned by extractMMYY.

How can I do that?

Oliver

1 Answer


You can convert the tuples returned by `apply` into a list and build a new DataFrame from them:

import pandas as pd

df = pd.DataFrame({'column_name': {0: '0122', 1: '012022', 2: '01/22', 3: '9922', 4: '03/23'}})

df_filtered = pd.DataFrame(df['column_name'].apply(extractMMYY).tolist(), columns=['MM', 'YY'])
print(df_filtered)

     MM    YY
0    01    22
1    01  2022
2    01    22
3  None  None
4    03    23
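
As an alternative to `apply`, the same pattern can be passed to `Series.str.extract`, which returns one column per capturing group. This is only a sketch of a different technique, not the approach used above; it reuses the pattern and sample data from this thread, and rows that do not match come back as `NaN` rather than `None`.

```python
import pandas as pd

df = pd.DataFrame({'column_name': ['0122', '012022', '01/22', '9922', '03/23']})

# Same pattern as in the question: two capturing groups (month, year)
pattern = r"^((?:0[1-9]|1[0-2])|[0-9])[.\/\\-]*([0-9]{3}[0-9]|[0-9]{2})$"

# str.extract returns a DataFrame with one column per capturing group
extracted = df['column_name'].str.extract(pattern)
extracted.columns = ['MM', 'YY']

print(extracted)
```

Because the result keeps the index of `df`, it lines up row by row with the original data.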
Ynjxsjmh
  • Thank you, I will try it. If mm is empty, does it still create the new MM column? – Oliver May 23 '22 at 16:46
  • @Oliver Yes, make sure you always return two values in `extractMMYY` method. – Ynjxsjmh May 23 '22 at 16:48
  • I got this error: `KeyError Traceback (most recent call last) File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance) 3620 try: -> 3621 return self._engine.get_loc(casted_key) 3622 except KeyError as err:` – Oliver May 23 '22 at 16:50
  • @Oliver Could you provide the sample data that causes the error? I couldn't reproduce it, even when mm and yy are both `None`. – Ynjxsjmh May 23 '22 at 16:52
  • The column named "column_name" has the value 03/23. – Oliver May 23 '22 at 16:54
  • @Oliver I updated the answer and couldn't reproduce the error. – Ynjxsjmh May 23 '22 at 16:55
  • @Oliver Yes, your `extractMMYY` method uses `re.search`. – Ynjxsjmh May 23 '22 at 17:00
  • I have a column named "1". When I try to use it, I get: `----> 1 df_filtered = pd.DataFrame(df['1'].apply(extrac.. KeyError Traceback (most recent call last)` – Oliver May 23 '22 at 17:02
  • `df.columns` gives me: `Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype='int64')` – Oliver May 23 '22 at 17:04
  • I got it: the label should be the int 1 instead of a string. Thank you very much! Do you know how to filter the initial dataset by the mm field of the new dataset? – Oliver May 23 '22 at 17:08
  • Is it possible to add the mm and yy columns to the current dataset? – Oliver May 23 '22 at 17:13
  • @Oliver To add them to the current dataset, see https://stackoverflow.com/questions/72176826/returning-multiple-variables-with-pandas-series-apply/72176901#72176901. – Ynjxsjmh May 23 '22 at 17:27
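
The follow-up questions from the comments (the integer column label, adding MM and YY to the existing DataFrame, and filtering by MM) might be handled roughly as follows. This is a minimal sketch, not taken from the linked answer: it assumes the `extractMMYY` function from the question, and the sample frame, the integer label `1`, and the filter value `'01'` are illustrative assumptions.

```python
import re
import pandas as pd

def extractMMYY(value):
    # Same pattern as in the question; always returns two values
    pattern = r"^((?:0[1-9]|1[0-2])|[0-9])[.\/\\-]*([0-9]{3}[0-9]|[0-9]{2})$"
    match = re.search(pattern, value)
    if match:
        return match.group(1), match.group(2)
    return None, None

# Hypothetical data whose columns are labelled with integers,
# as in the Int64Index shown in the comments
df = pd.DataFrame({
    0: ['a', 'b', 'c', 'd', 'e'],
    1: ['0122', '012022', '01/22', '9922', '03/23'],
})

# Select the column with the integer label 1, not the string '1'
parsed = pd.DataFrame(
    df[1].apply(extractMMYY).tolist(),
    columns=['MM', 'YY'],
    index=df.index,  # keep the rows aligned with the original frame
)

# Add the MM and YY columns to the existing dataset
df = df.join(parsed)

# Filter the dataset using the new MM column
matched = df[df['MM'].notna()]   # rows where the pattern matched
january = df[df['MM'] == '01']   # rows for one specific month

print(df)
```

Passing `index=df.index` matters when the original frame does not have the default 0..n-1 index; without it, `join` would align the parsed values to the wrong rows.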