I'm new to Python and I'd like to split one column, which contains the film name together with the release year, into multiple columns, so I found the split function.

The data is organized as Title (Year).

What I tried in Python was:

movies['title'].str.split('(', n=1, expand=True)

It did not work as expected for cases like the ones below, where the title itself contains parentheses:

City of Lost Children, The (Cité des enfants perdus, La) (1999)

City of Lost Children, The. Cité des enfants perdus, La) (1999)

What I expected was that only 1999) would go to the second column.

I need your help!

vivace
    Please share a sample of the dataframe and the expected output – yatu Jul 09 '19 at 09:58
    Possible duplicate of [Splitting on last delimiter in Python string?](https://stackoverflow.com/questions/15012228/splitting-on-last-delimiter-in-python-string) – remeus Jul 09 '19 at 09:59

2 Answers

I vote for using re.findall here with the pattern (.*?) \((\d{4})\):

import re

text = """City of Lost Children, The (Cité des enfants perdus, La) (1999)
           City of Lost Children, The. Cité des enfants perdus, La) (1999)"""

# Lazily capture the title, then the four-digit year in the last parentheses
matches = re.findall(r'\s*(.*?) \((\d{4})\)', text)
print(matches)

This prints:

[('City of Lost Children, The (Cité des enfants perdus, La)', '1999'),
 ('City of Lost Children, The. Cité des enfants perdus, La)', '1999')]
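
The same idea could be applied directly to the DataFrame column with `Series.str.extract`. This is only a minimal sketch: the sample `movies` frame and the group names `name` and `year` are assumptions, since the real data is not shown in the question, and the pattern is anchored to the end of the string so that earlier parentheses are ignored.

```python
import pandas as pd

# Hypothetical sample frame; the real `movies` data is not shown in the question.
movies = pd.DataFrame({
    "title": [
        "City of Lost Children, The (Cité des enfants perdus, La) (1999)",
        "Toy Story (1995)",
    ]
})

# Named groups become the column names of the resulting DataFrame.
# The trailing `$` forces the four-digit year to be the last parenthesised part.
pattern = r"^\s*(?P<name>.*?)\s*\((?P<year>\d{4})\)\s*$"
parts = movies["title"].str.extract(pattern)
print(parts)
```

Rows that do not match the pattern come back as NaN in both columns, which makes malformed titles easy to spot.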
Tim Biegeleisen

I would suggest pd.Series.str.rsplit:

Given a series s:

print(s)
0    City of Lost Children, The (Cité des enfants perdus, La) (1999)
1    City of Lost Children, The. Cité des enfants perdus, La) (1999)
dtype: object

Use s.str.rsplit('(', n=1, expand=True), which splits once starting from the right:

                                                   0      1
0  City of Lost Children, The (Cité des enfants p...  1999)
1  City of Lost Children, The. Cité des enfants p...  1999)
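
The second column still carries the closing parenthesis, and the first keeps a trailing space. A minimal sketch of one way to tidy both, assuming the series `s` shown above (the column names `title` and `year` are hypothetical):

```python
import pandas as pd

s = pd.Series([
    "City of Lost Children, The (Cité des enfants perdus, La) (1999)",
    "City of Lost Children, The. Cité des enfants perdus, La) (1999)",
])

# Split once, starting from the right, so only the last "(" is used.
out = s.str.rsplit("(", n=1, expand=True)
out.columns = ["title", "year"]  # hypothetical column names

# Drop surrounding whitespace from the title and the ")" from the year.
out["title"] = out["title"].str.strip()
out["year"] = out["year"].str.rstrip(")").astype(int)
print(out)
```

Converting the year to an integer is optional; it fails loudly if any row does not end in a parenthesised number, which may or may not be what you want.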
Chris