4

I have a small problem: one column in my DataFrame holds, in each row, one or more values, each consisting of the letter 'M' followed by 3 digits. If there is more than one value, they are separated by a comma. I would like to print a view of the DataFrame featuring only the rows where that column holds values I specify (e.g. any item from the list ['M111', 'M222']). I have started to build my boolean mask in the following way:

df[df['Column'].apply(lambda x: x.split(', ').isin(['M111', 'M222']))]

In my mind, .apply() with .split() first converts the 'Column' value in each row to a list of one or more items, and then .isin() checks whether any item in that list is in the list of specified values ['M111', 'M222']. In practice, however, instead of getting the desired view of the DataFrame, I get the error

TypeError: unhashable type: 'list'

What am I doing wrong?

Kind regards, Greem

Greem666
  • You are calling the isin method on a list (the result of x.split()). isin is a method of a DataFrame or Series object, not of a list. – plasmon360 May 23 '17 at 05:02

2 Answers

5

I think you need:

df2 = df[df['Column'].str.contains('|'.join(['M111', 'M222']))]
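For illustration, here is a minimal runnable sketch of the same idea. The sample rows and the names `wanted`, `pattern` and `mask` are assumptions made up for this example, not part of the original question. Note that `str.contains` treats the joined string as a regular expression and matches substrings, which is fine here because every value has the fixed form 'M' plus 3 digits.

```python
import pandas as pd

# Hypothetical sample data in the format described in the question
df = pd.DataFrame({"Column": ["M111, M000", "M333, M444", "M222"]})
wanted = ["M111", "M222"]

# Join the wanted values into a regex alternation: "M111|M222"
pattern = "|".join(wanted)

# True for every row whose string contains at least one wanted value
mask = df["Column"].str.contains(pattern)
df2 = df[mask]
print(df2)
```

If the column may contain missing values, passing `na=False` to `str.contains` is one way to treat those rows as non-matching.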
jezrael
  • For dates and times, I had to call `df['Column'].astype(str).str.contains` in order for it to work. – Dan Salo Nov 05 '18 at 20:46
5

You can only access the isin() method with a Pandas object. But split() returns a list. Wrapping split() in a Series will work:

import pandas as pd

# sample data
data = {'Column':['M111, M000','M333, M444']}
df = pd.DataFrame(data)

print(df)
       Column
0  M111, M000
1  M333, M444

Now wrap split() in a Series.
Note that isin() will return a boolean Series, with one value for each element coming out of split(). You want to know whether any item in that list is in the list of specified values, so add any() to your apply function.

df[df['Column'].apply(lambda x: pd.Series(x.split(', ')).isin(['M111', 'M222']).any())]

Output:

       Column
0  M111, M000
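To make the steps inside the lambda easier to follow, here is a rough breakdown for a single cell value. The intermediate names (`cell`, `parts`, `as_series`, `flags`, `row_matches`) are introduced only for this sketch.

```python
import pandas as pd

wanted = ["M111", "M222"]
cell = "M111, M000"  # one string value taken from 'Column'

parts = cell.split(", ")         # plain Python list: ['M111', 'M000']
as_series = pd.Series(parts)     # Series with one entry per code in the cell
flags = as_series.isin(wanted)   # boolean Series, one flag per code
row_matches = bool(flags.any())  # a single bool: does any code match?

print(parts)
print(flags.tolist())
print(row_matches)
```

apply() runs these steps once per row and collects the resulting booleans into the mask used to index df.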

As others have pointed out, there are simpler ways to go about achieving your end goal. But this is how to resolve the specific issue you're encountering with isin().
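As one possible simpler variant that still matches whole values rather than substrings, the sketch below splits each string with the `.str` accessor and tests the resulting list against a set. This is only an illustrative alternative under the same sample data; the names `wanted`, `mask` and `result` are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"Column": ["M111, M000", "M333, M444"]})
wanted = {"M111", "M222"}

# Split each string into a list of codes, then keep rows whose list
# shares at least one code with the wanted set
mask = df["Column"].str.split(", ").apply(
    lambda codes: not wanted.isdisjoint(codes)
)
result = df[mask]
print(result)
```

This avoids building a new Series for every row, which may matter on larger frames.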

andrew_reece
  • I can see how this is a complicated approach, and the str.contains() method makes quicker and easier work of it. But for the sake of satisfying my curiosity, can you please let me know what the effect is of applying lambda x: pd.Series(x.split(', ')) to the 'Column' column? The way I read it, each row in that column gets converted to a Series object with as many rows of its own as there were comma-separated values in the string? – Greem666 May 23 '17 at 05:15