0

I have a df with two columns and I need to find and store only duplicates.

|-------------------|-------------|
|      col1         |    col2     |
|-------------------|-------------|
|     apple         |  mango      |
|-------------------|-------------|
|     banana        |  grape      |
|-------------------|-------------|
|     pear          |  watermelon |
|-------------------|-------------|
|     cherry        |  banana     |
|-------------------|-------------|
|     mango         |  apple      |
|-------------------|-------------|

The result should return a df with col1 like this

    |----------------|
    |   col1         |   
    |----------------|
    |   apple        |        
    |----------------|
    |   banana       |   
    |----------------|
    |   mango        |        
    |----------------|

I tried something like this, but it doesnt fetch me the same resuts.

df['a_flag'] = df['col2'].isin(df['col1']).astype(int)

df1=df[(df['a_flag']==1)]
jdehesa
  • 58,456
  • 7
  • 77
  • 121
lea
  • 143
  • 2
  • 15
  • Does this answer your question? [Find intersection of two columns in Python Pandas -> list of strings](https://stackoverflow.com/questions/49796271/find-intersection-of-two-columns-in-python-pandas-list-of-strings) – Tomerikoo Oct 13 '20 at 14:54
  • @Tomerikoo No that is not what I am looking for – lea Oct 13 '20 at 15:11

2 Answers2

2

You can use loc to also pass the column name:

df.loc[df['col2'].isin(df['col1']), ['col1']]

Output:

     col1
0   apple
3  cherry
4   mango
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • this is nice if there is only one column to compare, I guess if there is many you could `melt/stack` then use `isin` – Umar.H Oct 13 '20 at 15:04
0

Thank you for the response. However I tried using sets and even that seemed to work. Not sure which is more efficient though

Here is the code that worked:

lst1=list(df['col1'])
lst2=list(df['col2'])
lst3=list(set(lst1) & set(lst2))
lea
  • 143
  • 2
  • 15