How to compare two columns to get duplicates in python

Question

I have a df with two columns and I need to find and store only duplicates.

|-------------------|-------------|
|      col1         |    col2     |
|-------------------|-------------|
|     apple         |  mango      |
|-------------------|-------------|
|     banana        |  grape      |
|-------------------|-------------|
|     pear          |  watermelon |
|-------------------|-------------|
|     cherry        |  banana     |
|-------------------|-------------|
|     mango         |  apple      |
|-------------------|-------------|

The result should return a df with col1 like this

    |----------------|
    |   col1         |   
    |----------------|
    |   apple        |        
    |----------------|
    |   banana       |   
    |----------------|
    |   mango        |        
    |----------------|

I tried something like this, but it doesnt fetch me the same resuts.

df['a_flag'] = df['col2'].isin(df['col1']).astype(int)

df1=df[(df['a_flag']==1)]

Does this answer your question? [Find intersection of two columns in Python Pandas -> list of strings](https://stackoverflow.com/questions/49796271/find-intersection-of-two-columns-in-python-pandas-list-of-strings) — Tomerikoo, Oct 13 '20 at 14:54

score 2 · Accepted Answer · answered Oct 13 '20 at 15:01

2

You can use loc to also pass the column name:

df.loc[df['col2'].isin(df['col1']), ['col1']]

Output:

     col1
0   apple
3  cherry
4   mango

answered Oct 13 '20 at 15:01

Quang Hoang

146,074
10
56
74

this is nice if there is only one column to compare, I guess if there is many you could `melt/stack` then use `isin` – Umar.H Oct 13 '20 at 15:04

score 0 · Answer 2 · answered Oct 15 '20 at 20:03

0

Thank you for the response. However I tried using sets and even that seemed to work. Not sure which is more efficient though

Here is the code that worked:

lst1=list(df['col1'])
lst2=list(df['col2'])
lst3=list(set(lst1) & set(lst2))

answered Oct 15 '20 at 20:03

lea

143
2
15

How to compare two columns to get duplicates in python

2 Answers2