This is an example I extracted from my database. I am working on a co-authorship visualization, so based on this sample I need to keep only one relationship between each pair of authors. For example, I have to delete either Brian Norton---Maria Roo Ons or Maria Roo Ons---Brian Norton so that each relationship is unique.

-------------------------------------------------------------------------------------------------
article_title                                               | author_name     | coauthor_name
-------------------------------------------------------------------------------------------------
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann      | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann      | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann      | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann      | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu        | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu        | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu        | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu        | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Brian Norton
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Sarah McCormack | S. Shynu
-------------------------------------------------------------------------------------------------

The ideal final output is shown below:

-------------------------------------------------------------------------------------------------
article_title                                               | author_name     | coauthor_name
-------------------------------------------------------------------------------------------------
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | Maria Roo Ons
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Brian Norton    | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | Max Ammann
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Maria Roo Ons   | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann      | S. Shynu
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | Max Ammann      | Sarah McCormack
A Metal Plate Solar Antenna for UMTS Pico-cell Base Station | S. Shynu        | Sarah McCormack

In other words, for each pair of co-authors on the same article I want to keep only one row. How can I do this in R or Python? Thanks a lot for your help.

skybunk
Dan Xu

2 Answers


I assume your data lives in a separate database and that you are using Python to connect to it.

Possible approaches:

1) You can add row numbers based on the article column and then perform a de-duplication. You may check out this answer for how to go about it in SQL.

Then you can simply run the query through your Python database connector.
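A minimal sketch of the SQL route, run from Python with the built-in `sqlite3` module against an in-memory table. The table name `coauthors` and the sample rows are assumptions standing in for your real database. Also note that this swaps in a different technique from the row-number approach mentioned above: SQLite's two-argument `min()`/`max()` put each pair into a fixed order, so `DISTINCT` removes the reversed duplicates. Other databases spell this differently (for example `LEAST`/`GREATEST` in PostgreSQL or MySQL).

```python
import sqlite3

# Hypothetical sample: a few rows from the question, listed in both directions.
TITLE = "A Metal Plate Solar Antenna for UMTS Pico-cell Base Station"
rows = [
    (TITLE, "Brian Norton", "Maria Roo Ons"),
    (TITLE, "Brian Norton", "Max Ammann"),
    (TITLE, "Maria Roo Ons", "Brian Norton"),
    (TITLE, "Maria Roo Ons", "Max Ammann"),
    (TITLE, "Max Ammann", "Brian Norton"),
    (TITLE, "Max Ammann", "Maria Roo Ons"),
]

# In-memory stand-in for the real database; the table name "coauthors" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coauthors (article_title TEXT, author_name TEXT, coauthor_name TEXT)"
)
conn.executemany("INSERT INTO coauthors VALUES (?, ?, ?)", rows)

# SQLite's two-argument min()/max() put each pair into alphabetical order,
# so (A, B) and (B, A) become identical and DISTINCT keeps just one of them.
query = """
    SELECT DISTINCT article_title,
           min(author_name, coauthor_name) AS author_name,
           max(author_name, coauthor_name) AS coauthor_name
    FROM coauthors
    ORDER BY 1, 2, 3
"""
unique_pairs = conn.execute(query).fetchall()
conn.close()

for row in unique_pairs:
    print(row)
```

Because `min()`/`max()` decide which name lands in which column, the result is always in alphabetical order rather than in the direction of whichever original row came first.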

2) You can pull the records into a pandas DataFrame and do the de-duplication there. Pandas is well suited to handling and manipulating tabular data.
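A sketch of the in-memory de-duplication idea. It is written with the standard library only (so it runs without pandas), and the function name `dedupe_coauthor_pairs` is hypothetical. Each row gets a key made of the article plus an order-insensitive `frozenset` of the two names, and only the first row seen for each key is kept. In pandas the same idea is usually expressed by sorting the two name columns within each row and then calling `DataFrame.drop_duplicates`.

```python
def dedupe_coauthor_pairs(rows):
    """Keep the first row for each (article, unordered author pair).

    rows: iterable of (article_title, author_name, coauthor_name) tuples.
    """
    seen = set()
    kept = []
    for article, author, coauthor in rows:
        # frozenset ignores order, so (A, B) and (B, A) produce the same key
        key = (article, frozenset((author, coauthor)))
        if key not in seen:
            seen.add(key)
            kept.append((article, author, coauthor))
    return kept


# Hypothetical usage with a few rows from the question
title = "A Metal Plate Solar Antenna for UMTS Pico-cell Base Station"
rows = [
    (title, "Brian Norton", "Maria Roo Ons"),
    (title, "Maria Roo Ons", "Brian Norton"),
    (title, "Maria Roo Ons", "Max Ammann"),
    (title, "Max Ammann", "Maria Roo Ons"),
]
print(dedupe_coauthor_pairs(rows))
```

Keeping the first occurrence preserves whichever direction appears first in the data. On the question's full sample, which is ordered by `author_name`, that yields exactly the desired output.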

skybunk

Since you have not shared what other cases could arise, I am assuming your data frame looks like the one shown below.

article author1 author2
A       a       b
A       b       a
A       a       a
A       b       b

In R, this is how you can get the rows you are looking for. I have assumed your data frame is called `df1`.

# This will create a new dataframe df2 with only those rows where author1 and author2 are different

df2 <- df1[df1$author1 != df1$author2, ]

The output looks like what you have provided in the question.

article author1 author2
  A       a       b
  A       b       a

Let me know if this is what you needed.
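The R filter above removes only rows where an author is paired with themselves, so both `a b` and `b a` survive. If the goal is to keep a single direction per pair, a stricter row filter is to require that `author1` sort strictly before `author2`. This sketch is written in Python, with a plain list of tuples standing in for `df1` (an assumption). It relies on every pair being listed in both directions, as in the question's sample. A pair that only appears as `b a` would be dropped entirely.

```python
# Toy data from the answer above, as (article, author1, author2) tuples
df1 = [
    ("A", "a", "b"),
    ("A", "b", "a"),
    ("A", "a", "a"),
    ("A", "b", "b"),
]

# Keeping only rows where author1 < author2 drops self-pairs like (a, a)
# and keeps exactly one direction of each pair.
# Assumption: every pair is listed in both directions.
df2 = [row for row in df1 if row[1] < row[2]]
print(df2)  # [('A', 'a', 'b')]
```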

Code_Sipra
  • Thanks for your answer, but my problem is not deleting repeated columns; the problem is how to delete duplicates where author1 and author2 have the same values in a different order for the same article. – Dan Xu Nov 23 '17 at 23:27
  • When you say same value, do you mean both `author1` and `author2` should be `a`? – Code_Sipra Nov 24 '17 at 00:07
  • No, I mean that for the same article I just need one record, either article A, author1 a, author2 b, or article A, author1 b, author2 a. – Dan Xu Nov 24 '17 at 13:51
  • See my updated answer. Since you have not provided a bigger dataset for me to fully test my code, I have assumed that the `author1` and `author2` will contain only `a` or `b`. Let me know if this solves it for you. – Code_Sipra Nov 24 '17 at 19:40
  • Thanks so much for your help. – Dan Xu Nov 25 '17 at 13:18
  • Does this solve your problem? If not, I would be glad to work through it with you! – Code_Sipra Nov 25 '17 at 13:23
  • Hi, thanks so much, and sorry, I am not familiar with this website. I have put an example below; could you please check it? I think it is clearer. – Dan Xu Nov 25 '17 at 13:26
  • Saw the example. Can you also provide a screenshot of how you want the final answer to be? – Code_Sipra Nov 25 '17 at 13:30
  • I just posted it below as an answer. – Dan Xu Nov 25 '17 at 13:32
  • I do not see any duplicates in your example under `author_name` and `coauthor_name`. It would really help if you provided a screenshot of how you want the final output to look. – Code_Sipra Nov 25 '17 at 13:35
  • I posted the final output. – Dan Xu Nov 25 '17 at 13:41
  • Please edit your question with the example and the desired final output rather than adding an answer. – Code_Sipra Nov 25 '17 at 14:08
  • That's a useful suggestion, I will modify it now. – Dan Xu Nov 25 '17 at 14:15