Compare specific rows of DataFrames in Scala

Question

I have two Scala DataFrames that I am testing for similarity. I want to pick a specific row number and compare each value in that row between the two DataFrames. For example:

DataFrame 1: df1

+------+-----+-----------+
| Name | Age | Eye Color |
+------+-----+-----------+
| Bob  | 12  |   Blue    |
| Bil  | 17  |   Red     |
| Ron  | 13  |   Brown   |
+------+-----+-----------+

DataFrame 2: df2

+------+-----+-----------+
| Name | Age | Eye Color |
+------+-----+-----------+
| Bob  | 12  |   Blue    |
| Bil  | 14  |   Blue    |
| Ron  | 13  |   Brown   |
+------+-----+-----------+

Input: row 2. Output: Age, Eye Color (the columns whose values differ).

Ideally, the output would also show the differing values. I have considered the option here, but my DataFrames are very large (more than 200,000 rows), so it takes far too long. Is there a simpler way to select a specific row of a DataFrame in Scala?
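Once the two rows have been pulled to the driver (for example with Spark's `Row.toSeq`, paired with the column names from `DataFrame.columns`), reporting which columns differ, and their values, is plain Scala. A minimal sketch, assuming both rows share the same schema and column order; `RowDiff`, `diffRow`, and `RowDiffDemo` are hypothetical names, not part of any library:

```scala
// Hypothetical helper: compares two already-collected rows field by field.
// Assumes both rows follow the same column order (same schema).
object RowDiff {
  /** Returns (column, value in df1, value in df2) for every column whose values differ. */
  def diffRow(columns: Seq[String], left: Seq[Any], right: Seq[Any]): Seq[(String, Any, Any)] = {
    require(columns.length == left.length && left.length == right.length,
      "rows must have the same number of columns")
    columns.zip(left.zip(right)).collect {
      // != is null-safe in Scala, so null vs non-null counts as a difference
      case (col, (l, r)) if l != r => (col, l, r)
    }
  }
}

object RowDiffDemo {
  def main(args: Array[String]): Unit = {
    val cols = Seq("Name", "Age", "Eye Color")
    // Row 2 from the question's df1 and df2
    val fromDf1 = Seq[Any]("Bil", 17, "Red")
    val fromDf2 = Seq[Any]("Bil", 14, "Blue")
    RowDiff.diffRow(cols, fromDf1, fromDf2).foreach { case (c, a, b) =>
      println(s"$c: $a vs $b")
    }
  }
}
```

For row 2 of the sample data this prints `Age: 17 vs 14` and `Eye Color: Red vs Blue`. The hard part on 200,000+ rows is fetching the right row cheaply, not this comparison.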

The outcome in the sample you have given compares two rows based on the **Name** property. Is that what you want to do, or do you strictly want to give your program a row number? — jrook, Oct 22 '20 at 16:44
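If the rows can instead be matched on a key such as **Name**, as the comment above suggests, a join avoids relying on row order altogether. A minimal sketch, assuming Spark 3.x on the classpath and that the key is unique in both DataFrames; `KeyedDiff` and `diffByKey` are hypothetical names, and the `_1`/`_2` column suffixes are an arbitrary choice:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper: joins on a key column and keeps only rows where
// at least one non-key column differs. Assumes `key` is unique in both inputs.
object KeyedDiff {
  def diffByKey(df1: DataFrame, df2: DataFrame, key: String): DataFrame = {
    val valueCols = df1.columns.filterNot(_ == key)
    require(valueCols.nonEmpty, "need at least one non-key column")
    // Backticks allow column names with spaces, such as "Eye Color"
    def left(c: String): Column = col(s"a.`$c`")
    def right(c: String): Column = col(s"b.`$c`")
    // <=> is null-safe equality, so a null on only one side counts as a difference
    val anyDiff = valueCols.map(c => !(left(c) <=> right(c))).reduce(_ || _)
    // Output: the key, then each value column from df1 (_1) and df2 (_2)
    val outCols = left(key) +: valueCols.flatMap(c => Seq(left(c).as(s"${c}_1"), right(c).as(s"${c}_2")))
    df1.alias("a")
      .join(df2.alias("b"), left(key) === right(key))
      .where(anyDiff)
      .select(outCols: _*)
  }
}
```

On the sample data this returns a single row for Bil, with `Age_1` = 17, `Age_2` = 14, `Eye Color_1` = Red and `Eye Color_2` = Blue. Because this is an inner join, keys present in only one DataFrame are dropped; a `full_outer` join would be needed to report those too.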
`zipWithIndex` is the only way to get contiguous, incrementing indices across two different DFs. It should have worked, though, as it is parallelised. — Sanket9394, Oct 22 '20 at 17:03
Secondly, your use case of comparing two rows from two different DataFrames only makes sense if you sort both DataFrames first by some common column. — Sanket9394, Oct 22 '20 at 17:04
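A minimal sketch of the sort-then-`zipWithIndex` approach described in these comments, assuming Spark 3.x on the classpath and that sorting both DataFrames by a common column (here `Name`) yields the same deterministic order in each. `RowByIndex`, `rowAt`, and `diffAt` are hypothetical names. Indices are 0-based and refer to the sorted order, so they may not match the display order in the question: after sorting by `Name`, Bil comes first.

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helpers for looking up and comparing rows by position.
object RowByIndex {
  /** Row at 0-based position `index` after sorting by `sortCol`, if it exists. */
  def rowAt(df: DataFrame, sortCol: String, index: Long): Option[Row] =
    df.orderBy(sortCol).rdd
      .zipWithIndex()                          // pairs each Row with a contiguous 0-based index
      .filter { case (_, i) => i == index }    // keep only the requested position
      .map { case (row, _) => row }
      .take(1)
      .headOption

  /** (column, df1 value, df2 value) for each column that differs at `index`. */
  def diffAt(df1: DataFrame, df2: DataFrame, sortCol: String, index: Long): Seq[(String, Any, Any)] =
    (rowAt(df1, sortCol, index), rowAt(df2, sortCol, index)) match {
      case (Some(r1), Some(r2)) =>
        df1.columns.toSeq.zip(r1.toSeq.zip(r2.toSeq)).collect {
          case (c, (a, b)) if a != b => (c, a, b)
        }
      case _ => Seq.empty // index out of range in at least one DataFrame
    }
}
```

The sort matters: `zipWithIndex` numbers rows in partition order, so without a deterministic sort (ideally on a unique column) the same index can point at different rows in the two DataFrames. The lookup still runs Spark jobs over the full data (`zipWithIndex` first computes partition sizes when the RDD has more than one partition), but only the selected row is brought back to the driver.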
@jrook I strictly want to give the program a row number, as I need to compare all fields in that row — David Boulton, Oct 23 '20 at 08:47
@Sanket9394 Both databases are sorted and should be identical, so that shouldn't be an issue. I will try using zipWithIndex and see how long it takes. Thanks — David Boulton, Oct 23 '20 at 08:48
@DavidBoulton, what do you mean by "databases are sorted"? Are df1 and df2 loaded from a database? — Sanket9394, Oct 23 '20 at 12:00

0 Answers