Comparing two DataFrames, df1 (recent data) and df2 (previous data), derived from the same table at different timestamps, and extracting the rows of df1 whose id is not present in df2
I used a row number to extract the recent and previous data and stored them in df1 (recent data) and df2 (previous data). I tried a left join and subtract, but I am not sure if I am on the right track.
df1 =
+---+--------------------+------+
| ID|           Timestamp|RowNum|
+---+--------------------+------+
|  1|2019-04-03 14:45:...|     1|
|  2|2019-04-03 14:45:...|     1|
|  3|2019-04-03 14:45:...|     1|
+---+--------------------+------+
df2 =
+---+--------------------+------+
| ID|           Timestamp|RowNum|
+---+--------------------+------+
|  2|2019-04-03 13:45:...|     2|
|  3|2019-04-03 13:45:...|     2|
+---+--------------------+------+
%%spark
result2 = df1.join(df2.select(['id']), ['id'], how='left')
result2.show(10)
but this did not give the desired output.
Required output:
+---+--------------------+------+
| ID|           Timestamp|RowNum|
+---+--------------------+------+
|  1|2019-04-03 14:45:...|     1|
+---+--------------------+------+