0

I am using spark-sql-2.4.1v with java8 version. I have a scenario where I need to copy current row and create another row modifying few columns data how can this be achieved in spark-sql ?

Ex : Given

 val data = List(
  ("20", "score", "school",  14 ,12),
  ("21", "score", "school",  13 , 13),
  ("22", "rate", "school",  11 ,14)
 )
val df = data.toDF("id", "code", "entity", "value1","value2")

Current Output

+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school|    14|    12|
| 21|score|school|    13|    13|
| 22| rate|school|    11|    14|
+---+-----+------+------+------+

When column "code" is "rate" copy it as two rows i.e. one is original , second it is another row with new code "old_ rate" like below

Expected output :

+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+

how to achieve this ?

BdEngineer
  • 2,929
  • 4
  • 49
  • 85

2 Answers2

3

you can use this approach for your scenario,

df.union(df.filter($"code"==="rate").withColumn("code",concat(lit("new_"), $"code"))).show()
/*
+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+
*/
sathya
  • 1,982
  • 1
  • 20
  • 37
  • how to handle this use-case in spark , any advice please https://stackoverflow.com/questions/63127722/how-to-solve-this-use-case-any-way-we-can-use-array-struct-explode-and-str – BdEngineer Jul 28 '20 at 05:43
  • thanks a lot , i have another use case like this , any advice please https://stackoverflow.com/questions/63137437/doing-multiple-column-value-look-up-after-joining-with-lookup-dataset – BdEngineer Jul 28 '20 at 15:17
  • can you advice me on the same https://stackoverflow.com/questions/63450135/applying-when-condition-only-when-column-exists-in-the-dataframe – BdEngineer Aug 17 '20 at 11:52
  • can you tell me what is wrong with this broadcast variable accessing ? https://stackoverflow.com/questions/64003697/spark-broadcast-variable-map-giving-null-value – BdEngineer Sep 22 '20 at 05:48
2

Use when to check code === rate, if it is matched then replace that column value with array(lit("rate"),lit("new_rate")) & not matched column values array($"code") then explode code column.

Check below code.

scala> df.show(false)
+---+-----+------+------+------+
|id |code |entity|value1|value2|
+---+-----+------+------+------+
|20 |score|school|14    |12    |
|21 |score|school|13    |13    |
|22 |rate |school|11    |14    |
+---+-----+------+------+------+
val colExpr = explode(
    when(
        $"code" === "rate",
        array(
            lit("rate"),
            lit("new_rate")
        )
    )
    .otherwise(array($"code"))
)
scala> df.withColumn("code",colExpr).show(false)
+---+--------+------+------+------+
|id |code    |entity|value1|value2|
+---+--------+------+------+------+
|20 |score   |school|14    |12    |
|21 |score   |school|13    |13    |
|22 |rate    |school|11    |14    |
|22 |new_rate|school|11    |14    |
+---+--------+------+------+------+
Srinivas
  • 8,957
  • 2
  • 12
  • 26
  • 1
    thanks a lot Srini , it worked. Need to convert it into java...thanks a lot.. what does this do ".otherwise(array($"code"))" ?? – BdEngineer Jul 24 '20 at 15:05
  • how to handle this use-case in spark , any advice please https://stackoverflow.com/questions/63127722/how-to-solve-this-use-case-any-way-we-can-use-array-struct-explode-and-str – BdEngineer Jul 28 '20 at 05:43
  • 1
    accepted the answer , my problem is now how to look up value1 value2 columns...stuck there. – BdEngineer Jul 29 '20 at 05:15