
This time I'm having a problem with (a) converting a list of string values to a Spark column, and (b) appending this column to an existing DataFrame. I think the root of the problem is that I don't understand what kind of data structure is required for conversion into a Spark Column. In any case, I would like to append the following List of values (subjectIDs) to an existing DF using Scala:

val subjectIDs = List("e03", "a01", "b03", "e01", "c02")

When I run this line:

val addSubjectIDs = udf(() => subjectIDs.toDF())

I get this error:

java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:693)
  at org.apache.spark.sql.functions$.udf(functions.scala:3176)
  ... 54 elided
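From what I can tell from the trace, udf expects a function whose return type maps to a Spark SQL data type (that is what ScalaReflection.schemaFor is checking), and a Dataset[Row] is not such a type. For comparison, a zero-argument UDF that returns a plain String compiles fine, but it would produce the same constant for every row, which is not what I want:

import org.apache.spark.sql.functions.udf

// Compiles, because String maps to Spark SQL's StringType,
// but every row would get the same value ("e03").
val constantID = udf(() => subjectIDs.head)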

Ideally, after correct conversion I would like to do the following:

val dataset = examples.withColumn("subject_id", addSubjectIDs())

and obtain a DF of this shape:

dataset.show

+-------+-----------+
|  score| subject_id|
+-------+-----------+
|   5032|        e03|
|   1959|        a01|
|   5629|        b03|
|   5666|        e01|
|   9325|        c02|
+-------+-----------+

Any help would be appreciated.

simtim
  • Do you need to do a join here on your new dataset? What is the rule for matching the new columns? If you just want them used in order, use the zip function. – puhlen Aug 09 '17 at 16:51
  • @philantrovert thanks for pointing this out - I couldn't find a related topic. – simtim Aug 09 '17 at 16:54
  • @puhlen thanks, zip will probably be an approach to use. – simtim Aug 09 '17 at 16:54
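
Following up on the zip suggestion in the comments, here is a minimal sketch of the order-based approach. It assumes examples is the single-column DF of scores shown above and that the list has exactly one entry per row, in matching order; the idea is to pair both the rows and the list elements with their positions via zipWithIndex and then join on that index.

import org.apache.spark.sql.SparkSession

// Hypothetical standalone setup mirroring the question's data.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("append-list-as-column")
  .getOrCreate()
import spark.implicits._

val examples   = Seq(5032, 1959, 5629, 5666, 9325).toDF("score")
val subjectIDs = List("e03", "a01", "b03", "e01", "c02")

// Number the DataFrame rows; zipWithIndex assigns indices in the
// RDD's current order, so this relies on that order matching the list.
val indexedExamples = examples.rdd
  .zipWithIndex
  .map { case (row, idx) => (idx, row.getInt(0)) }
  .toDF("idx", "score")

// Number the list elements the same way.
val indexedSubjects = subjectIDs.zipWithIndex
  .map { case (id, idx) => (idx.toLong, id) }
  .toDF("idx", "subject_id")

// Join on the shared index, then drop it.
val dataset = indexedExamples
  .join(indexedSubjects, "idx")
  .orderBy("idx")
  .select("score", "subject_id")

dataset.show()

With the sample data this prints the table shown in the question. Note that it only works when the list length equals the row count and the orders actually line up; for anything order-sensitive at scale, joining on an explicit key would be safer.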
