This time I'm having a problem with (a) converting a list of String values into a Spark Column and (b) appending that column to an existing DataFrame. I suspect the root of the problem is that I don't understand what kind of data structure Spark actually accepts for conversion into a Column. In any case, I would like to append the following List of values (subjectIDs) to an existing DataFrame using Scala:
val subjectIDs = List("e03", "a01", "b03", "e01", "c02")
But when I run this line:
val addSubjectIDs = udf(() => subjectIDs.toDF())
...I get the following error:
java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:693)
at org.apache.spark.sql.functions$.udf(functions.scala:3176)
... 54 elided
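If I'm reading the stack trace correctly, the UDF's return type is the problem: Spark can't derive a schema for Dataset[Row], whereas a plain String return type should be fine. A sanity check like the one below (names are my own) should run without error, but it obviously stamps the same literal onto every row instead of one value per row, which is not what I want:

val oneID = udf(() => "e03")                      // String is a supported UDF return type
examples.withColumn("subject_id", oneID()).show() // runs, but every row gets "e03"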
Ideally, after a correct conversion, I would like to do the following:
val dataset = examples.withColumn("subject_id", addSubjectIDs())
...and obtain a DataFrame of this shape:
dataset.show
+-------+-----------+
| score| subject_id|
+-------+-----------+
| 5032| e03|
| 1959| a01|
| 5629| b03|
| 5666| e01|
| 9325| c02|
+-------+-----------+
Any help would be appreciated.
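For completeness, here is the closest workaround sketch I've come up with from reading around (untested; I'm assuming that examples has exactly as many rows as subjectIDs has elements, that their order lines up one-to-one, and that spark is the SparkSession). It zips the list to the rows by positional index at the RDD level and then rebuilds the DataFrame:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Pair each row and each subject ID with its positional index.
val indexedRows = examples.rdd.zipWithIndex.map { case (row, idx) => (idx, row) }
val indexedIDs  = spark.sparkContext.parallelize(subjectIDs).zipWithIndex.map { case (id, idx) => (idx, id) }

// Join on the index and append the subject ID to each row's values.
val joinedRows = indexedRows.join(indexedIDs).values.map { case (row, id) => Row.fromSeq(row.toSeq :+ id) }

// Extend the original schema with the new column and rebuild the DataFrame.
val schema  = StructType(examples.schema.fields :+ StructField("subject_id", StringType, nullable = false))
val dataset = spark.createDataFrame(joinedRows, schema)

As far as I can tell, dataset.show should then produce the table above, modulo row order (an RDD join doesn't guarantee ordering, though each score still gets its matching subject_id via the index). Is something like this the idiomatic way, or is there a simpler option I'm missing?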