I have an existing Hive table:
CREATE TABLE form_submit (form_id String,
submitter_name String)
PARTITIONED BY (submission_date String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS ORC;
I have a CSV of raw data, which I read using:
val session = SparkSession.builder()
.enableHiveSupport()
.config("spark.hadoop.hive.exec.dynamic.partition", "true")
.config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
.getOrCreate()
val dataframe = session
.read
.option("header", "true")
.csv(hdfsPath)
I then perform some manipulations on this data, using a series of withColumn and drop statements, to make sure that the format matches the table format.
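(The shaping step looks roughly like this; the raw column names are placeholders, not my real ones:)

```scala
import org.apache.spark.sql.functions.col

// Hypothetical shaping: rename/cast the raw CSV columns so the schema
// matches the Hive table (form_id, submitter_name, submission_date).
val formattedDataframe = dataframe
  .withColumn("form_id", col("raw_id"))                      // placeholder source column
  .withColumn("submission_date", col("date").cast("string")) // placeholder source column
  .drop("raw_id", "date")
```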
I then try to write it like so:
formattedDataframe.write
.mode(SaveMode.Append)
.format("hive")
.partitionBy("submission_date")
.saveAsTable(tableName)
I'm not using insertInto, because the columns in the dataframe end up in a bad order, and I wouldn't want to rely on column order anyway.
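(For context, making insertInto work would mean reordering the columns positionally first, something like the sketch below; insertInto matches columns by position rather than by name, and the partition column has to come last:)

```scala
// Sketch of the positional reordering that insertInto would require.
// The select order must mirror the Hive table layout, with the
// partition column (submission_date) at the end.
val reordered = formattedDataframe
  .select("form_id", "submitter_name", "submission_date")

reordered.write
  .mode(SaveMode.Append)
  .insertInto(tableName)
```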
I run it as a Spark job and get an exception:
Exception in thread "main" org.apache.spark.SparkException: Requested partitioning does not match the form_submit table:
Requested partitions:
Table partitions: "submission_date"
What am I doing wrong? Didn't I choose the partitioning by calling partitionBy?