
I'm using PySpark 2.4 and have already enabled Hive support:

spark = SparkSession.builder.appName("spark").enableHiveSupport().getOrCreate()

But when I run:

spark.sql("""
CREATE TABLE reporting.sport_ads AS

SELECT 
*
, 'Home' as HomeOrAway
, HomeTeam as TeamName
FROM adwords_ads_brand
UNION
SELECT 
*
, 'Away' as HomeOrAway
, AwayTeam as TeamName
FROM adwords_ads_brand
""")

I hit the error:

pyspark.sql.utils.AnalysisException: "Hive support is required to CREATE Hive TABLE (AS SELECT);;\n'CreateTable `reporting`.`sport_ads`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists\n+- Distinct\n   +- Union\n      :-
....

This doesn't make sense to me. Am I doing something wrong?

PS: The same code works fine in Databricks and in Spark with Scala.

Jay Cee

1 Answer


Check the value of the following config property in your PySpark session:

>>> spark.sparkContext.getConf().get("spark.sql.catalogImplementation")

If the value is not set to hive, pass the following conf when starting the pyspark shell:

--conf spark.sql.catalogImplementation=hive

Then run your code again.
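
If you build the session in a script rather than the shell, the same check and setting can be sketched roughly as below. The helper names `catalog_implementation` and `build_hive_session` are made up for illustration, and this assumes no other session is already running, since the catalog is fixed when the first session is created.

```python
def catalog_implementation(spark):
    """Return the catalog backend of a running session ("hive" or "in-memory")."""
    # "in-memory" is Spark's default when the property was never set.
    return spark.conf.get("spark.sql.catalogImplementation", "in-memory")


def build_hive_session(app_name="spark"):
    """Build a session that asks for the Hive catalog explicitly (hypothetical helper)."""
    # Imported lazily so the check above can be used without importing pyspark here.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Same property as the --conf flag for the shell.
        .config("spark.sql.catalogImplementation", "hive")
        .enableHiveSupport()
        .getOrCreate()
    )
```

Calling `catalog_implementation(spark)` right before the failing statement should tell you whether the session you are actually using has the Hive catalog.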

UPDATE:

Create a dataframe out of union query:

df = spark.sql("""SELECT 
*
, 'Home' as HomeOrAway
, HomeTeam as TeamName
FROM adwords_ads_brand
UNION
SELECT 
*
, 'Away' as HomeOrAway
, AwayTeam as TeamName
FROM adwords_ads_brand""")

Then save the DataFrame as a table using the .saveAsTable function of its writer:

df.write.format("<parquet,orc..etc>").saveAsTable("<table_name>")
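
Putting both steps together, a minimal sketch could look like this. The function names `build_union_query` and `save_union_as_table` are hypothetical, the explicit column list is replaced by the `*` from the question, and parquet is only assumed as the default format.

```python
# Query from the question, with the source table left as a placeholder.
UNION_QUERY = """
SELECT *, 'Home' AS HomeOrAway, HomeTeam AS TeamName FROM {source}
UNION
SELECT *, 'Away' AS HomeOrAway, AwayTeam AS TeamName FROM {source}
"""


def build_union_query(source_table):
    """Fill the source table name into the union query."""
    return UNION_QUERY.format(source=source_table)


def save_union_as_table(spark, source_table, target_table, fmt="parquet"):
    """Run the union query and persist the result as a table (hypothetical helper)."""
    df = spark.sql(build_union_query(source_table))
    # The default save mode raises an error if the target table already exists.
    df.write.format(fmt).saveAsTable(target_table)
```

With a real session this would be called as `save_union_as_table(spark, "adwords_ads_brand", "reporting.sport_ads")`.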
notNull
  • it says "hive" :( I have a feeling that it's the command "CREATE TABLE" that is odd – Jay Cee Jun 26 '19 at 09:23
  • @JayCee, I tried similar command on my end and it works fine in `pyspark`. try to create a dataframe out of this `union` and then use **`df.saveAsTable("db.table")`**, https://stackoverflow.com/questions/54967186/cannot-create-table-with-spark-sql-hive-support-is-required-to-create-hive-tab – notNull Jun 26 '19 at 14:17
  • This worked! Actually it was another table badly encoded (.snappy.parquet was missing) and I applied the .format('parquet').saveAsTable and it worked like a charm. Thanks a lot, feel free to modify your answer for me to validate it. – Jay Cee Jun 27 '19 at 13:41
  • @JayCee, Great.. Please check out the `updated the answer` and validate it..! – notNull Jun 27 '19 at 14:13