
I am using spark-sql 2.4.1 for streaming in my PoC. I am trying to do a join by registering dataframes as tables.

For this I am using createGlobalTempView, as below:

    first_df.createGlobalTempView("first_tab");
    second_df.createGlobalTempView("second_tab");

    Dataset<Row> joinUpdatedRecordsDs = sparkSession.sql(
            "select a.*, b.create_date, b.last_update_date from first_tab as a "
            + "inner join second_tab as b "
            + "on a.company_id = b.company_id");

This fails with:

ERROR org.apache.spark.sql.AnalysisException: Table or view not found: first_tab; line 1 pos 105

What am I doing wrong here? How do I fix it?

Some more info

On my Spark session I have .enableHiveSupport() set.

Looking at the logs, I found these traces:

19/09/13 12:40:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_table : db=default tbl=first_tab
19/09/13 12:40:45 INFO audit: ugi=userrw    ip=unknown-ip-addr  cmd=get_table : db=default tbl=first_tab    
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_table : db=default tbl=second_tab
19/09/13 12:40:45 INFO audit: ugi=userrw    ip=unknown-ip-addr  cmd=get_table : db=default tbl=second_tab   
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_database: default
19/09/13 12:40:45 INFO audit: ugi=userrw    ip=unknown-ip-addr  cmd=get_database: default   
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_database: default
19/09/13 12:40:45 INFO audit: ugi=userrw    ip=unknown-ip-addr  cmd=get_database: default   
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_tables: db=default pat=*
19/09/13 12:40:45 INFO audit: ugi=userrw    ip=unknown-ip-addr  cmd=get_tables: db=default pat=*    

I checked whether the tables exist:

    System.out.println("first_tab exists : " + sparkSession.catalog().tableExists("first_tab"));
    System.out.println("second_tab exists : " + sparkSession.catalog().tableExists("second_tab"));

Output

    first_tab exists : false
    second_tab exists : false

I tried to print the tables in the database as below, but nothing prints.

    // needs imports: org.apache.spark.api.java.function.ForeachFunction, org.apache.spark.sql.catalog.Table
    // the cast picks the Java ForeachFunction overload of foreach (the bare lambda is ambiguous in Java)
    sparkSession.catalog().listTables().foreach((ForeachFunction<Table>) tab -> {
        System.out.println("tab.database : " + tab.database());
        System.out.println("tab.name : " + tab.name());
        System.out.println("tab.tableType : " + tab.tableType());
    });

Nothing was printed, so it seems no table was created.

I tried to create the views with the "global_temp." prefix, but it throws an error:

org.apache.spark.sql.AnalysisException: It is not allowed to add database prefix `global_temp` for the TEMPORARY view name.;
    at org.apache.spark.sql.execution.command.CreateViewCommand.<init>(views.scala:122)

I also tried referring to the tables with the "global_temp." prefix, i.e.:

    System.out.println("first_tab exists : " + sparkSession.catalog().tableExists("global_temp.first_tab"));
    System.out.println("second_tab exists : " + sparkSession.catalog().tableExists("global_temp.second_tab"));

This throws the same error as above.

BdEngineer

2 Answers


Global temporary views live in a database named global_temp, so I would recommend referencing the tables in your queries as global_temp.table_name. I am not sure whether it solves your problem, but you can try it.

From the Spark source code:

Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database global_temp, and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1.
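
Applied to the query from the question, that would look roughly like this (same tables and columns as above; just a sketch of the qualified form):

    Dataset<Row> joinUpdatedRecordsDs = sparkSession.sql(
            "select a.*, b.create_date, b.last_update_date "
            + "from global_temp.first_tab as a "
            + "inner join global_temp.second_tab as b "
            + "on a.company_id = b.company_id");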

David Vrba
  • @BdLearner I would be concerned about this warning from your log: `WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException`. It appears Spark cannot access the global_temp database. What happens if you replace `createGlobalTempView` with `createOrReplaceTempView`? That function creates a session-scoped view that needs no database prefix (see the sketch after these comments). – David Vrba Sep 14 '19 at 04:34
  • can you help and suggest how to handle this https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:35
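
A minimal sketch of the session-scoped alternative suggested in the comment above, assuming the same dataframes and session as in the question:

    // Session-scoped temp views live outside any database, so no prefix is needed
    first_df.createOrReplaceTempView("first_tab");
    second_df.createOrReplaceTempView("second_tab");

    Dataset<Row> joinUpdatedRecordsDs = sparkSession.sql(
            "select a.*, b.create_date, b.last_update_date "
            + "from first_tab as a inner join second_tab as b "
            + "on a.company_id = b.company_id");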

Remove .enableHiveSupport() while creating the session; this should work fine:

    SparkSession spark = SparkSession
            .builder()
            .appName("DatabaseMigrationUtility")
            //.enableHiveSupport()
            .getOrCreate();

As David mentioned, use the global_temp. prefix to refer to the tables.
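
To verify where the views actually landed, the catalog checks from the question can be pointed at the global_temp database explicitly. A sketch using the two-argument tableExists and the listTables(dbName) overloads of the catalog API:

    // Look in the system-preserved global_temp database instead of the current one
    System.out.println("first_tab exists : "
            + sparkSession.catalog().tableExists("global_temp", "first_tab"));
    System.out.println("second_tab exists : "
            + sparkSession.catalog().tableExists("global_temp", "second_tab"));
    sparkSession.catalog().listTables("global_temp").show(false);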

BdEngineer