I am using spark-sql 2.4.1 for streaming in my PoC. I am trying to do a join by registering DataFrames as tables, for which I am using createGlobalTempView as below:
first_df.createGlobalTempView("first_tab");
second_df.createGlobalTempView("second_tab");
Dataset<Row> joinUpdatedRecordsDs = sparkSession.sql("select a.* , b.create_date, b.last_update_date from first_tab as a "
+ " inner join second_tab as b "
+ " on a.company_id = b.company_id "
);
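For reference, my understanding from the Spark documentation is that global temp views are registered in the system database global_temp, so the SQL has to qualify them with that prefix even though the views themselves are created without it. A minimal sketch of what I believe the intended usage looks like (first_df, second_df and sparkSession as in my code above):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Create the views WITHOUT the global_temp prefix...
first_df.createGlobalTempView("first_tab");
second_df.createGlobalTempView("second_tab");

// ...but qualify them with global_temp when querying (assumption: this
// is the documented lookup rule for global temp views).
Dataset<Row> joinUpdatedRecordsDs = sparkSession.sql(
      "select a.*, b.create_date, b.last_update_date "
    + "from global_temp.first_tab as a "
    + "inner join global_temp.second_tab as b "
    + "on a.company_id = b.company_id");
```

Alternatively, if session scope is enough, createOrReplaceTempView("first_tab") should let the SQL reference the name without any prefix.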
ERROR org.apache.spark.sql.AnalysisException: Table or view not found: first_tab; line 1 pos 105
What am I doing wrong here? How can I fix this?
Some more info:
On my SparkSession I have ".enableHiveSupport()" set.
When I checked the logs I found these traces:
19/09/13 12:40:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_table : db=default tbl=first_tab
19/09/13 12:40:45 INFO audit: ugi=userrw ip=unknown-ip-addr cmd=get_table : db=default tbl=first_tab
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_table : db=default tbl=second_tab
19/09/13 12:40:45 INFO audit: ugi=userrw ip=unknown-ip-addr cmd=get_table : db=default tbl=second_tab
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_database: default
19/09/13 12:40:45 INFO audit: ugi=userrw ip=unknown-ip-addr cmd=get_database: default
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_database: default
19/09/13 12:40:45 INFO audit: ugi=userrw ip=unknown-ip-addr cmd=get_database: default
19/09/13 12:40:45 INFO HiveMetaStore: 0: get_tables: db=default pat=*
19/09/13 12:40:45 INFO audit: ugi=userrw ip=unknown-ip-addr cmd=get_tables: db=default pat=*
System.out.println("first_tab exists : " + sparkSession.catalog().tableExists("first_tab"));
System.out.println("second_tab exists : " + sparkSession.catalog().tableExists("second_tab"));
Output
first_tab exists : false
second_tab exists : false
I tried to print the tables in the database as below, but nothing prints:
sparkSession.catalog().listTables().foreach( tab -> {
System.out.println("tab.database :" + tab.database());
System.out.println("tab.name :" + tab.name());
System.out.println("tab.tableType :" + tab.tableType());
});
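As far as I understand, catalog().listTables() with no argument only lists the current database, while global temp views live in the global_temp database, so that database may need to be named explicitly. A sketch of what I mean (same sparkSession as above):

```java
// Assumption: listing the global_temp database explicitly, instead of
// the current (default) database, should surface global temp views.
sparkSession.catalog().listTables("global_temp").foreach(tab -> {
    System.out.println("tab.database :" + tab.database());
    System.out.println("tab.name :" + tab.name());
    System.out.println("tab.tableType :" + tab.tableType());
});
```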
No output was printed, so it seems no table was created.
I tried to create the views with the "global_temp." prefix, but it throws an error:
org.apache.spark.sql.AnalysisException: It is not allowed to add database prefix `global_temp` for the TEMPORARY view name.;
at org.apache.spark.sql.execution.command.CreateViewCommand.<init>(views.scala:122)
I also tried referring to the tables with the "global_temp." prefix, i.e.
System.out.println("first_tab exists : " + sparkSession.catalog().tableExists("global_temp.first_tab"));
System.out.println("second_tab exists : " + sparkSession.catalog().tableExists("global_temp.second_tab"));
but it throws the same error as above.
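For completeness: the Catalog API in 2.4 also has a two-argument tableExists(dbName, tableName) overload, and my understanding is that the single-string form does not parse a qualified name, which might explain the failure above. A sketch of the variant I have in mind:

```java
// Assumption: passing the database and the table name as separate
// arguments avoids parsing "global_temp." as part of the view name.
System.out.println("first_tab exists : "
    + sparkSession.catalog().tableExists("global_temp", "first_tab"));
System.out.println("second_tab exists : "
    + sparkSession.catalog().tableExists("global_temp", "second_tab"));
```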