0

I have three external tables in HIVE:

table 1:

CREATE EXTERNAL TABLE IF NOT EXISTS table_1(
unique_key_column_1 VARCHAR,
column_needed_1 DATE,   
redundant_column_1 VARCHAR,
redundant_column_2 VARCHAR,
redundant_column_3 VARCHAR,
column_needed_2 TIMESTAMP,
redundant_column_4 VARCHAR,
redundant_column_5 VARCHAR,
column_needed_3 INT,
redundant_column_6 VARCHAR,
redundant_column_7 VARCHAR)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',’
STORED AS TEXTFILE location '/user/<username>/visdata';

table 2:

CREATE EXTERNAL TABLE IF NOT EXISTS table_2(
unique_key_column_1 VARCHAR,
column_needed_4 VARCHAR,
column_needed_5 VARCHAR,
unique_key_column_2 VARCHAR,
redundant_column_1 VARCHAR,
redundant_column_2 VARCHAR,
redundant_column_3 VARCHAR,
column_needed_6 TINYINT,
redundant_column_4 VARCHAR,
redundant_column_5 VARCHAR,
column_needed_7 DATE,
redundant_column_6 VARCHAR,
redundant_column_7 VARCHAR)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',’
STORED AS TEXTFILE location '/user/<username>/visdata';

table 3:

CREATE EXTERNAL TABLE IF NOT EXISTS table_3(
unique_key_column_2 VARCHAR,
redundant_column_1 VARCHAR,
redundant_column_2 VARCHAR,
redundant_column_3 VARCHAR,
redundant_column_4 VARCHAR,
redundant_column_5 VARCHAR,
column_needed_8 VARCHAR,
column_needed_9 TINYINT,
redundant_column_6 VARCHAR,
redundant_column_7 VARCHAR,
column_needed_10 TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',’
STORED AS TEXTFILE location '/user/<username>/visdata';

I now want to make a managed table, with left outer joining above tables on my two unique key columns, like this:

unique_key_column_1 column_needed_1 column_needed_2 column_needed_3 column_needed_4 column_needed_5 column_needed_1 column_needed_6 column_needed_7 unique_key_column_2 column_needed_8 column_needed_9 column_needed_10
key_entry_1_1 entry_1_1 entry_1_2 entry_1_3 entry_1_4 entry_1_5 entry_1_6 entry_1_7 key_entry_1_2 entry_1_8 entry_1_9 entry_1_10
key_entry_2_2 entry_2_1 entry_2_2 entry_2_3 entry_2_4 entry_2_5 entry_2_6 entry_2_7 key_entry_2_2 entry_2_8 entry_2_9 entry_2_10

How do I do this?

EDIT 1:
Here's what i could come up with, to join from two tables. I still couldn't figure out how three tables can be joined to make a single table:

> create table combined_table;
> insert into combined_table SELECT * FROM (SELECT r.unique_key_column_1, r.column_needed_1, r.column_needed_2, r.column_needed_3, o.r.column_needed_4, o.column_needed_5, o.column_needed_6, o.column_needed_7 FROM table_1 LEFT OUTER JOIN table_2 o ON (r.unique_key_column_1 = o.unique_key_column_2 );

EDIT 2:
I just realised that joins are expensive. So, is there any I can do this using partitions?

Naveen Reddy Marthala
  • 2,622
  • 4
  • 35
  • 67

2 Answers2

1

@NaveenKumar The solution here is to write the schema for the combinedTable you want. Then insert the results from the 3 tables into the final table.

INSERT INTO combinedTable [SELECT JOIN QUERY HERE]
steven-matison
  • 1,554
  • 1
  • 9
  • 12
  • > create table combined_table; > insert into combined_table SELECT * FROM (SELECT r.unique_key_column_1, r.column_needed_1, r.column_needed_2, r.column_needed_3, o.r.column_needed_4, o.column_needed_5, o.column_needed_6, o.column_needed_7 FROM table_1 LEFT OUTER JOIN table_2 o ON (r.unique_key_column_1 = o.unique_key_column_2 ); – Naveen Reddy Marthala Apr 24 '20 at 16:33
  • This is what i came up with, to combine data of two tables, but i didn't know how to make one table out of three tables. @steven-dfheinz – Naveen Reddy Marthala Apr 24 '20 at 16:35
  • You will need to do the create statement with the schema (from the original tables) for only the columns you want in the final table. So you make 4 total tables. The 4th one is usually NOT external (remove EXTERNAL + location). The first 3 tables become staging tables, and your final 4rth table is the combined results you really want... – steven-matison Apr 24 '20 at 17:50
  • I couldn't understand you much. Could you please write some example syntax. – Naveen Reddy Marthala Apr 25 '20 at 01:43
1

You can create combined table by left joining all three tables. Check below query.

Creating table & inserting data.

CREATE TABLE IF NOT EXISTS COMBINED_TABLE AS 
SELECT
   UNIQUE_KEY_COLUMN_1,
   TBLA.COLUMN_NEEDED_1,
   TBLA.COLUMN_NEEDED_2,
   TBLA.COLUMN_NEEDED_3,
   TBLB.COLUMN_NEEDED_4,
   TBLB.COLUMN_NEEDED_5,
   TBLB.COLUMN_NEEDED_6,
   TBLB.COLUMN_NEEDED_7,
   TBLC.UNIQUE_KEY_COLUMN_2,
   TBLC.COLUMN_NEEDED_8,
   TBLC.COLUMN_NEEDED_9,
   TBLC.COLUMN_NEEDED_10,
FROM
   TABLE_1 TBLA 
   LEFT JOIN
      TABLE_2 TBLB 
      ON TBLA.UNIQUE_KEY_COLUMN_1 = TBLB.UNIQUE_KEY_COLUMN_1 
   LEFT JOIN
      TABLE_3 TBLC 
      ON TBLC.UNIQUE_KEY_COLUMN_2 = TBLB.UNIQUE_KEY_COLUMN_1;

Inserting data into table if target table already created.

INSERT INTO COMBINED_TABLE
SELECT
   UNIQUE_KEY_COLUMN_1,
   TBLA.COLUMN_NEEDED_1,
   TBLA.COLUMN_NEEDED_2,
   TBLA.COLUMN_NEEDED_3,
   TBLB.COLUMN_NEEDED_4,
   TBLB.COLUMN_NEEDED_5,
   TBLB.COLUMN_NEEDED_6,
   TBLB.COLUMN_NEEDED_7,
   TBLC.UNIQUE_KEY_COLUMN_2,
   TBLC.COLUMN_NEEDED_8,
   TBLC.COLUMN_NEEDED_9,
   TBLC.COLUMN_NEEDED_10,
FROM
   TABLE_1 TBLA 
   LEFT JOIN
      TABLE_2 TBLB 
      ON TBLA.UNIQUE_KEY_COLUMN_1 = TBLB.UNIQUE_KEY_COLUMN_1 
   LEFT JOIN
      TABLE_3 TBLC 
      ON TBLC.UNIQUE_KEY_COLUMN_2 = TBLB.UNIQUE_KEY_COLUMN_1;

Srinivas
  • 8,957
  • 2
  • 12
  • 26
  • thank you so much for this. i figured this out shortly after first edit in my question and realised that joins are expensive. will you be able to help me do this using partitions. – Naveen Reddy Marthala Apr 25 '20 at 04:56