
Is it possible to fetch multiple records from a Hive table in one call? For example: I have 1000 records in a Hive table/file and I don't want to hit the Hive table/file 1000 times because of the performance cost. I know this technique is available in DB2, but I'm not sure about Hive. If there is a better option for hitting the database/file a limited number of times to fetch all the rows in Hive, please let me know.

Sekhar

1 Answer


select * from table will read the files only once. Big splittable files will be split, and each mapper will read its own part of the file. If you run 1000 selects instead of one, each select will read the data files again.
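
For illustration, a minimal sketch (the table name my_table, key column id, and keys table my_keys are assumptions, not from the question): fetch all the needed rows in a single scan instead of issuing 1000 single-row selects:

    -- one query, one pass over the data files, returns all requested rows together
    -- (put the full key list in the IN clause)
    SELECT *
    FROM my_table
    WHERE id IN (101, 102, 103);

    -- or, if the keys are stored in another table, join instead of looping:
    SELECT t.*
    FROM my_table t
    JOIN my_keys k ON t.id = k.id;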

1000 records is not a large amount of data, and you can use a fetch-only task to fetch the data quickly without running map-reduce at all:

set hive.fetch.task.conversion=more;
set hive.fetch.task.conversion.threshold=1073741824; --1G
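
For example (a sketch; the table and column names are assumptions): with these settings a simple scan/filter query can be converted to a fetch-only task, so no MapReduce job is launched:

    -- simple SELECT/FILTER/LIMIT queries run as a fetch-only task
    -- as long as the scanned data stays under the threshold above
    SELECT id, name
    FROM my_table
    WHERE country = 'US'
    LIMIT 1000;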

See also: Why is Fetch task in Hive works faster than Map-only task?

Also, if you are doing some complex processing and the fetch-only task does not apply because of its limitations, then the closest counterpart to multi-row fetch in Hive is the Vectorization feature.

Vectorization allows Hive to process a batch of rows together instead of processing one row at a time. Each batch is usually an array of primitive types. Operations are performed on the entire column vector, which improves the instruction pipelines and cache usage.

-- process rows in batches (column vectors) in map-side operators
SET hive.vectorized.execution.enabled=true;
-- and in reduce-side operators as well
SET hive.vectorized.execution.reduce.enabled=true;
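
As a rough sketch (the table names are assumptions): vectorized execution applies to tables stored in a columnar format such as ORC, and rows are then processed in batches inside the operators:

    -- vectorization needs a suitable storage format; ORC is the classic choice
    CREATE TABLE my_table_orc STORED AS ORC AS SELECT * FROM my_table;

    -- this aggregation is executed on batches of rows (column vectors)
    SELECT country, count(*) FROM my_table_orc GROUP BY country;
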
leftjoin