query hive partitioned table over date/time range

13,425

Solution 1

This is a similar scenario we face everyday while querying tables in hive. We have partitioned our tables similar to the way you explained and it has helped a lot if querying. This is how we partition:

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year bigint, month bigint, day bigint, hour int) 
STORED AS TEXTFILE;

For partitions we assign values like this:

year = 2014, month = 201409, day = 20140924, hour = 01

This way the querying becomes really simple and you can directly query:

select * from table1 where day >= 20140527 and day < 20140605 

Hope this helps

Solution 2

you can query like this

  WHERE st_date > '2014-05-27-00' and end_date < '2014-06-05-24' 

should give you desired result because even if it is a sting a it will be compared lexicographically i.e '2014-04-04' will be always greater '2014-04-03'.

I ran it on my sample tables and it works perfectly fine.

Solution 3

You can use CONCAT with LPAD.

Say you want to get all partitions between 2020-03-24, hour=00 to 2020-04-24, hour=23, then, your 'where' condition would look like:

WHERE (CONCAT(year, '-', LPAD(month,2,'0'), '-', LPAD(day,2,'0'), '_', LPAD(hour,2,'0')) > '2020-03-24_00')
AND (CONCAT(year, '-', LPAD(month,2,'0'), '-', LPAD(day,2,'0'), '_', LPAD(hour,2,'0')) < '2020-04-24_23')
Share:
13,425
banjara
Author by

banjara

Updated on June 08, 2022

Comments

  • banjara
    banjara almost 2 years

    My hive table is partitioned on year, month, day, Hour

    Now I want to fetch data from 2014-05-27 to 2014-06-05 How can I do that??

    I know one option is create partition on epoch(or yyyy-mm-dd-hh) and in query pass epoch time. Can I do it without loosing date hierarchy??

    Table Structure

    CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
    PARTITIONED BY (year int, month int, day int, hour int) 
    STORED AS TEXTFILE;
    
  • banjara
    banjara almost 10 years
    thanks for answering. year, month, day, Hour are separate fields. I have added my table structure in question.
  • chhantyal
    chhantyal over 6 years
    If OP can change partitions, I think better approach is to use yy-mm-dd string as partition. With that, you can use complex SQL queries, see more details community.hortonworks.com/questions/29031/…