Get the sysdate -1 in Hive

32,184

Solution 1

Try something like:

select * from table1 
where dt >= from_unixtime(unix_timestamp()-1*60*60*24, 'yyyyMMdd');

This works if you don't mind that hive scans the entire table. from_unixtime is not deterministic, so the query planner in Hive won't optimize for you. For many cases (for example log files), not specifying a deterministic partition key can cause a very large hadoop job to start since it will scan the whole table, not just the rows with the given partition key.

If this matters to you, you can launch hive with an additional option

$ hive -hiveconf date_yesterday=20150331

And in the script or hive terminal use

select * from table1
where dt >= ${hiveconf:date_yesterday};

The name of the variable doesn't matter, nor does the value, you can set them in this case to get the prior date using unix commands. In the specific case of the OP

$ hive -hiveconf date_yesterday=$(date --date yesterday "+%Y%m%d")

Solution 2

In mysql:

select DATE_FORMAT(curdate()-1,'%Y%m%d');

In sqlserver :

SELECT convert(varchar,getDate()-1,112)

Use this query:

SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()-1*24*60*60,'%Y%m%d');

Solution 3

It looks like DATE_SUB assumes date in format yyyy-MM-dd. So you might have to do some more format manipulation to get to your format. Try this:

select * from table1 
where dt =  FROM_UNIXTIME(
                UNIX_TIMESTAMP(
                    DATE_SUB(
                        FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')
                    , 1)
                )
            , 'yyyyMMdd')     limit 10;
Share:
32,184
arsenal
Author by

arsenal

profile for ferhan on Stack Exchange, a network of free, community-driven Q&A sites http://stackexchange.com/users/flair/335839.png

Updated on August 21, 2022

Comments

  • arsenal
    arsenal over 1 year

    Is there any way to get the current date -1 in Hive means yesterdays date always? And in this format- 20120805?

    I can run my query like this to get the data for yesterday's date as today is Aug 6th-

    select * from table1 where dt = '20120805';
    

    But when I tried doing this way with date_sub function to get the yesterday's date as the below table is partitioned on date(dt) column.

    select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(),
    'yyyyMMdd')) , 1)     limit 10;
    

    It is looking for the data in all the partitions? Why? Something wrong I am doing in my query?

    How I can make the evaluation happen in a subquery to avoid the whole table scanned?