Optimal way to create a histogram/frequency distribution in Oracle?

21,205

Solution 1

If your createtime were a date column, this would be trivial:

SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*) 
  FROM EVENTS
 GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24');

As it is, casting the createtime column isn't too hard:

select TO_CHAR( 
         TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 
         'DAY:HH24') AS BUCKET, COUNT(*)
   FROM EVENTS
  WHERE createtime between 1305504000000 and 1306108800000
 group by TO_CHAR( 
         TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 
         'DAY:HH24') 
 order by 1

If, alternatively, you're looking for the fencepost values (for example, where do I go from the first decile (0-10%) to the next (11-20%), you'd do something like:

select min(createtime) over (partition by decile) as decile_start,
       max(createtime) over (partition by decile) as decile_end,
       decile
  from (select createtime, 
               ntile (10) over (order by createtime asc) as decile
          from events
         where createtime between 1305504000000 and 1306108800000
       )

Solution 2

I'm unfamiliar with Oracle's date functions, but I'm pretty certain there's an equivalent way of writing this Postgres statement:

select date_trunc('hour', stamp), count(*)
from your_data
group by date_trunc('hour', stamp)
order by date_trunc('hour', stamp)

Solution 3

Pretty much the same response as Adam, but I would prefer to keep the period_start as a time field so it is easier to filter further if needed:

with
events as
(
    select rownum eventkey, round(dbms_random.value(1305504000000, 1306108800000)) createtime
    from dual
    connect by level <= 1000 
)
select
    trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') period_start,
    count(*) numevents
from
    events
where
    createtime between 1305504000000 and 1306108800000
group by
    trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH')
order by
    period_start
Share:
21,205
matt b
Author by

matt b

Hello, world!

Updated on November 25, 2020

Comments

  • matt b
    matt b over 3 years

    I have an events table with two columns eventkey (unique, primary-key) and createtime, which stores the creation time of the event as the number of milliseconds since Jan 1 1970 in a NUMBER column.

    I would like to create a "histogram" or frequency distribution that shows me how many events were created in each hour of the past week.

    Is this the best way to write such a query in Oracle, using the width_bucket() function? Is it possible to derive the number of rows that fall into each bucket using one of the other Oracle analytic functions rather than using width_bucket to determine what bucket number each row belongs to and doing a count(*) over that?

    -- 1305504000000 = 5/16/2011 12:00am GMT
    -- 1306108800000 = 5/23/2011 12:00am GMT
    select 
    timestamp '1970-01-01 00:00:00' + numtodsinterval((1305504000000/1000 + (bucket * 60 * 60)), 'second') period_start,
    numevents
    from (
      select bucket, count(*) as events from (
        select eventkey, createtime, 
        width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket
        from events 
        where createtime between 1305504000000 and 1306108800000
      ) group by bucket
    ) 
    order by period_start
    
  • matt b
    matt b about 13 years
    Can you explain the purpose of the with events as () portion, and why you are selecting random values? I'm not very familiar with Oracle syntax
  • Craig
    Craig about 13 years
    sorry... Since I don't have your table of data to run the query against, I am generating random data to emulate what might be in your table. The "with events" statement just allows me to alias that query as "events" so the rest of the query will match what you could use directly against your events table without making any changes. For your purposes, just delete everything above "select trunc(...."
  • matt b
    matt b about 13 years
    ah thanks, I see how that would be useful in this type of answer :)
  • matt b
    matt b about 13 years
    this works well, thanks. Not sure why I didn't think of simply truncating the dates in the first place, I think I got so caught up in figuring out how to parse and cast this odd "date" format
  • Brett VanderVeen
    Brett VanderVeen over 9 years
    Is there anyway to maintain rows for create_times that have a zero count?
  • George L
    George L over 8 years
    Works perfectly in PG! Really fast too.