Optimal way to create a histogram/frequency distribution in Oracle?
Solution 1
If your createtime
were a date column, this would be trivial:
SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*)
FROM EVENTS
GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24');
As it is, casting the createtime
column isn't too hard:
select TO_CHAR(
TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000),
'DAY:HH24') AS BUCKET, COUNT(*)
FROM EVENTS
WHERE createtime between 1305504000000 and 1306108800000
group by TO_CHAR(
TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000),
'DAY:HH24')
order by 1
If, alternatively, you're looking for the fencepost values (for example, where do I go from the first decile (0-10%) to the next (11-20%), you'd do something like:
select min(createtime) over (partition by decile) as decile_start,
max(createtime) over (partition by decile) as decile_end,
decile
from (select createtime,
ntile (10) over (order by createtime asc) as decile
from events
where createtime between 1305504000000 and 1306108800000
)
Solution 2
I'm unfamiliar with Oracle's date functions, but I'm pretty certain there's an equivalent way of writing this Postgres statement:
select date_trunc('hour', stamp), count(*)
from your_data
group by date_trunc('hour', stamp)
order by date_trunc('hour', stamp)
Solution 3
Pretty much the same response as Adam, but I would prefer to keep the period_start as a time field so it is easier to filter further if needed:
with
events as
(
select rownum eventkey, round(dbms_random.value(1305504000000, 1306108800000)) createtime
from dual
connect by level <= 1000
)
select
trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') period_start,
count(*) numevents
from
events
where
createtime between 1305504000000 and 1306108800000
group by
trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH')
order by
period_start
Comments
-
matt b over 3 years
I have an
events
table with two columnseventkey
(unique, primary-key) andcreatetime
, which stores the creation time of the event as the number of milliseconds since Jan 1 1970 in aNUMBER
column.I would like to create a "histogram" or frequency distribution that shows me how many events were created in each hour of the past week.
Is this the best way to write such a query in Oracle, using the
width_bucket()
function? Is it possible to derive the number of rows that fall into each bucket using one of the other Oracle analytic functions rather than usingwidth_bucket
to determine what bucket number each row belongs to and doing acount(*)
over that?-- 1305504000000 = 5/16/2011 12:00am GMT -- 1306108800000 = 5/23/2011 12:00am GMT select timestamp '1970-01-01 00:00:00' + numtodsinterval((1305504000000/1000 + (bucket * 60 * 60)), 'second') period_start, numevents from ( select bucket, count(*) as events from ( select eventkey, createtime, width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket from events where createtime between 1305504000000 and 1306108800000 ) group by bucket ) order by period_start
-
matt b about 13 yearsCan you explain the purpose of the
with events as ()
portion, and why you are selecting random values? I'm not very familiar with Oracle syntax -
Craig about 13 yearssorry... Since I don't have your table of data to run the query against, I am generating random data to emulate what might be in your table. The "with events" statement just allows me to alias that query as "events" so the rest of the query will match what you could use directly against your events table without making any changes. For your purposes, just delete everything above "select trunc(...."
-
matt b about 13 yearsah thanks, I see how that would be useful in this type of answer :)
-
matt b about 13 yearsthis works well, thanks. Not sure why I didn't think of simply truncating the dates in the first place, I think I got so caught up in figuring out how to parse and cast this odd "date" format
-
Brett VanderVeen over 9 yearsIs there anyway to maintain rows for create_times that have a zero count?
-
George L over 8 yearsWorks perfectly in PG! Really fast too.