How to group dataframe by hour using timestamp with Pandas
Solution 1
I came across this gem, pd.DataFrame.resample, after I posted my round-to-hour solution.
import pandas as pd
# Construct example dataframe
times = pd.date_range('1/1/2018', periods=5, freq='25min')
values = [4,8,3,4,1]
df = pd.DataFrame({'val':values}, index=times)
# Resample by hour and calculate medians
# ('H' is the hourly alias; pandas 2.2 and later prefer lowercase 'h')
df.resample('H').median()
Or you can use groupby with Grouper if you don't want times as index:
df = pd.DataFrame({'val':values, 'times':times})
# 'times' is a column here, so pass it as key (level refers to an index level)
df.groupby(pd.Grouper(key='times', freq='H')).median()
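To apply resample to data shaped like the question's (an integer index of epoch seconds), the index first has to become a DatetimeIndex. The following is a minimal sketch under that assumption; the sample values and the two-column frame are made up for illustration, and the epoch values are treated as UTC.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data: epoch seconds as index.
df = pd.DataFrame(
    {"neg": [0.0, 0.2, 0.4], "pos": [1.0, 0.5, 0.0]},
    index=pd.Index([1520353341, 1520353342, 1520356000], name="time"),
)

# Interpret the integer index as epoch seconds (UTC, timezone-naive).
df.index = pd.to_datetime(df.index, unit="s")

# One row per hour, labelled with the start of the hour.
hourly = df.resample("h").mean()

# Convert the hour-start labels back to epoch seconds.
hourly.index = (hourly.index - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
```

Note that resample emits a row for every hour in the range, so hours without data would show up as NaN rows; calling dropna() on the result is one way to discard them.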
Solution 2
Have you tried creating an hour column:
data_frame['hour'] = data_frame.date.dt.hour
and then grouping by it:
data = data_frame.groupby(data_frame.hour).mean()
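One caveat with this approach: dt.hour keeps only the hour of day (0 to 23), so the same hour on different days lands in one group and the original timestamp is lost. A possible variation, sketched below with made-up data, groups on dt.floor instead, which keeps the full timestamp of the start of each hour. The column names follow the question; the values are hypothetical.

```python
import pandas as pd

# Hypothetical data: a 'date' column of datetimes spanning two days.
data_frame = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-03-06 10:22:21",
        "2018-03-06 10:40:00",
        "2018-03-07 10:05:00",
    ]),
    "pos": [0.0, 0.5, 1.0],
})

# dt.hour keeps only the hour of day, so both days share the group 10.
by_hour_of_day = data_frame.groupby(data_frame.date.dt.hour)["pos"].mean()

# dt.floor("h") keeps the full timestamp of the hour start instead.
by_hour_start = data_frame.groupby(data_frame.date.dt.floor("h"))["pos"].mean()
```

Here by_hour_of_day has a single group, while by_hour_start has one group per calendar hour.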
Solution 3
You can round the timestamp column down to the nearest hour:
import math
df.time = [math.floor(t/3600) * 3600 for t in df.time]
Or even simpler, using integer division:
df.time = [(t//3600) * 3600 for t in df.time]
You can group by this column and thus preserve the timestamp.
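Put together, the rounding and the grouping might look like the sketch below. It assumes the epoch seconds live in a column named time (as in df.time above) and uses made-up values; the floor is written as a vectorized column operation rather than a list comprehension.

```python
import pandas as pd

# Hypothetical sample: epoch seconds in a 'time' column.
df = pd.DataFrame({
    "time": [1520353341, 1520353342, 1520356000],
    "neg": [0.0, 0.2, 0.4],
})

# Floor every timestamp to the start of its hour (3600 seconds).
df["time"] = df["time"] // 3600 * 3600

# Group on the floored timestamp; it becomes the index of the result.
hourly = df.groupby("time").mean()
```

Because epoch seconds count from midnight UTC, these hour boundaries are UTC hours; they coincide with local hours only when the local offset is a whole number of hours.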
Franco
Updated on June 23, 2022
I have the following dataframe structure that is indexed with a timestamp:
            neg    neu    norm     pol       pos    date
time
1520353341  0.000  1.000   0.0000  0.000000  0.000
1520353342  0.121  0.879  -0.2960  0.347851  0.000
1520353342  0.217  0.783  -0.6124  0.465833  0.000
I create a date from the timestamp:
data_frame['date'] = [datetime.datetime.fromtimestamp(d) for d in data_frame.time]
Result:
            neg    neu    norm     pol       pos    date
time
1520353341  0.000  1.000   0.0000  0.000000  0.000  2018-03-06 10:22:21
1520353342  0.121  0.879  -0.2960  0.347851  0.000  2018-03-06 10:22:22
1520353342  0.217  0.783  -0.6124  0.465833  0.000  2018-03-06 10:22:22
I want to group by hour and take the mean of all the values except the timestamp, which should be the start of the hour each group belongs to. So this is the result I want to achieve:
            neg       neu       norm      pol       pos
time
1520352000  0.027989  0.893233  0.122535  0.221079  0.078779
1520355600  0.028861  0.899321  0.103698  0.209353  0.071811
The closest I have gotten so far has been with this answer:
data = data.groupby(data.date.dt.hour).mean()
Results:
      neg       neu       norm      pol       pos
date
0     0.027989  0.893233  0.122535  0.221079  0.078779
1     0.028861  0.899321  0.103698  0.209353  0.071811
But I can't figure out how to keep a timestamp that reflects the hour where each group started.
-
Franco, about 6 years ago: Yes, that gives me the same result I have right now. The problem is keeping/generating the timestamp for the beginning of the hour.
-
Franco, about 6 years ago: How didn't I think of this? This works perfectly, such a simple and elegant solution. Thanks!
-
smerllo, over 4 years ago: Very neat answer