Python Pandas: Groupby date, and accessing each group by timestamp
17,193
Here's your frame
In [40]: df = pd.DataFrame({'DATE' : ['10-Oct-2013', '10-Oct-2013', '10-Oct-2013', '11-Oct-2013', '11-Oct-2013', '11-Oct-2013'],'VAL' : [1,2,3,4,5,6]})
Much faster to directly convert a date-like column
In [41]: df['DATE']= pd.to_datetime(df['DATE'])
In [42]: df.dtypes
Out[42]:
DATE datetime64[ns]
VAL int64
dtype: object
In [43]: df
Out[43]:
DATE VAL
0 2013-10-10 00:00:00 1
1 2013-10-10 00:00:00 2
2 2013-10-10 00:00:00 3
3 2013-10-11 00:00:00 4
4 2013-10-11 00:00:00 5
5 2013-10-11 00:00:00 6
This accomplishes what it loooks like you want
In [44]: df.groupby('DATE').describe()
Out[44]:
VAL
DATE
2013-10-10 count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
2013-10-11 count 3.0
mean 5.0
std 1.0
min 4.0
25% 4.5
50% 5.0
75% 5.5
max 6.0
If you REALLY want to get by a group individually
In [45]: g = df.groupby('DATE')
In [46]: key = g.groups.keys()[0]
In [47]: key
Out[47]: numpy.datetime64('2013-10-09T20:00:00.000000000-0400')
In [48]: g.get_group(key.astype('i8'))
Out[48]:
DATE VAL
0 2013-10-10 00:00:00 1
1 2013-10-10 00:00:00 2
2 2013-10-10 00:00:00 3
datetime64[ns] are stored internally as long integers, so that's how they need to be accessed You normally really have no reason to do this as you can just
df.groupby('DATE').apply(lambda x: .....)
or if you really want to iterate
for g, grp in df.groupby('DATE'):
......
Related videos on Youtube
Author by
notilas
Updated on June 17, 2022Comments
-
notilas almost 2 years
I want to groupby timestamp (date) and access each group by timestamp, which looks not working properly. It looks like the group keys are strangely indexed with different formats.
df= pd.DataFrame({'DATE' : ['10-Oct-2013', '10-Oct-2013', '10-Oct-2013', '11-Oct-2013', '11-Oct-2013', '11-Oct-2013'],'VAL' : [1,2,3,4,5,6]}) >>> df DATE VAL 0 10-Oct-2013 1 1 10-Oct-2013 2 2 10-Oct-2013 3 3 11-Oct-2013 4 4 11-Oct-2013 5 5 11-Oct-2013 6 dfg=df.groupby(df['DATE'].apply(lambda x: pd.to_datetime(x))) >>> dfg.groups.keys() [numpy.datetime64('NaT'), numpy.datetime64('2013-10-10T17:00:00.000000000-0700'), numpy.datetime64('2013-10-09T17:00:00.000000000-0700')] for d in dfg.groups.keys(): try: print d,dfg.get_group(d).describe() except: print 'err' >> NaT err 2013-10-10T17:00:00.000000000-0700 err 2013-10-09T17:00:00.000000000-0700 err rng = pd.to_datetime(pd.date_range('10/10/2013', periods=3, freq='D')) for d in rng: try: print d,dfg.get_group(d).describe() except: print 'err' 2013-10-10 00:00:00 err 2013-10-11 00:00:00 err 2013-10-12 00:00:00 err
-
notilas over 10 yearsThanks for the quick answer. But I got the folloiwng error message: <br/> Traceback (most recent call last):<br/> File "<stdin>", line 1, in <module><br/> File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 278, in get_group<br/> inds = self.indices[name]<br/> KeyError: 1381449600000000000<br/>
-
notilas over 10 yearsI have tried it on 2 different machines, win32 and win64, but both does not work. Python 2.7 and Pandas 0.12. Is this a windows problem?
-
Jeff over 10 yearsits partially a windows problem, try this:
g.get_group(long(key.astype('i8')))
. The astypeing returns anp.int64
, which for some reason doesn't compare properly on windows to along
. -
Jeff over 10 yearshere's an issue to handle this in this better. github.com/pydata/pandas/issues/5267