Python Pandas: Groupby date, and accessing each group by timestamp

17,193

Here's your frame

In [40]: df = pd.DataFrame({'DATE' : ['10-Oct-2013', '10-Oct-2013', '10-Oct-2013', '11-Oct-2013', '11-Oct-2013', '11-Oct-2013'],'VAL' : [1,2,3,4,5,6]})

Much faster to directly convert a date-like column

In [41]: df['DATE']= pd.to_datetime(df['DATE'])

In [42]: df.dtypes
Out[42]: 
DATE    datetime64[ns]
VAL              int64
dtype: object

In [43]: df
Out[43]: 
                 DATE  VAL
0 2013-10-10 00:00:00    1
1 2013-10-10 00:00:00    2
2 2013-10-10 00:00:00    3
3 2013-10-11 00:00:00    4
4 2013-10-11 00:00:00    5
5 2013-10-11 00:00:00    6

This accomplishes what it loooks like you want

In [44]: df.groupby('DATE').describe()
Out[44]: 
                  VAL
DATE                 
2013-10-10 count  3.0
           mean   2.0
           std    1.0
           min    1.0
           25%    1.5
           50%    2.0
           75%    2.5
           max    3.0
2013-10-11 count  3.0
           mean   5.0
           std    1.0
           min    4.0
           25%    4.5
           50%    5.0
           75%    5.5
           max    6.0

If you REALLY want to get by a group individually

In [45]: g = df.groupby('DATE')

In [46]: key = g.groups.keys()[0]

In [47]: key
Out[47]: numpy.datetime64('2013-10-09T20:00:00.000000000-0400')

In [48]: g.get_group(key.astype('i8'))
Out[48]: 
                 DATE  VAL
0 2013-10-10 00:00:00    1
1 2013-10-10 00:00:00    2
2 2013-10-10 00:00:00    3

datetime64[ns] are stored internally as long integers, so that's how they need to be accessed You normally really have no reason to do this as you can just

df.groupby('DATE').apply(lambda x: .....)

or if you really want to iterate

for g, grp in df.groupby('DATE'):
        ......
Share:
17,193

Related videos on Youtube

notilas
Author by

notilas

Updated on June 17, 2022

Comments

  • notilas
    notilas almost 2 years

    I want to groupby timestamp (date) and access each group by timestamp, which looks not working properly. It looks like the group keys are strangely indexed with different formats.

    df= pd.DataFrame({'DATE' : ['10-Oct-2013', '10-Oct-2013', '10-Oct-2013', '11-Oct-2013', '11-Oct-2013', '11-Oct-2013'],'VAL' : [1,2,3,4,5,6]})
    
    >>> df
              DATE  VAL
    0  10-Oct-2013    1
    1  10-Oct-2013    2
    2  10-Oct-2013    3
    3  11-Oct-2013    4
    4  11-Oct-2013    5
    5  11-Oct-2013    6
    
    
    dfg=df.groupby(df['DATE'].apply(lambda x: pd.to_datetime(x)))
    
    >>> dfg.groups.keys()
    [numpy.datetime64('NaT'), numpy.datetime64('2013-10-10T17:00:00.000000000-0700'), numpy.datetime64('2013-10-09T17:00:00.000000000-0700')]
    
    for d in dfg.groups.keys():
        try:
            print d,dfg.get_group(d).describe()
        except:
            print 'err'
    >>
    NaT err
    2013-10-10T17:00:00.000000000-0700 err
    2013-10-09T17:00:00.000000000-0700 err
    
    rng = pd.to_datetime(pd.date_range('10/10/2013', periods=3, freq='D'))
    
    for d in rng:
        try:
            print d,dfg.get_group(d).describe()
        except:
            print 'err'
    
    2013-10-10 00:00:00 err
    2013-10-11 00:00:00 err
    2013-10-12 00:00:00 err
    
  • notilas
    notilas over 10 years
    Thanks for the quick answer. But I got the folloiwng error message: <br/> Traceback (most recent call last):<br/> File "<stdin>", line 1, in <module><br/> File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 278, in get_group<br/> inds = self.indices[name]<br/> KeyError: 1381449600000000000<br/>
  • notilas
    notilas over 10 years
    I have tried it on 2 different machines, win32 and win64, but both does not work. Python 2.7 and Pandas 0.12. Is this a windows problem?
  • Jeff
    Jeff over 10 years
    its partially a windows problem, try this: g.get_group(long(key.astype('i8'))). The astypeing returns a np.int64, which for some reason doesn't compare properly on windows to a long.
  • Jeff
    Jeff over 10 years
    here's an issue to handle this in this better. github.com/pydata/pandas/issues/5267