Counting frequency of values by date using pandas

40,784

Solution 1

It might be easiest to turn your Series into a DataFrame and use Pandas' groupby functionality (if you already have a DataFrame then skip straight to adding another column below).

If your Series is called s, then turn it into a DataFrame like so:

>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
       Category           Timestamp
0      Facebook 2014-10-16 15:05:17
1         Vimeo 2014-10-16 14:56:37
2      Facebook 2014-10-16 14:25:16
...

Now add another column for the week and year (one way is to use apply and generate a string of the week/year numbers):

>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year))
>>> df
             Timestamp     Category Week/Year
0  2014-10-16 15:05:17     Facebook   42/2014
1  2014-10-16 14:56:37        Vimeo   42/2014
2  2014-10-16 14:25:16     Facebook   42/2014
...

Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:

>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year  Category   
41/2014    DailyMotion    1
           Facebook       3
           Vimeo          2
           Youtube        3
42/2014    Facebook       7
           Orkut          1
           Vimeo          1

Solution 2

Convert your TimeStamp column to week number then groupby that week number and value_count the categorical variable like so:

df.groupby('week_num').Category.value_counts()

Where I have assumed that a new column week_num was created from the TimeStamp column.

Solution 3

To be a little bit more clear, you do not need to create a new column called 'week_num' first.

df.groupby(by=lambda x: "%d/%d" % (x.week(), x.year())).Category.value_counts()

The function by will automatically call on each timestamp object of the index to convert them to week and year, and then group by the week and year.

Share:
40,784
jcborges
Author by

jcborges

Karlsruhe Institute of Technology (KIT) Pervasive Computing Systems - TECO Julio De Melo Borges, MSc. Research Scientist and PhD Student Data Scientist @ SDIL - Smart Data Innovation Lab post: KIT, TECO, Vincenz-Prießnitz-Str. 1, 76131 Karlsruhe, Germany (Room 214) fon: +49 721 608-41708 , fax: +49 721 608-41702 email: [email protected], web: www.teco.kit.edu/people/borges

Updated on September 17, 2020

Comments

  • jcborges
    jcborges over 3 years

    Let's suppose I have following Time Series:

    Timestamp              Category
    2014-10-16 15:05:17    Facebook
    2014-10-16 14:56:37    Vimeo
    2014-10-16 14:25:16    Facebook
    2014-10-16 14:15:32    Facebook
    2014-10-16 13:41:01    Facebook
    2014-10-16 12:50:30    Orkut
    2014-10-16 12:28:54    Facebook
    2014-10-16 12:26:56    Facebook
    2014-10-16 12:25:12    Facebook
    ...
    2014-10-08 15:52:49    Youtube
    2014-10-08 15:04:50    Youtube
    2014-10-08 15:03:48    Vimeo
    2014-10-08 15:02:27    Youtube
    2014-10-08 15:01:56    DailyMotion
    2014-10-08 13:27:28    Facebook
    2014-10-08 13:01:08    Vimeo
    2014-10-08 12:52:06    Facebook
    2014-10-08 12:43:27    Facebook
    Name: summary, Length: 600
    

    I would like to make a count of each category (Unique Value/Factor in the Time Series) per week and year.

    Example:
    
        Week/Year      Category      Count
        1/2014         Facebook      12
        1/2014         Google        5
        1/2014         Youtube       2
    ...    
        2/2014         Facebook      2
        2/2014         Google        5
        2/2014         Youtube       20
    ...
    

    How can this be achieved using Python pandas?