Python - Aggregate by month and calculate average

python date csv pandas aggregate

Solution 1

Probably the simplest approach is to use the resample method. First, when you read in your data, make sure you parse the dates and set the date column as your index (ignore the StringIO part and header=0; I am reading your sample data from a multi-line string):

>>> df = pd.read_csv(StringIO(data), header=0, parse_dates=['Date'],
                     index_col='Date')
>>> df

            Sentiment
Date
2014-01-03       0.40
2014-01-04      -0.03
2014-01-09       0.00
2014-01-10       0.07
2014-01-12       0.00
2014-02-24       0.00 
2014-02-25       0.00
2014-02-25       0.00
2014-02-26       0.00
2014-02-28       0.00
2014-03-01       0.10
2014-03-02      -0.50
2014-03-03       0.00
2014-03-08      -0.06
2014-03-11      -0.13
2014-03-22       0.00
2014-03-23       0.33
2014-03-23       0.30
2014-03-25      -0.14
2014-03-28      -0.25


>>> df.resample('M').mean()

            Sentiment
2014-01-31      0.088
2014-02-28      0.000
2014-03-31     -0.035

And if you want a month counter, you can add it after your resample:

>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg))
>>> agg

            Sentiment  cnt
2014-01-31      0.088    0
2014-02-28      0.000    1
2014-03-31     -0.035    2
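
The question asks for a counter that starts at 1 in the earliest month and keeps running across years. A minimal sketch of that, assuming the same Date/Sentiment columns. The 2015 rows are hypothetical additions to the sample, the column name month_count is made up for illustration, and the 'MS' (month start) alias is used instead of 'M' only because both older and newer pandas releases accept it:

```python
import io

import pandas as pd

# A few rows from the question's sample, plus two hypothetical 2015 rows
# (not in the original data) to show the counter running past December.
csv_text = """Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-02-24,0.0
2014-03-01,0.1
2014-03-02,-0.5
2015-01-05,0.2
2015-02-07,-0.4
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# One row per calendar month, labelled with the first day of that month.
# Months with no rows are kept and get NaN as their mean.
agg = df.resample("MS").mean()

# Every month between the first and last date is present, so a simple
# 1-based range works as a continuous month counter.
agg["month_count"] = range(1, len(agg) + 1)

print(agg)
```

If the empty months should not appear in the result, calling agg.dropna() after adding the counter keeps the numbering tied to calendar months.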

You can also do this with the groupby method and pd.Grouper (group by month and then call the mean convenience method that is available with groupby). Older pandas versions called this helper pd.TimeGrouper; it has since been removed in favor of pd.Grouper.

>>> df.groupby(pd.Grouper(freq='M')).mean()

            Sentiment
2014-01-31      0.088
2014-02-28      0.000
2014-03-31     -0.035
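
As a quick check that the resample and groupby spellings agree, here is a small sketch on made-up values. It uses pd.Grouper, the grouping helper available in current pandas, and 'MS' as the frequency purely so the snippet runs unchanged on older and newer versions:

```python
import pandas as pd

# Made-up daily values for illustration only.
df = pd.DataFrame(
    {"Sentiment": [0.4, -0.2, 0.0, 0.3, 0.1]},
    index=pd.to_datetime(
        ["2014-01-03", "2014-01-10", "2014-02-24", "2014-03-01", "2014-03-23"]
    ),
)

by_resample = df.resample("MS").mean()
by_grouper = df.groupby(pd.Grouper(freq="MS")).mean()

# Compare the monthly means from both approaches (rounded to absorb
# floating-point noise).
same = (
    by_resample["Sentiment"].round(10).tolist()
    == by_grouper["Sentiment"].round(10).tolist()
)

print(by_resample)
print(same)
```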

Solution 2

To get the monthly average values of a DataFrame that has daily rows in a 'Sentiment' column, I would:

1. Convert the column with the dates, df['date'], into the index of the DataFrame df: df.set_index('date', inplace=True)
2. Convert the index dates into a month index: df.index.month
3. Calculate the mean of the DataFrame grouped by month: df.groupby(df.index.month).Sentiment.mean()
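
The three steps above, put together as a sketch on a tiny hand-written frame (the values are invented for illustration):

```python
import pandas as pd

# Invented sample rows; 'date' starts out as an ordinary column.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-01-07", "2018-01-14", "2018-02-04", "2018-03-04"]
        ),
        "Sentiment": [10, 30, 50, 70],
    }
)

# Step 1: the dates become the index.
df = df.set_index("date")

# Step 2: month number (1-12) for every row.
months = df.index.month

# Step 3: mean of 'Sentiment' within each month number.
monthly_avg = df.groupby(months)["Sentiment"].mean()

print(monthly_avg)
```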

I go through each step slowly here:

Generating a DataFrame with dates and values

You first need to import pandas and NumPy, as well as the datetime module:
```
import numpy as np
import pandas as pd
from datetime import datetime
```

Generate a 'date' column between 1/1/2018 and 3/05/2018 at weekly ('W') intervals, and a 'Sentiment' column with random integers from 0 to 99:

date_rng = pd.date_range(start='1/1/2018', end='3/05/2018', freq='W')
df = pd.DataFrame(date_rng, columns=['date'])
df['Sentiment']=np.random.randint(0,100,size=(len(date_rng)))

The df has two columns, 'date' and 'Sentiment':

        date  Sentiment
0 2018-01-07         34
1 2018-01-14         32
2 2018-01-21         15
3 2018-01-28          0
4 2018-02-04         95
5 2018-02-11         53
6 2018-02-18          7
7 2018-02-25         35
8 2018-03-04         17

Set the `'date'` column as the index of the DataFrame:

df.set_index('date',inplace=True)

df has one column 'Sentiment' and the index is 'date':

            Sentiment
date                 
2018-01-07         34
2018-01-14         32
2018-01-21         15
2018-01-28          0
2018-02-04         95
2018-02-11         53
2018-02-18          7
2018-02-25         35
2018-03-04         17

Capture the month number from the index:

    months=df.index.month

Obtain the mean value of each month grouping by month:

    monthly_avg=df.groupby(months).Sentiment.mean()

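The mean of the dataset by month is stored in `monthly_avg`. For the sample values shown above it works out to 20.25 for January, 47.5 for February and 17.0 for March.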
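
One caveat worth noting: df.index.month only carries the month number, so January 2018 and January 2019 would land in the same group. Since the question mentions data spanning several years, a possible variation is to group by year and month together, for example via to_period('M'). A sketch with invented values:

```python
import pandas as pd

# Invented values covering two different years.
df = pd.DataFrame(
    {"Sentiment": [10, 30, 50, 90]},
    index=pd.to_datetime(["2018-01-07", "2018-01-21", "2018-02-04", "2019-01-06"]),
)
df.index.name = "date"

# Month number only: both Januaries are averaged together.
by_month_number = df.groupby(df.index.month)["Sentiment"].mean()

# Year-month periods: each calendar month of each year is its own group.
by_year_month = df.groupby(df.index.to_period("M"))["Sentiment"].mean()

print(by_month_number)
print(by_year_month)
```

Here the month-number grouping yields two groups, while the year-month grouping yields three, keeping January 2019 separate from January 2018.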

Author by

Jaroslav Klimčík

Updated on November 21, 2020

Comments

Jaroslav Klimčík over 3 years

I have a csv which looks like this:

Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-02-25,0.0
2014-02-26,0.0
2014-02-28,0.0
2014-03-01,0.1
2014-03-02,-0.5
2014-03-03,0.0
2014-03-08,-0.06
2014-03-11,-0.13
2014-03-22,0.0
2014-03-23,0.33
2014-03-23,0.3
2014-03-25,-0.14
2014-03-28,-0.25
etc

My goal is to aggregate the data by month and calculate the average for each month. The dates might not start on the 1st or in January. The problem is that I have a lot of data spanning several years, so I would like to find the earliest date (month) and start counting months and their averages from there. For example:

Month count, average
1, 0.4 (<= the earliest month)
2, -0.3
3, 0.0
...
12, 0.1
13, -0.4 (<= new year but counting of month is continuing)
14, 0.3

I'm using pandas to open the CSV:

data = pd.read_csv("pks.csv", sep=",")

So in data['Date'] I have the dates and in data['Sentiment'] I have the values. Any idea how to do it?
