pandas daily average, pandas.resample

12,593

You need convert string columns to float first:

#add parameter parse_dates for convert to datetime first column
df=pd.read_csv('data.csv', index_col=0, parse_dates=[0])

df['Temp1'] = df.Temp1.astype(float)
df['Temp2'] = df.Temp2.astype(float)

df_avg = df.resample('D').mean()

If astype return error, problem is there are some non numeric values. So you need use to_numeric with errors='coerce' - then all 'problematic' values are converted to NaN:

df['Temp1'] = pd.to_numeric(df.Temp1, errors='coerce')
df['Temp2'] = pd.to_numeric(df.Temp2, errors='coerce')

You can also check all rows with problematic values with boolean indexing:

print df[pd.to_numeric(df.Temp1, errors='coerce').isnull()]
print df[pd.to_numeric(df.Temp2, errors='coerce').isnull()]
Share:
12,593
OGMGIC
Author by

OGMGIC

Updated on June 05, 2022

Comments

  • OGMGIC
    OGMGIC almost 2 years

    I have a csv file similar to this

    Date,Temp1,Temp2
    
    23-Oct-09 01:00:00,21.1,22.3
    
    23-Oct-09 04:00:00,22.3,23.8
    
    23-Oct-09 07:00:00,21.4,21.3
    
    23-Oct-09 10:00:00,21.5,21.6
    
    23-Oct-09 13:00:00,22.3,23.8
    
    23-Oct-09 16:00:00,21.4,21.3
    
    23-Oct-09 19:00:00,21.1,22.3
    
    23-Oct-09 22:00:00,21.4,21.3
    
    24-Oct-09 01:00:00,22.3,23.8
    
    24-Oct-09 04:00:00,22.3,23.8
    
    24-Oct-09 07:00:00,21.1,22.3
    
    24-Oct-09 10:00:00,22.3,23.8
    
    24-Oct-09 13:00:00,21.1,22.3
    
    24-Oct-09 16:00:00,22.3,23.8
    
    24-Oct-09 19:00:00,21.1,22.3
    
    24-Oct-09 22:00:00,22.3,23.8
    

    I have read the data with:

    df=pd.read_csv('data.csv', index_col=0)
    

    and converted the index to date time

    df.index=pd.to_datetime(df.index)
    

    Now I want to take the mean of each daily temperature, I have been trying to use pd.resample as below, but have been receiving errors. I've read the pandas.resample docs and numerous examples on here and am still at a loss...

    df_avg = df.resample('D', how = 'mean')
    

    DataError: No numeric types to aggregate

    I would like df_avg to be a dataframe with a datetime index and the two 2 columns. I am using pandas 0.17.1 and python 3.5.2, any help greatly appreciated!