Get the average date from multiple dates - pandas

10,434

Solution 1

For your example.

Your data:

df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                   ['A', '2018-08-05 17:06:02'],
                   ['A', '2018-08-05 17:06:03'],
                   ['B', '2018-08-05 17:06:07'],
                   ['B', '2018-08-05 17:06:09'],
                   ['B', '2018-08-05 17:06:11']],
            columns = ['column', 'date'])

Solution:

df.date = pd.to_datetime(df.date).values.astype(np.int64)

df = pd.DataFrame(pd.to_datetime(df.groupby('column').mean().date))

Output:

                      date
column                    
A      2018-08-05 17:06:02
B      2018-08-05 17:06:09

I hope it will be helpful.

Solution 2

Preparing a sample dataframe:

# Initiate dataframe
date_var = "date"
df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                        ['A', '2018-08-05 17:06:02'],
                        ['A', '2018-08-05 17:06:03'],
                        ['B', '2018-08-05 17:06:07'],
                        ['B', '2018-08-05 17:06:09'],
                        ['B', '2018-08-05 17:06:11']],
                  columns=['column', date_var])

# Convert date-column to proper pandas Datetime-values/pd.Timestamps
df[date_var] = pd.to_datetime(df[date_var])

Extraction of the desired average Timestamp-value:

# Extract the numeric value associated to each timestamp (epoch time)
# NOTE: this is being accomplished via accessing the .value - attribute of each Timestamp in the column
In:
[tsp.value for tsp in df[date_var]]
Out:
[
    1533488761000000000, 1533488762000000000, 1533488763000000000,
    1533488767000000000, 1533488769000000000, 1533488771000000000
]

# Use this to calculate the mean, then convert the result back to a timestamp
In:
pd.Timestamp(np.nanmean([tsp.value for tsp in df[date_var]]))
Out:
Timestamp('2018-08-05 17:06:05.500000')
Share:
10,434
humanlearning
Author by

humanlearning

Updated on August 27, 2022

Comments

  • humanlearning
    humanlearning over 1 year

    DataFrame where Date is datetime:

       Column   |       Date             
    :-----------|----------------------:
        A       |   2018-08-05 17:06:01 
        A       |   2018-08-05 17:06:02 
        A       |   2018-08-05 17:06:03 
        B       |   2018-08-05 17:06:07 
        B       |   2018-08-05 17:06:09 
        B       |   2018-08-05 17:06:11 
    

    Return Table is;

       Column   |       Date            
    :-----------|----------------------:
        A       |   2018-08-05 17:06:02 
        B       |   2018-08-05 17:06:09 
    
    • humanlearning
      humanlearning over 5 years
      I did the same thing with idxmin & idexmax for max and mins. But can't figure out a way to do average.
  • Laurent
    Laurent over 3 years
    I was looking for a way to include a DateTime column in a groupby aggregation. Your solution's the one that fixed fact that pandas originally excludes them.