Pandas df.describe() - how do I extract values into Dataframe?

Solution 1

Please try something like this:

df.describe(include='all').loc['mean']
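As a minimal sketch of why .loc works here: on a DataFrame, describe() returns the statistics as rows and the original columns as columns, so .loc selects by statistic name. The frame below and its column names a and b are made up for illustration.

```python
import pandas as pd

# Hypothetical data; the column names "a" and "b" are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

summary = df.describe()

# One statistic: a Series indexed by the original column names.
means = summary.loc["mean"]

# Several statistics: a DataFrame with one row per statistic.
stats = summary.loc[["mean", "std"]]

# Transposed, so that each original column becomes a row.
per_column = stats.T

print(means)
print(per_column)
```

The transposed form gives one row per feature with mean and std as columns, which may be the more convenient shape for the naive Bayes use case in the question.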

Solution 2

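You were close. You don't need the include argument. On a Series, describe() returns a Series indexed by statistic name, so you can index it by label: s.describe()['mean']. On a DataFrame, the statistics are rows rather than columns, so select the row instead: df.describe().loc['mean']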

For example:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
s.describe()['mean']
# 3.0

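If you want both mean and std, pass a list of labels: s.describe()[['mean', 'std']] for a Series, or df.describe().loc[['mean', 'std']] for a DataFrame. For example, with the Series above: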

s.describe()[['mean', 'std']]
# mean    3.000000
# std     1.581139
# dtype: float64
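The Series example does not carry over unchanged to a 2D frame, which appears to be the cause of the KeyError reported in the comments. The sketch below uses a made-up frame with columns x and y: plain square brackets on a DataFrame select columns, so the statistic names are not found, while .loc selects rows.

```python
import pandas as pd

# Hypothetical two-column frame; "x" and "y" are illustrative names.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
desc = df.describe()

# [] on a DataFrame looks up columns, and "mean"/"std" are row labels here.
try:
    desc[["mean", "std"]]
    raised = False
except KeyError:
    raised = True

# .loc looks up rows, so this returns one mean and one std per column.
by_row = desc.loc[["mean", "std"]]

print(raised)
print(by_row)
```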

Solution 3

If you want a statistic for one specific column, try:

df.describe()['FeatureName']['mean']

Replace FeatureName with your column name and mean with any other statistic you want to extract.
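For comparison, here is a small sketch of three ways to reach the same single value. The frame is invented for illustration, with FeatureName standing in for a real column. The chained lookup from above works, a single .loc call with a row and column label avoids the chained indexing, and calling mean() on the column skips describe() entirely.

```python
import pandas as pd

# Hypothetical data; "FeatureName" and "Other" are placeholder column names.
df = pd.DataFrame({"FeatureName": [2.0, 4.0, 6.0], "Other": [1.0, 1.0, 1.0]})
desc = df.describe()

# Chained lookup: first the column, then the statistic.
chained = desc["FeatureName"]["mean"]

# Single lookup: row label (statistic) first, then column label.
single = desc.loc["mean", "FeatureName"]

# Computing the statistic directly on the column.
direct = df["FeatureName"].mean()

print(chained, single, direct)
```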

Author by

Vaslo

A finance guy trying to learn coding

Updated on June 24, 2022

Comments

  • Vaslo
    Vaslo almost 2 years

    I am trying to do a naive Bayes and after loading some data into a dataframe in Pandas, the describe function captures the data I want. I'd like to capture the mean and std from each column of the table but am unsure on how to do that. I've tried things like:

    df.describe([mean])
    df.describe(['mean'])
    df.describe().mean
    

    None are working. I was able to do something similar in R with summary but don't know how to do in Python. Can someone lend some advice?

  • Vaslo
    Vaslo over 5 years
    Works like a charm. Looks like I can capture it as a variable as well. What if you want two items like mean and std?
  • Vaslo
    Vaslo over 5 years
    I'm getting an error that says: KeyError: "['mean' 'std'] not in index". Any idea why that would occur?
  • Sheldore
    Sheldore over 5 years
    @Vaslo: You missed a comma between 'mean' and 'std'. Try again with a comma
  • Sheldore
    Sheldore over 5 years
    If the problem still persists, please include a sample of your dataframe in your question
  • Vaslo
    Vaslo over 5 years
    I think the issue is that I am trying to use on a 2D frame. When I use your example it works fine but when I try to do it exactly cut and paste as you explain it is giving me an error.
  • Sheldore
    Sheldore over 5 years
    @Vaslo: Can you try df_1 = pd.Series(df.values.ravel()) and then try df_1.describe()[['mean', 'std']]?
  • Vaslo
    Vaslo over 5 years
    When I use the example you posted it works and gives me a single mean and single std. Mine should have a column of means and stds (one for each column)
  • milos.ai
    milos.ai over 5 years
    df.describe(include='all').loc[['mean','std']]
  • Sheldore
    Sheldore over 5 years
    Try what I wrote in my comment before uploading the dataframe and see if it works
  • Vaslo
    Vaslo over 5 years
    I tried the df_1 = pd.Series(df.values.ravel()) and it works, but returns just a single std and mean, so maybe that is my issue? I have a 767x9 dataframe that I am trying to extract 9 means from.
  • Sheldore
    Sheldore over 5 years
    Ok, so it's more complicated than I thought. You can then accept the other solution above if it works for you
  • Vaslo
    Vaslo over 5 years
    Many thanks - the solution below by @milos.ai gives me a subset of the describe dataframe. For future reference, how do I upload a frame, or do I just type it into the space?
  • Sheldore
    Sheldore over 5 years
    You can copy and paste the frame after printing it with df.head(), for example. People can then simply copy your dataframe and use pd.read_clipboard() to create a dataframe out of it. To get a better idea, just click on the pandas or dataframe tag below your question and see how other questions have done it.
  • Sundeep
    Sundeep over 4 years
    I had to use Python 2.7 for some other libraries at work, and I needed include='all' to get it working.