Pandas df.describe() - how do I extract values into Dataframe?
Solution 1
Please try something like this:
df.describe(include='all').loc['mean']
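To make the effect of include='all' concrete, here is a minimal sketch. The frame and its column names (a, label) are made up for illustration and are not the asker's data. It shows that the statistics are row labels of the describe() result, and that include='all' also brings non-numeric columns into the summary:

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "label": list("xyxyx")})

# describe() returns a DataFrame whose row labels are the statistics,
# so .loc["mean"] pulls out one value per described column.
means_all = df.describe(include="all").loc["mean"]  # also lists 'label', which has no mean
means_num = df.describe().loc["mean"]               # numeric columns only

print(means_all)
print(means_num)
```

If every column is numeric, both calls should describe the same columns, so the choice mostly matters for mixed-type frames.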
Solution 2
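You were close. You don't need the include argument. Just rewrite your second approach correctly: df.describe()['mean'] (this form works on a Series; on a DataFrame the statistics are row labels, so use df.describe().loc['mean'] instead).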
For example:
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
s.describe()['mean']
# 3.0
If you want both mean and std, just write df.describe()[['mean', 'std']]. For example:
s.describe()[['mean', 'std']]
# mean 3.000000
# std 1.581139
# dtype: float64
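On a DataFrame rather than a Series, the same bracket indexing looks up columns, not statistics, which would explain a KeyError like "['mean' 'std'] not in index". A minimal sketch, assuming a made-up two-column frame (the names a and b are placeholders):

```python
import pandas as pd

# Hypothetical 2D data; the real frame in the question has 9 columns.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})

# Plain [] on a DataFrame selects columns, and 'mean'/'std' are row labels
# of the describe() result, so this lookup fails.
try:
    df.describe()[["mean", "std"]]
    raised = False
except KeyError:
    raised = True

# Selecting rows with .loc gives one mean and one std per column.
stats = df.describe().loc[["mean", "std"]]
print(raised)
print(stats)
```

Transposing the result with stats.T gives one row per column of the original frame, which may be handier when feeding per-feature parameters into something like a naive Bayes model.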
Solution 3
If you further want to extract specific column data then try:
df.describe()['FeatureName']['mean']
Replace mean with any other statistic you want to extract.
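As a small illustration of that lookup, the sketch below builds a made-up frame in which FeatureName is a placeholder column name, as above, and Other is a hypothetical second column. It also shows an equivalent single .loc lookup:

```python
import pandas as pd

# 'FeatureName' is a placeholder column name; 'Other' is invented for the example.
df = pd.DataFrame({"FeatureName": [1, 2, 3, 4, 5], "Other": [5, 4, 3, 2, 1]})
summary = df.describe()

chained = summary["FeatureName"]["mean"]     # column first, then statistic
direct = summary.loc["mean", "FeatureName"]  # same value in one row/column lookup

print(chained, direct)
```

Both forms read the same cell of the summary table; the .loc version avoids chained indexing.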
Comments
-
Vaslo almost 2 years
I am trying to do a naive Bayes and after loading some data into a dataframe in Pandas, the describe function captures the data I want. I'd like to capture the mean and std from each column of the table but am unsure on how to do that. I've tried things like:
df.describe([mean])
df.describe(['mean'])
df.describe().mean
None of these work. I was able to do something similar in R with summary but don't know how to do it in Python. Can someone lend some advice?
-
Vaslo over 5 years
Works like a charm. Looks like I can capture it as a variable as well. What if you want two items like mean and std?
-
Vaslo over 5 years
I'm getting an error that says: KeyError: "['mean' 'std'] not in index". Any idea why that would occur?
-
Sheldore over 5 years
@Vaslo: You missed a comma between 'mean' and 'std'. Try again with a comma.
-
Sheldore over 5 years
If the problem still persists, please include a sample of your dataframe in your question.
-
Vaslo over 5 years
I think the issue is that I am trying to use it on a 2D frame. Your example works fine, but when I cut and paste it exactly as you explain and run it on my data, it gives me an error.
-
Sheldore over 5 years
@Vaslo: Can you try df_1 = pd.Series(df.values.ravel()) and then try df_1.describe()[['mean', 'std']]?
-
Vaslo over 5 years
When I use the example you posted, it works and gives me a single mean and a single std. Mine should have a column of means and stds (one for each column).
-
milos.ai over 5 years
df.describe(include='all').loc[['mean','std']]
-
Sheldore over 5 years
Try what I wrote in my comment before uploading the dataframe and see if it works.
-
Vaslo over 5 years
I tried the df_1 = pd.Series(df.values.ravel()) and it works, but returns just a single std and mean, so maybe that is my issue? I have a 767x9 dataframe that I am trying to extract 9 means from.
-
Sheldore over 5 years
OK, so it's more complicated than I thought. You can then accept the other solution above if it works for you.
-
Vaslo over 5 years
Many thanks - the solution below by @milos.ai gives me a subset of the describe dataframe. For future reference, how do I upload a frame, or do I just type it into the space?
-
Sheldore over 5 years
You can copy and paste the frame after printing it using df.head(), for example. People can then simply copy your dataframe and use pd.read_clipboard() to create a dataframe out of it. To get a better idea, just click on the pandas or dataframe tag below your question and see how other questions have done it.
-
Sundeep over 4 years
I had to use Python 2.7 to use some other libraries at work, and I had to use include='all' to get it working.