Pandas python .describe() formatting/output
Solution 1
One way to do this would be to first do .reset_index()
, to reset the index for your temp
DataFrame, and then use DataFrame.pivot
as you want . Example -
In [24]: df = pd.read_csv(io.StringIO("""name,prop
....: A,1
....: A,2
....: B, 4
....: A, 3
....: B, 5
....: B, 2"""))
In [25]: temp = df.groupby('name')['prop'].describe().reset_index()
In [26]: newdf = temp.pivot(index='name',columns='level_1',values=0)
In [27]: newdf.columns.name = '' #This is needed so that the name of the columns is not `'level_1'` .
In [28]: newdf
Out[28]:
25% 50% 75% count max mean min std
name
A 1.5 2 2.5 3 3 2.000000 1 1.000000
B 3.0 4 4.5 3 5 3.666667 2 1.527525
Then you can save this newdf
to csv.
Solution 2
You can achieve that by running below code :
from pandas import *
data = read_csv('testProp.csv')
data.describe().T
Solution 3
In pandas v0.22, you can use the unstack feature. Building on from @Kumar answer above, you can use the pandas stack/unstack feature and play around with it's variation.
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO("""name,prop
A,1
A,2
B, 4
A, 3
B, 5
B, 2"""))
df.shape
df
temp = df.groupby(['name'])['prop'].describe()
temp
temp.stack() #unstack(),unstack(level=-1) level can be -1, 0
Check out the documentation pandas unstack for more details
Mike
Updated on June 14, 2022Comments
-
Mike almost 2 years
I am trying to get the
.describe()
function to output in a reformatted way. Here is the csv data (testProp.csv
)'name','prop' A,1 A,2 B, 4 A, 3 B, 5 B, 2
when I type in the following:
from pandas import * data = read_csv('testProp.csv') temp = data.groupby('name')['prop'].describe() temp.to_csv('out.csv')
the output is:
name A count 3.000000 mean 2.000000 std 1.000000 min 1.000000 25% 1.500000 50% 2.000000 75% 2.500000 max 3.000000 B count 3.000000 mean 3.666667 std 1.527525 min 2.000000 25% 3.000000 50% 4.000000 75% 4.500000 max 5.000000 dtype: float64
However, I want the data in the format below. I have tried
transpose()
and would like to maintain using thedescribe()
and manipulate that instead ofa .agg([np.mean(), np.max(), etc.... )
:count mean std min 25% 50% 75% max A 3 2 1 1 1.5 2 2.5 3 B 3 3.666666667 1.527525232 2 3 4 4.5 5