Pandas python .describe() formatting/output

18,982

Solution 1

One way to do this would be to first do .reset_index() , to reset the index for your temp DataFrame, and then use DataFrame.pivot as you want . Example -

In [24]: df = pd.read_csv(io.StringIO("""name,prop
   ....: A,1
   ....: A,2
   ....: B,  4
   ....: A,  3
   ....: B,  5
   ....: B,  2"""))

In [25]: temp = df.groupby('name')['prop'].describe().reset_index()

In [26]: newdf = temp.pivot(index='name',columns='level_1',values=0)

In [27]: newdf.columns.name = ''   #This is needed so that the name of the columns is not `'level_1'` .

In [28]: newdf
Out[28]:
      25%  50%  75%  count  max      mean  min       std
name
A     1.5    2  2.5      3    3  2.000000    1  1.000000
B     3.0    4  4.5      3    5  3.666667    2  1.527525

Then you can save this newdf to csv.

Solution 2

You can achieve that by running below code :

from pandas import *
data = read_csv('testProp.csv')
data.describe().T

Solution 3

In pandas v0.22, you can use the unstack feature. Building on from @Kumar answer above, you can use the pandas stack/unstack feature and play around with it's variation.

from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO("""name,prop
   A,1
   A,2
   B,  4
   A,  3
   B,  5
   B,  2"""))

df.shape
df
temp = df.groupby(['name'])['prop'].describe()
temp
temp.stack() #unstack(),unstack(level=-1) level can be -1, 0

Check out the documentation pandas unstack for more details

Share:
18,982
Mike
Author by

Mike

Updated on June 14, 2022

Comments

  • Mike
    Mike almost 2 years

    I am trying to get the .describe() function to output in a reformatted way. Here is the csv data (testProp.csv)

    'name','prop'
    A,1
    A,2
    B,  4
    A,  3
    B,  5
    B,  2
    

    when I type in the following:

    from pandas import *
    
    data = read_csv('testProp.csv')
    
    temp = data.groupby('name')['prop'].describe()
    temp.to_csv('out.csv')
    

    the output is:

    name       
    A     count    3.000000
          mean     2.000000
          std      1.000000
          min      1.000000
          25%      1.500000
          50%      2.000000
          75%      2.500000
          max      3.000000
    B     count    3.000000
          mean     3.666667
          std      1.527525
          min      2.000000
          25%      3.000000
          50%      4.000000
          75%      4.500000
          max      5.000000
    dtype: float64
    

    However, I want the data in the format below. I have tried transpose() and would like to maintain using the describe() and manipulate that instead of a .agg([np.mean(), np.max(), etc.... ):

        count   mean    std min 25% 50% 75% max
    A   3   2   1   1   1.5 2   2.5 3
    B    3  3.666666667 1.527525232 2   3   4   4.5 5