Pandas python .describe() formatting/output

python pandas formatting output describe

18,982

Solution 1

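One way to do this is to first call .reset_index() on your temp result, so that the group name and the statistic label become ordinary columns (named 'name' and 'level_1', with the values in column 0), and then use DataFrame.pivot to spread the statistics across the columns. Example: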

In [22]: import io

In [23]: import pandas as pd

In [24]: df = pd.read_csv(io.StringIO("""name,prop
   ....: A,1
   ....: A,2
   ....: B,  4
   ....: A,  3
   ....: B,  5
   ....: B,  2"""))

In [25]: temp = df.groupby('name')['prop'].describe().reset_index()

In [26]: newdf = temp.pivot(index='name',columns='level_1',values=0)

In [27]: newdf.columns.name = ''   # Clear the columns' name so the output does not show 'level_1'.

In [28]: newdf
Out[28]:
      25%  50%  75%  count  max      mean  min       std
name
A     1.5    2  2.5      3    3  2.000000    1  1.000000
B     3.0    4  4.5      3    5  3.666667    2  1.527525

You can then save newdf to CSV, for example with newdf.to_csv('out.csv').
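Note that the pivoted columns above come out in alphabetical order (25%, 50%, ..., std) rather than the count, mean, std, ... order asked for in the question. The following is a minimal sketch of one way to restore that order before saving. It assumes pandas 0.20 or later, where describe() on a grouped column already returns one row per group; the isinstance check covers the older Series layout. The names wanted_order and csv_out are made up for this illustration.

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO("name,prop\nA,1\nA,2\nB,4\nA,3\nB,5\nB,2"))

summary = df.groupby("name")["prop"].describe()

# Older pandas returned a Series indexed by (name, statistic);
# unstack() turns the statistic labels into columns in that case.
if isinstance(summary, pd.Series):
    summary = summary.unstack()

# Hypothetical helper list: the column order requested in the question.
wanted_order = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
summary = summary[wanted_order]

# With no path, to_csv() returns the CSV text; pass a filename such as
# 'out.csv' to write a file instead.
csv_out = summary.to_csv()
print(csv_out)
```

Selecting the columns by an explicit list keeps the order stable regardless of how the intermediate reshaping sorted them.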

Solution 2

You can achieve that by running the code below:

import pandas as pd
data = pd.read_csv('testProp.csv')
data.describe().T  # transpose so the statistics become columns
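One thing to keep in mind with this approach: describe() on the whole DataFrame pools all rows together, so the transposed result has one row per numeric column rather than one row per name. A small sketch of the difference, using the question's data inline instead of testProp.csv and assuming pandas 0.20 or later for the grouped call (the variable names overall and per_group are made up for this illustration):

```python
import io
import pandas as pd

data = pd.read_csv(io.StringIO("name,prop\nA,1\nA,2\nB,4\nA,3\nB,5\nB,2"))

# Whole-frame summary, transposed: a single row labelled 'prop'
# computed over all six values.
overall = data.describe().T

# Grouped summary: one row per value of 'name'
# (assumes the wide layout returned by pandas 0.20 or later).
per_group = data.groupby("name")["prop"].describe()

print(overall)
print(per_group)
```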

Solution 3

In pandas v0.22, you can use stack/unstack instead. Building on @Kumar's answer above, call describe() on the grouped column and then experiment with the stack() and unstack() variations until you get the layout you want.

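from io import StringIO
import pandas as pd

# Data rows start at column 0 so the group labels carry no leading whitespace.
df = pd.read_csv(StringIO("""name,prop
A,1
A,2
B,  4
A,  3
B,  5
B,  2"""))

df.shape  # (6, 2)
df
temp = df.groupby(['name'])['prop'].describe()  # summary statistics per name
temp
temp.stack()  # also try temp.unstack() or temp.unstack(level=-1); level can be -1 or 0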

See the pandas documentation for unstack for more details.
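To make the effect of the different calls concrete, here is a minimal sketch of the stack/unstack round trip. It assumes pandas 0.20 or later, where describe() on the grouped column returns one row per name; the variable names (wide, long_form, back, by_stat) are made up for this illustration.

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO("name,prop\nA,1\nA,2\nB,4\nA,3\nB,5\nB,2"))

# Assumed wide layout: one row per name, one column per statistic.
wide = df.groupby("name")["prop"].describe()

# stack() moves the statistic labels from the columns into the index,
# giving a long Series indexed by (name, statistic).
long_form = wide.stack()

# unstack() pivots the innermost index level (the statistic) back into columns.
back = long_form.unstack()

# unstack(level=0) pivots the names instead, so the statistics become rows.
by_stat = long_form.unstack(level=0)

print(long_form)
print(by_stat)
```

The long form matches the layout shown in the question, and unstack() is what turns it into the one-row-per-name table.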

Author by

Mike

Updated on June 14, 2022

Comments

Mike almost 2 years

I am trying to get the .describe() function to output its results in a different layout. Here is the CSV data (testProp.csv):

'name','prop'
A,1
A,2
B,  4
A,  3
B,  5
B,  2

When I run the following:

from pandas import *

data = read_csv('testProp.csv')

temp = data.groupby('name')['prop'].describe()
temp.to_csv('out.csv')

The output is:

name       
A     count    3.000000
      mean     2.000000
      std      1.000000
      min      1.000000
      25%      1.500000
      50%      2.000000
      75%      2.500000
      max      3.000000
B     count    3.000000
      mean     3.666667
      std      1.527525
      min      2.000000
      25%      3.000000
      50%      4.000000
      75%      4.500000
      max      5.000000
dtype: float64

However, I want the data in the format below. I have tried transpose(), and I would like to keep using describe() and reshape its output rather than switch to .agg([np.mean, np.max, ...]):

    count  mean         std          min  25%  50%  75%  max
A   3      2            1            1    1.5  2    2.5  3
B   3      3.666666667  1.527525232  2    3    4    4.5  5
