Pandas standard deviation on one column for subset of rows

python pandas statistics standard-deviation

13,854

IIUC, you'll want to first do df.groupby on Hostname and then find the standard deviation. Something like this:

In [118]: df.groupby('Hostname')[['CPU Peak', 'Memory Peak']].std()
Out[118]: 
           CPU Peak  Memory Peak
Hostname                        
server1   23.560798    19.212091

13,854

Thomas

Updated on July 09, 2022

Comments

Thomas almost 2 years
I'm new to working with Python and Pandas. Currently I'm attempting to create a report that extracts data from an SQL database and using that data in a pandas dataframe. In each row is a server name and date of sample and then sample data per column following that.

I have been able to filter by the hostname using df[df['hostname'] == uniquehost] df being a variable for the dataframe and uniquehost being a variable for each unique host name.

What I am trying to do next is to obtain the stdev of the other columns although I haven't been capable of figuring this part out. I attempted to use df[df['hostname'] == uniquehost].std()

However, this wasn't correct.

Can anyone point me in the appropriate direction to get this figure out? I suspect I'm barking up the wrong tree and there's likely a very easy way to handle this that I haven't encountered yet.
```
Hostname | Sample Date | CPU Peak | Memory Peak 
server1 | 08/08/17 | 67.32 | 34.83 
server1 | 08/09/17 | 34 | 62
```
Thomas almost 7 years

awesome thanks. I'll give this a try and get back to you.