Pandas standard deviation on one column for subset of rows


IIUC, you'll want to group by Hostname with df.groupby first and then take the standard deviation of the sample columns. Something like this:

In [118]: df.groupby('Hostname')[['CPU Peak', 'Memory Peak']].std()
Out[118]: 
           CPU Peak  Memory Peak
Hostname                        
server1   23.560798    19.212091
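
For anyone who wants to reproduce the result above, here is a minimal, self-contained sketch. It assumes the two sample rows from the question are the whole dataset and builds the dataframe by hand instead of reading from SQL; the names metric_cols, per_host_std, and single_host_std are illustrative and do not appear in the original post.

```python
import pandas as pd

# Sample rows copied from the question; a real report would load these from SQL.
df = pd.DataFrame({
    "Hostname": ["server1", "server1"],
    "Sample Date": ["08/08/17", "08/09/17"],
    "CPU Peak": [67.32, 34.0],
    "Memory Peak": [34.83, 62.0],
})

# Hypothetical helper name: the numeric columns we want statistics for.
metric_cols = ["CPU Peak", "Memory Peak"]

# One row of standard deviations per host.
per_host_std = df.groupby("Hostname")[metric_cols].std()
print(per_host_std)

# Same numbers for a single host: filter the rows, keep only the
# numeric columns, then call std() on what is left.
uniquehost = "server1"
single_host_std = df.loc[df["Hostname"] == uniquehost, metric_cols].std()
print(single_host_std)
```

Note that pandas uses the sample standard deviation (ddof=1) by default, so a host with only one row comes back as NaN. Selecting the numeric columns before calling std() also keeps the Hostname and Sample Date strings out of the calculation.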


Authored by Thomas

Updated on July 09, 2022

Comments

  • Thomas
    Thomas almost 2 years

    I'm new to working with Python and Pandas. I'm currently trying to create a report that extracts data from an SQL database and loads it into a pandas dataframe. Each row holds a server name and the sample date, followed by the sample data, one column per metric.

    I have been able to filter by hostname using df[df['hostname'] == uniquehost], where df is the dataframe and uniquehost is a variable holding each unique host name.

    What I am trying to do next is get the standard deviation of the other columns, but I haven't been able to figure this part out. I tried df[df['hostname'] == uniquehost].std()

    However, this wasn't correct.

    Can anyone point me in the right direction to figure this out? I suspect I'm barking up the wrong tree and there's likely a very easy way to handle this that I haven't encountered yet.

    Hostname | Sample Date | CPU Peak | Memory Peak 
    server1 | 08/08/17 | 67.32 | 34.83 
    server1 | 08/09/17 | 34 | 62
    
  • Thomas
    Thomas almost 7 years
    Awesome, thanks. I'll give this a try and get back to you.