Get per-core CPU load in shell script


Solution 1

It turns out that some of the MIBs installed on Red Hat provide all the information needed here. Since my goal is to provide these values under an OID via SNMP anyway, I can query SNMP itself and post-process the results.

The average load across all CPUs is computed as 100 minus the idle percentage:

function allCpuLoad {
    # get system idle value from
    # snmpget -v2c -cmdaf localhost UCD-SNMP-MIB::ssCpuIdle.0
    # UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 93
    # and compute the load by subtracting it from 100
    snmpget -v2c -cmdaf localhost UCD-SNMP-MIB::ssCpuIdle.0|cut -f4 -d' '| awk '{printf "%d", 100 - $1}'
}
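The value is always the last field of the snmpget line shown in the comment (OID = TYPE: value), so the cut and awk steps can be merged into a single awk call that does not depend on the field count. This is a minimal sketch; the function name idle_to_load is hypothetical. net-snmp's -Oqv output option, which prints only the value, would be another way to skip the parsing.

```shell
#!/bin/bash
# Hypothetical helper: convert an snmpget line such as
#   UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 93
# (read from stdin) into the load value 100 - idle.
idle_to_load() {
    # $NF is the last whitespace-separated field, i.e. the idle percentage
    awk '{ printf "%d\n", 100 - $NF }'
}

# Example with the sample line from the comment; prints 7
echo "UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 93" | idle_to_load
```

Against a live agent this would be: snmpget -v2c -cmdaf localhost UCD-SNMP-MIB::ssCpuIdle.0 | idle_to_load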

We can use snmpwalk to get the load of each individual CPU and then extract the maximum value:

function maxCpuLoad {
    # get load of all cpus
    # snmpwalk -v2c -cmdaf localhost HOST-RESOURCES-MIB::hrProcessorLoad
    # HOST-RESOURCES-MIB::hrProcessorLoad.196608 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196609 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196610 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196611 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196612 = INTEGER: 6
    # HOST-RESOURCES-MIB::hrProcessorLoad.196613 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196614 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196615 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196616 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196617 = INTEGER: 27
    # HOST-RESOURCES-MIB::hrProcessorLoad.196618 = INTEGER: 4
    # HOST-RESOURCES-MIB::hrProcessorLoad.196619 = INTEGER: 0
    # HOST-RESOURCES-MIB::hrProcessorLoad.196620 = INTEGER: 1
    # HOST-RESOURCES-MIB::hrProcessorLoad.196621 = INTEGER: 0
    # HOST-RESOURCES-MIB::hrProcessorLoad.196622 = INTEGER: 0
    # HOST-RESOURCES-MIB::hrProcessorLoad.196623 = INTEGER: 1
    # and get maximum value only
    snmpwalk -v2c -cmdaf localhost HOST-RESOURCES-MIB::hrProcessorLoad|cut -f 4 -d' '|sort -n -r|head -n1
}
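sort -n -r | head -n1 returns the maximum but loses which processor it belongs to. A single awk pass can keep both. This sketch assumes the snmpwalk output format shown in the comment; max_processor_load is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read snmpwalk hrProcessorLoad lines from stdin and print
# the highest load followed by the OID of the processor that reported it.
max_processor_load() {
    awk '
        # the load is the last field ("INTEGER: 27" -> 27); +0 forces a numeric compare
        !seen || $NF + 0 > max { max = $NF + 0; oid = $1; seen = 1 }
        END { if (seen) print max, oid }
    '
}
```

Usage: snmpwalk -v2c -cmdaf localhost HOST-RESOURCES-MIB::hrProcessorLoad | max_processor_load. With the sample output in the comment it would report 27 for hrProcessorLoad.196617.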

Solution 2

There are several ways to accomplish sub-second polling of the CPU load: either by using a utility like dstat or by reading /proc/stat directly (examples of both below).

Let's go over the pros and cons of both before moving on to the technical examples.

To use dstat, you need a cron job that runs every minute (*/1 * * * *) and writes the result to a statistics file that your script can read. The upside is that SNMP timeouts are no longer an issue. The downside is that the value is not truly instantaneous, and the cron job keeps running even when nobody is asking for the data. The impact may be negligible, but it is there.

To use /proc/stat, you must read it twice. The contents of /proc/stat are cumulative since boot, so the first reading has to be subtracted from the second before the current load can be calculated. The downside is that there must be some delay between the two readings. In the example below, the delay (dly) can be set to a sub-second value, which would meet your needs; however, with samples that close together I'm not sure how accurate the result is.
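The subtraction step can be sketched as a small bash function working on two cpu lines. It assumes the standard /proc/stat layout (label, then user, nice, system, idle, iowait, ...), counts idle + iowait as idle time, and uses integer arithmetic only; the function name cpu_busy_pct and the sample numbers are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: integer busy percentage of one CPU between two
# /proc/stat lines ("cpuN user nice system idle iowait ...").
cpu_busy_pct() {
    local -a a b
    local i tot1=0 tot2=0
    read -r -a a <<< "$1"   # first reading
    read -r -a b <<< "$2"   # second reading

    # skip element 0 (the "cpuN" label) and sum every tick counter
    for (( i = 1; i < ${#a[@]}; i++ )); do
        tot1=$(( tot1 + a[i] ))
        tot2=$(( tot2 + b[i] ))
    done

    # idle time = idle + iowait, elements 4 and 5
    local dtot=$(( tot2 - tot1 ))
    local didle=$(( (b[4] + b[5]) - (a[4] + a[5]) ))

    if (( dtot > 0 )); then
        echo $(( 100 * (dtot - didle) / dtot ))
    else
        echo 0   # no ticks elapsed between the readings
    fi
}
```

For example, cpu_busy_pct "cpu0 100 0 50 800 50" "cpu0 130 0 60 850 60" prints 40: the total grew by 100 ticks, 60 of which were idle. Real readings would come from grep '^cpu0 ' /proc/stat taken a short delay apart.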

Using dstat: add this line to /etc/crontab:

*/1  *  *  *  *  root    echo $((100-`dstat -c -C0 --noheaders --nocolor 1 1 | grep -v "\-\|u" | awk 'NR == 2' | tr -s " " | cut -d \  -f 4`)) > /tmp/cpuload

This updates only once a minute. For more frequent updates, add a second line that prefixes the command with sleep 30:

*/1  *  *  *  *   root    sleep 30; echo $((100-`dstat -c -C0 --noheaders --nocolor 1 1 | grep -v "\-\|u" | awk 'NR == 2' | tr -s " " | cut -d \  -f 4`)) > /tmp/cpuload

It is possible to use (abuse) cron even further and get sub-second results, but that's another topic entirely.

explanation:

dstat -c -C0 --noheaders --nocolor 1 1

-c show only CPU data

-C0 select cpu0. Change the number to select another CPU

--noheaders --nocolor simplify the output (--nocolor implies --noupdate)

1 wait one second between readings of the stats

1 exit after the second reading of the stats, giving dstat time to settle down after invocation

grep -v "\-\|u"

remove the non-data lines (the headers, which contain "-" or "u")

awk 'NR == 2'

select the second data line

tr -s " "

squeeze runs of spaces, which look nice on screen but get in the way of field splitting

cut -d \  -f 4

-d \  (note the space after the backslash) splits the line on spaces, and -f 4 selects the idle column. Visually it is the third column, but the space at the start of the line produces an empty first field, which shifts the field count by one.

$(( ))

shell arithmetic expansion, subtracting the idle percentage from 100
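The grep | awk | tr | cut chain can also be done in one awk program, because awk's default field splitting already ignores leading and repeated spaces, so idl is simply the third field. This is a sketch assuming dstat prints its header and data rows in the layout implied by the explanation above; the function name dstat_load and the sample rows in the test are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "dstat -c -C0 ..." output from stdin and print
# 100 - idl for the second data row, like the grep | awk | tr | cut chain.
dstat_load() {
    awk '
        /-|u/ { next }      # drop header lines, as grep -v "\-\|u" does
        ++row == 2 {        # second remaining line, as awk NR == 2 does
            print 100 - $3  # default splitting skips leading spaces, so idl is $3
            exit
        }
    '
}
```

In a crontab line the awk program would have to be written inline, since cron does not know about shell functions.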

Using /proc/stat:

Save as cpuload.sh:

#!/bin/bash

# Calculation delay in seconds. Without a delay there is no way to determine
# current values, because the contents of /proc/stat are cumulative since boot.
# Sub-second values (e.g. 0.5) need a sleep that supports fractional seconds.
dly=3

function calculate {

    # load both samples into arrays
    IFS=' ' read -r -a firstarr <<< "$1"
    IFS=' ' read -r -a secondarr <<< "$2"

    # clear the name field ("cpu", "cpu0", ...) so it doesn't break the sums
    firstarr[0]=0
    secondarr[0]=0

    # reset totals
    firsttotcpu=0
    secondtotcpu=0

    # sum the starting tick counts; idle time = idle + iowait (fields 4 and 5)
    for f in "${firstarr[@]}"; do
        firsttotcpu=$(( firsttotcpu + f ))
    done
    firstidle=$(( firstarr[4] + firstarr[5] ))

    # sum the ending tick counts
    for l in "${secondarr[@]}"; do
        secondtotcpu=$(( secondtotcpu + l ))
    done
    secondidle=$(( secondarr[4] + secondarr[5] ))

    # change in ticks between the two samples
    insttotcpu=$(( secondtotcpu - firsttotcpu ))
    instidle=$(( secondidle - firstidle ))

    # utilization percentage; computed in awk because bash has no floating
    # point. Report 0 if no ticks elapsed (possible with very short delays).
    cpu_load=$( awk -v tot="$insttotcpu" -v idl="$instidle" \
        'BEGIN { load = 0; if (tot > 0) load = (tot - idl) / tot * 100; print load }' )

    echo -n "$cpu_load "
}

# main execution

# one entry per line: the aggregate "cpu" line first, then cpu0, cpu1, ...
mapfile -t cpu_start < <(grep '^cpu' /proc/stat)

# must delay to get a difference
sleep "$dly"

mapfile -t cpu_end < <(grep '^cpu' /proc/stat)

cpucount=${#cpu_start[@]}

# uncomment this loop to print the cpu names above the percentages
#for i in "${cpu_start[@]}"; do
#    read -r -a name <<< "$i"
#    echo -n "${name[0]} "
#done
#echo ""

for (( i=0; i<cpucount; i++ )); do
    calculate "${cpu_start[$i]}" "${cpu_end[$i]}"
done

echo ""

Solution 3

Another way to get the raw values is to run grep cpu0 /proc/stat. It shows the number of ticks spent in each state; see man proc for how to interpret the fields. The counters are cumulative since boot, so to get a current percentage you have to take two readings, subtract one from the other, and divide the non-idle part of the difference by its sum, for example along the lines of what John W. Gill suggests.
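Along those lines, here is a sketch that takes two saved copies of /proc/stat and prints a busy percentage per core in one awk pass. It assumes the field layout described in man proc (label, user, nice, system, idle, iowait, ...) and counts idle + iowait as idle; the function name cpu_load_from_snapshots is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: per-core busy percentage from two saved copies of
# /proc/stat, taken some delay apart.
# Usage: cpu_load_from_snapshots BEFORE_FILE AFTER_FILE
cpu_load_from_snapshots() {
    awk '
        $1 !~ /^cpu[0-9]+$/ { next }        # per-core lines only
        {
            tot = 0
            for (i = 2; i <= NF; i++) tot += $i
            idle = $5 + $6                  # idle + iowait
        }
        FNR == NR {                         # first file: remember the counters
            t0[$1] = tot; i0[$1] = idle
            next
        }
        ($1 in t0) {                        # second file: print the deltas
            dt = tot - t0[$1]; di = idle - i0[$1]
            load = 0
            if (dt > 0) load = 100 * (dt - di) / dt
            printf "%s %.1f\n", $1, load
        }
    ' "$1" "$2"
}
```

Usage: cat /proc/stat > /tmp/s1; sleep 0.5; cat /proc/stat > /tmp/s2; cpu_load_from_snapshots /tmp/s1 /tmp/s2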

Solution 4

mpstat -P ALL -u | tail -n +5 | awk '{print (100-$12)}'

explanation:

mpstat

reports processor statistics

-P ALL

for all cores

-u

report CPU utilization. Without an interval argument, the figures are averages since boot rather than the current load.

tail -n +5 

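start at line 5, skipping the banner, the blank line, the column header and the "all" summary line, so that only per-core rows remain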

awk '{print (100-$12)}'

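print 100 minus the 12th field of every line. This assumes %idle is the 12th field, which holds for sysstat versions that have a %gnice column and print a 24-hour timestamp (running mpstat under LC_ALL=C gives that format); with an AM/PM timestamp, %idle moves to the 13th field.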
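Because the position of %idle depends on the sysstat version and the time format, a sketch that looks the column up by its header name avoids hard-coding $12. The function name mpstat_core_load is hypothetical, and like the one-liner it only reports averages since boot when mpstat is run without an interval.

```shell
#!/bin/bash
# Hypothetical helper: read mpstat -P ALL output from stdin and print
# "<cpu> <load>" for each core, finding %idle by its header name.
mpstat_core_load() {
    awk '
        /%idle/ {                       # header row: locate the columns
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpucol  = i
                if ($i == "%idle") idlecol = i
            }
            next
        }
        # per-core rows have a numeric CPU id, which skips the "all" row
        idlecol && $cpucol ~ /^[0-9]+$/ {
            printf "%s %.2f\n", $cpucol, 100 - $idlecol
        }
    '
}
```

Usage: LC_ALL=C mpstat -P ALL -u | mpstat_core_load. LC_ALL=C also keeps the decimal separator a dot, which awk needs for the arithmetic.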

Solution 5

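Here is a bash-based example script (using /proc/stat) with explanatory comments. The loop interval is configurable; keep in mind that the counters advance in clock ticks (USER_HZ, typically 100 per second), so very short intervals give coarse percentages. Save it as /tmp/cpuLoad.sh, make it executable with "chmod +x /tmp/cpuLoad.sh", and then run /tmp/cpuLoad.sh.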

#!/bin/bash

interval=0.25  ## loop interval in seconds

## internal state, no settings below this line
lCpus=()  ## readings from the previous loop
lCount=0  ## loop counter

while :; do

    cCpu=()   ## cpu currently being parsed
    cCpus=()  ## all cpus of this reading
    values=$(grep -E "cpu[0-9]+\s" /proc/stat)
    for value in $values; do
        if [[ $value =~ ^cpu[0-9]+ ]]; then
            ## a new cpu line starts: store the previous one
            if (( ${#cCpu[@]} > 0 )); then
                cCpus[${cCpu[1]}]="${cCpu[*]}"
            fi

            cCpu[0]=$value        ## name
            cCpu[1]=${#cCpus[@]}  ## cpu index
            cCpu[2]=0             ## cpu idle ticks
            cCpu[3]=0             ## cpu busy ticks
            i=0                   ## column index

        else
            (( i = i + 1 ))
            if (( i == 4 || i == 5 )); then
                # position 4 is idle, position 5 is iowait (since Linux 2.5.41, also idle) src https://www.idnt.net/en-US/kb/941772
                (( cCpu[2] = cCpu[2] + value ))
            else
                (( cCpu[3] = cCpu[3] + value ))
            fi
        fi
    done

    ## include the last cpu
    cCpus[${cCpu[1]}]="${cCpu[*]}"

    output="Loop $lCount"
    x=0
    for cpu in "${cCpus[@]}"; do
        if (( lCount > 0 )); then
            cCpu=($cpu)
            lCpu=(${lCpus[$x]})
            dTotal=$(( (cCpu[2] + cCpu[3]) - (lCpu[2] + lCpu[3]) ))
            dUsed=$(( dTotal - (cCpu[2] - lCpu[2]) ))
            if (( dTotal == 0 )); then
                dTotal=1  ## dividing by 0 is never a good idea
            fi
            output="$output, ${cCpu[0]}: $(( 100 * dUsed / dTotal ))%"
        fi
        ## store the reading so we can compute a delta next round
        lCpus[$x]=$cpu
        (( x = x + 1 ))
    done

    if (( lCount > 0 )); then
        echo "$output"
    fi

    sleep "$interval"
    (( lCount = lCount + 1 ))

done

Author: Jens

Updated on September 18, 2022

Comments

  • Jens
    Jens almost 2 years

    I need to report the CPU load per core as a percentage from a shell script, but I cannot run e.g. mpstat for one second. Basically, the information top shows after pressing 1 is what I want, but I cannot configure top to show this in batch mode (at least I don't know how). I could create a ~/.toprc file with the configuration, but then I have to hope that the users do not mess with it.

    I looked at mpstat and parsing its output, but it supports only whole seconds as the interval. My script gets called via SNMP, and waiting 1s for the response will generate a timeout, so this is not an option.

    Are there other ways to get the per-core CPU load? I read about parsing /proc/stat, but I think this is more of a last resort.

  • Jens
    Jens over 7 years
    Does dstat support a delay less than 1s? I cannot wait 1s because then SNMP generates a timeout.
  • Jens
    Jens over 7 years
    The problem is that mpstat returns the average over the full uptime if you do not pass an interval. The minimum interval is 1s which is too long for us.
  • John
    John over 7 years
    dstat doesn't go lower than 1 second. There are ways to make it work. I extended my answer in several ways, maybe one will help you get what you want.