What's the fairest way to monitor total CPU time - per user?

kernel cpu-load proc uptime


Solution 1

Sounds like you need process accounting.

http://www.faqs.org/docs/Linux-mini/Process-Accounting.html

On Ubuntu, the process accounting tools are in the acct package.

To get a per-user report, run

sa -m

Solution 2

This prints one line per user, showing the username and their total CPU time in seconds (ps --cumulative also adds in some time from dead child processes):

ps -w -e --no-header -o uid,user \
        | sort -u \
        | while read uid user; do
                # ps prints CPU time as [DD-]HH:MM:SS; turn it into an expression for bc
                echo -e "$user\t"$(
                        ps --no-headers -u "$uid" --cumulative -o time \
                                | sed -e 's/-/*86400+/' -e 's/:/*3600+/' -e 's/:/*60+/' \
                                | paste -sd+ \
                                | bc
                );
        done
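
The sed step above rewrites each ps time value into arithmetic for bc. A minimal standalone sketch of the same conversion, using awk instead of sed and bc, might look like the following. The function name cputime_to_seconds is made up for illustration, and the sketch assumes the [DD-]HH:MM:SS layout that the ps man page documents for the time column.

```shell
# Hypothetical helper: convert a ps "time" value ([DD-]HH:MM:SS) to seconds.
cputime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }  # DD-HH:MM:SS
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }               # HH:MM:SS
        { exit 1 }                                                      # unexpected format
    '
}

cputime_to_seconds 00:01:05      # prints 65
cputime_to_seconds 1-02:03:04    # prints 93784
```

The days field only appears once a process has accumulated more than 24 hours of CPU time, which is easy to miss when testing on a lightly loaded machine.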

Solution 3

One of the more obvious answers is to extend what you're already doing.

I came across this monitoring setup, which uses bash scripting and MySQL to track users' CPU time, although it covers a much longer time frame than the one you describe.

Hopefully this can give you some more ideas about the direction you're looking to head in.

http://www.dba-oracle.com/t_oracle_unix_linux_vmstat_capture.htm


flo


Updated on September 18, 2022

Comments

flo almost 2 years
On a multi-user system, I want to measure each user's CPU usage in seconds of CPU time. For the purpose of this measurement, I assume that if a PID belongs to a user, this user is causing the CPU time; that is, I'm ignoring daemons and the kernel.

Currently I'm doing this, every five seconds:
1. Get each user and the PIDs they are running via ps aux
2. For each PID, get x, the sum of utime, cutime, stime and cstime from /proc/[pid]/stat
3. Calculate t = x / interval (the interval isn't always exactly 5 seconds when there's high load)

If I run this, I get sensible-looking values. For instance, a user on this system was spinning in Python (while True: pass), and the system showed roughly 750 milliseconds of CPU time per second. When the system hung for a bit, it reported 1600 ms for one 1-second interval. That seems about right, but I understand that these values can be deceptive, especially since I don't really understand them.
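
Step 2 can be sketched roughly as follows. This is only an illustration, assuming the field numbering documented in proc(5) (utime, stime, cutime and cstime are fields 14 to 17 of /proc/[pid]/stat, counted in clock ticks); the function names stat_line_ticks and ticks_to_ms are hypothetical.

```shell
# Hypothetical helper: sum utime, stime, cutime and cstime from one stat line.
stat_line_ticks() {
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces or ')' itself, so drop everything up to the last ") " first.
    rest=${1##*) }
    # After that cut, word 1 is field 3 (state), so word 12 is field 14.
    set -- $rest
    shift 11
    echo $(( $1 + $2 + $3 + $4 ))
}

# Hypothetical helper: convert clock ticks to milliseconds.
ticks_to_ms() {
    echo $(( $1 * 1000 / $(getconf CLK_TCK) ))
}
```

With these, something like ticks_to_ms "$(stat_line_ticks "$(cat /proc/$$/stat)")" would report the CPU time of the current shell. Because the counters are cumulative, a per-interval figure needs the difference between two consecutive samples of the same PID.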

So my question is this:

What is a fair and correct way to measure CPU load on a per-user basis?

The method has to be rather accurate. There might be many hundreds of users on this system, so extracting percentages from ps aux will not be accurate enough, especially for short-lived threads which many pieces of software like to spawn.

While this might be complicated, I absolutely know it's possible. This was my starting point:

The kernel keeps track of a processes creation time as well as the CPU time that it consumes during its lifetime. Each clock tick, the kernel updates the amount of time in jiffies that the current process has spent in system and in user mode. — (from the Linux Documentation Project)

The value I'm after is the number of seconds (or jiffies) that a user has spent on the CPU, not a percentage of system load or CPU usage.

It's important that we measure CPU time while the processes are still running. Some processes will only last for half a second, some will last for many months - and we need to catch both sorts, so that we can account for users' CPU time with fine granularity.
- Tachyons over 12 years
  
  500 reputation :o good chance for beginners
- kingmilo over 12 years
  
  A bit out of my league, but a very interesting question so I dug a bit and found something that I hope is at least useful to help you solve this: stackoverflow.com/a/1424556/905573
- Rinzwind over 12 years
  
  You do know top can do batch mode? top -b -n 1 -u {user} | awk 'NR>7 { sum += $9; } END { print sum; }' should show the load for {user} at that moment.
flo over 12 years

Unfortunately, this won't work for me, as "sa" will not count long-running processes. What I need (I think) is a way to detect processes being started and terminated, and to record their CPU time when they quit, as well as while they are running.
RusGraf over 12 years

@StefanoPalazzo I believe this is the best you'll get. Augment it with times for running processes from /proc/[pid]/stat.
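
A rough sketch of that augmentation, under a few assumptions: it sums only utime and stime (fields 14 and 15 of the stat file, per proc(5)), on the reasoning that terminated children should already show up in the accounting records, and it takes the owner from the Uid line of /proc/[pid]/status. The function name per_user_ticks and its optional proc_root argument are hypothetical.

```shell
# Hypothetical helper: print "uid ticks" for all currently running processes.
# proc_root defaults to /proc; the parameter exists only so the function
# can be tried against a copied or fake directory tree.
per_user_ticks() {
    proc_root=${1:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        # A process can exit between the glob and the reads; skip it quietly.
        line=$(cat "$dir/stat" 2>/dev/null) || continue
        uid=$(awk '/^Uid:/ { print $2; exit }' "$dir/status" 2>/dev/null) || continue
        [ -n "$uid" ] || continue
        rest=${line##*) }      # drop "pid (comm) ", comm may contain spaces
        set -- $rest
        [ $# -ge 13 ] || continue
        shift 11               # $1 is now utime, $2 is stime
        echo "$uid $(( $1 + $2 ))"
    done | awk '{ sum[$1] += $2 } END { for (u in sum) print u, sum[u] }'
}
```

Running per_user_ticks with no argument prints one line per numeric UID with the clock ticks used so far by that user's live processes; dividing by the value of getconf CLK_TCK gives seconds.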
flo over 12 years

As it turns out, it seems that almost all processes will be accounted for properly by sa (.ps.gz). I also have a good way to "estimate" the long-running processes before eventually getting an accurate value for those as well. So we'll use it after all, and I'm more than happy to award the bounty to your answer. Thanks a bunch!