How to monitor and log the memory/cpu usage of processes over time?


Solution 1

If you want just the top offenders, consider running top in batch mode with a relatively long interval (60 seconds or more). You may need more than one top instance running to capture the top offenders for multiple resources. I have configured systems to run top for a few cycles whenever a resource was being overused.
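As a rough illustration of the batch-mode idea, here is a minimal Python sketch that builds a top command line and appends its output to a log file. It assumes procps-ng top (the `-o` sort option is not available in every top implementation); the function names and the log path in the usage note are made up for the example.

```python
import subprocess


def build_top_command(sort_field, interval_s=60, cycles=5):
    """Return an argv list for procps-ng top in batch mode.

    -b  batch mode (plain text output, suitable for a log file)
    -d  delay between refreshes, in seconds
    -n  number of refreshes before top exits
    -o  sort field, e.g. "%CPU" or "%MEM" (assumes procps-ng top)
    """
    return ["top", "-b", "-d", str(interval_s), "-n", str(cycles), "-o", sort_field]


def log_top(sort_field, log_path, interval_s=60, cycles=5):
    """Run one top instance and append its output to log_path.

    Blocks until top has finished all of its cycles.
    """
    with open(log_path, "a") as log:
        subprocess.run(
            build_top_command(sort_field, interval_s, cycles),
            stdout=log,
            check=True,
        )
```

To cover two resources you would run one instance per sort field, for example `log_top("%MEM", "/var/log/top-mem.log")` and `log_top("%CPU", "/var/log/top-cpu.log")` from separate cron jobs (both paths are hypothetical).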

Consider running sar in batch mode to capture resource utilization. I realize this is server based, but it is useful for determining the times when problems are occurring.

Run munin and enable notifications. This may give you a chance to get in and watch the server going down. You may be able to correct the problem before it goes down.

For memory leaks, a steady increase in swap usage indicates a problem. I once watched a server slowly die over a period of days. The problem service was a program monitoring other processes for memory leaks. The system admin kept insisting the increasing swap usage was not a problem, right up until the server stopped responding.
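A steady climb in swap usage can be checked mechanically. The following is only a sketch, assuming a Linux system where `/proc/meminfo` reports `SwapTotal` and `SwapFree` in kB; the function names and the growth threshold are illustrative, not part of any existing tool.

```python
def parse_swap_used_kb(meminfo_text):
    """Return used swap in kB from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(rest.split()[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the current swap usage from the running system."""
    with open(path) as f:
        return parse_swap_used_kb(f.read())


def is_steadily_increasing(samples, min_growth_kb=0):
    """True if usage never drops between samples and grows overall.

    samples is a list of swap-used readings in chronological order.
    """
    if len(samples) < 2:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_drops and (samples[-1] - samples[0]) > min_growth_kb
```

Collecting one `read_swap_used_kb()` reading per hour and passing the last day or two of readings to `is_steadily_increasing` would flag the slow-death pattern described above; a real check would probably tolerate small dips.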

You may find that cfengine's anomaly detection can be used to trigger a script to capture the system state when things go wrong. You may want a lot of information besides just the processes using the most resources. For a sudden influx of usage you may want a list of network connections (by address not name). Memory usage is also useful.
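A state-capture script of the kind described might look like the sketch below. It assumes the procps `ps` and `free` commands and the iproute2 `ss` command are installed; the set of commands and the function names are choices made for the example, and whatever triggers the script (cfengine, cron, a monitoring handler) is outside its scope.

```python
import subprocess
import time

# Assumed commands: procps `ps` and `free`, iproute2 `ss`.
SNAPSHOT_COMMANDS = {
    "processes_by_memory": ["ps", "-eo", "pid,user,rss,vsz,pcpu,comm", "--sort=-rss"],
    "network_connections": ["ss", "-tuan"],  # -n keeps addresses numeric (no name lookups)
    "memory": ["free", "-m"],
}


def run_command(argv):
    """Run argv and return its stdout, or an error line if it cannot run."""
    try:
        return subprocess.run(argv, capture_output=True, text=True, timeout=30).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return f"failed to run {argv[0]}: {exc}\n"


def capture_snapshot(runner=run_command, now=None):
    """Return one text report with the output of every snapshot command.

    runner is injectable so the report format can be checked without
    running real commands; now is a Unix timestamp (defaults to the
    current time).
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    parts = [f"=== snapshot at {stamp} ==="]
    for label, argv in SNAPSHOT_COMMANDS.items():
        parts.append(f"--- {label}: {' '.join(argv)} ---")
        parts.append(runner(argv).rstrip())
    return "\n".join(parts) + "\n"
```

Appending the result of `capture_snapshot()` to a log file each time the trigger fires leaves a record of what the machine looked like shortly before it stopped responding.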

Solution 2

sysstat is made pretty much exactly for your kind of purpose.

Solution 3

I've used atop before:

http://freshmeat.net/projects/atop/

"Atop is an ASCII full-screen performance monitor that is capable of reporting the activity of all processes (even if processes have finished during the interval), daily logging of system and process activity for long-term analysis, highlighting overloaded system resources by using colors, etc. At regular intervals, it shows system-level activity related to the CPU, memory, swap, disks, and network layers, and for every active process it shows the CPU utilization, the memory growth, priority, username, state, and exit code."

Solution 4

Have you tried collectd?
It's very powerful and customizable.
It has a lot of plugins and can be integrated with Nagios.

http://collectd.org/features.shtml

Solution 5

Server Density does exactly what you describe.

I use it on one of our production servers and am very happy with it. Its top feature is the ability to view charts, click on a peak and see the server's CPU/memory consumption at that moment, including all running processes. They call these snapshots.

It's constantly improving. One of the latest features is anomaly detection, which flags unusual behaviour for you. You can also set up various thresholds.

Author by

Artem Russakovskii

http://twitter.com/ArtemR

Updated on September 17, 2022

Comments

  • Artem Russakovskii
    Artem Russakovskii over 1 year

    I am looking for a way to diagnose issues such as swap death, where a ballooning process (such as apache) fills up swap and kills the whole machine.

    I'm already using cacti and I can set up nagios (though would rather not) or munin but as far as I can tell they can't record individual program usage - just overall status.

    I know I can roll a script that appends (>>) to some file every 30s, but I'd like to see if a mature solution already exists.

    Again, ideally it would:

    • record processes' memory usage every N seconds
    • record processes' CPU usage every N seconds
    • support charts and history
    • support averages - like mysqld has used 43% CPU in the last day and averaged 400MB memory
    • be free and open source

    Process names are not and should not be known in advance - the idea is to just let it monitor and then have a look at the top offenders.
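For reference, the home-grown approach mentioned above could be sketched as follows. This is not one of the tools recommended on this page, just a minimal Python illustration that assumes a Linux `/proc` filesystem; it records memory only (CPU would need deltas of the utime/stime fields in `/proc/<pid>/stat`), and all function names are invented for the example. No process names are configured in advance.

```python
import os
from collections import defaultdict


def parse_status(status_text):
    """Extract (name, rss_kb) from the text of /proc/<pid>/status."""
    name, rss_kb = None, 0  # kernel threads have no VmRSS line
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss_kb = int(value.split()[0])
    return name, rss_kb


def sample_processes(proc_root="/proc"):
    """Return {process name: total RSS in kB} for all processes visible now."""
    totals = defaultdict(int)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were reading it
        if name:
            totals[name] += rss_kb
    return dict(totals)


def average_rss(samples):
    """Average RSS per process name over a list of samples.

    A name missing from a sample counts as 0 kB for that sample.
    """
    names = set().union(*samples) if samples else set()
    return {n: sum(s.get(n, 0) for s in samples) / len(samples) for n in names}


def top_offenders(averages, count=10):
    """Return the count largest (name, average_kb) pairs, biggest first."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:count]
```

Calling `sample_processes()` every N seconds, keeping the results in a list (or writing them to a file), and then running `top_offenders(average_rss(samples))` gives the "who used the most memory on average" view; charts and long-term history are what the existing tools add on top.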

    My system is Linux (OpenSUSE).

    • Stefan Lasiewski
      Stefan Lasiewski almost 14 years
      Do you want to monitor any process which may have a memory leak (The top N memory hogs) or are you looking to monitor a defined set of processes (e.g. Apache webserver and a Tomcat process)? The latter is doable with some simple Nagios or Cacti plugins. The former is more difficult. You should clarify this.
    • Artem Russakovskii
      Artem Russakovskii almost 14 years
      I already clarified it in the post but to clarify again: I want to know the state of the system when it goes down due to swap death. I want to know who the worst offenders are. And btw, it doesn't have to be a memory leak - just an influx of traffic, or whatever causes high memory usage. So, again, no advance knowledge of binary names should be configured.
    • warren
      warren almost 14 years
    • Artem Russakovskii
      Artem Russakovskii almost 14 years
      Warren, that's an entirely different question.
    • peterh
      peterh about 9 years
      Closing such a good-quality post was a bad thing, especially retroactively after 4 years.
  • Artem Russakovskii
    Artem Russakovskii almost 14 years
    atop doesn't seem to have a report that would provide me with what I wanted. Please correct me if I'm wrong.
  • Artem Russakovskii
    Artem Russakovskii almost 14 years
    I'd like a ready solution for reporting the things I mentioned, most importantly processes consuming the most memory. I'm also not sure what VZ is.
  • Artem Russakovskii
    Artem Russakovskii almost 14 years
    Clarification: Process names are not and should not be known in advance - the idea is to just let it monitor and then have a look at the top offenders.
  • NinjaCat
    NinjaCat almost 14 years
    It takes care of your first two bullet points (memory/cpu by process). You can use the library to gather these stats and then do your history / graphing based on the data.
  • Artem Russakovskii
    Artem Russakovskii almost 14 years
    Ah, I forgot to mention the little part where I'd prefer it to be free, and open source, if possible. Over $100 per server is not really what I'm looking to spend (and I only have 1 server, not 5). serverdensity.com/pricing
  • Marius Gedminas
    Marius Gedminas almost 14 years
    Collectd is very lightweight, not too difficult to set up, and will let you see memory/swap growth over time. It will not pinpoint the offending processes, though -- but maybe you'll be able to notice and catch the memory growth in time and inspect the situation manually with top.
  • PiL
    PiL almost 14 years
    I have to say that I haven't tried that plugin, but the collectd manual says this about the processes plugin: "If processes are selected the following information is gathered. All this information is aggregated by the process name. Its Resident Segment Size, Used user- and system-time, The number of processes by that name, The number of threads (summed up over all the processes), The number of major and minor page faults. Rough I/O-numbers (bytes written and read due to syscalls by the process)."
  • PiL
    PiL almost 14 years
    You can select the processes either by name or by regex.
  • Allen
    Allen almost 14 years
    This is where you should start. You can't know where to start an examination until you know where you might have the best chances. Sysstat is what you are looking for (also has pretty graphs). Once you know more use systemtap.
  • sciurus
    sciurus about 13 years
    @artem-russakovskii - By default atop logs data to a file every ten minutes. If your server crashed at 3:45 you could start atop with atop -r log_filename, press m to switch to the per-process memory usage view, and then press t to move forward in 10 minute increments until 3:40. You can read more about the basics of using atop at lwn.net/Articles/387202 and see an example of identifying a memory leak at atoptool.nl/download/case_leakage.pdf