High system interrupt rate

8,343

I assume you don't have a single-socket system with a 24-core CPU, so it's probably a NUMA system with 2x12 cores. In that case I'd suggest making sure the program uses only one NUMA node (usually one socket) and its local half of the RAM.

With 50G in use, NUMA locality can't be guaranteed, because that is more than half of the memory (a node has only about 48G of local RAM).
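The sizing argument can be checked mechanically. The sketch below is a minimal Python example that assumes the `node N size: M MB` lines printed by `numactl --hardware`; it extracts each node's memory and tests whether a given working set fits on one node. The sample output (two nodes of roughly 48G each) and the helper names `parse_node_sizes` and `fits_in_one_node` are hypothetical, made up for illustration and not taken from the poster's machine.

```python
from __future__ import annotations

import re

# Made-up `numactl --hardware` output for a 2-node, 96G machine (illustrative only).
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 49151 MB
node 0 free: 2048 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 49152 MB
node 1 free: 1024 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""


def parse_node_sizes(hardware_output: str) -> dict[int, int]:
    """Return {node_id: total_MB} parsed from `numactl --hardware` output."""
    pattern = re.compile(r"^node (\d+) size: (\d+) MB[ \t]*$", re.MULTILINE)
    return {int(node): int(mb) for node, mb in pattern.findall(hardware_output)}


def fits_in_one_node(working_set_mb: int, sizes: dict[int, int]) -> bool:
    """True if the working set is no larger than the biggest single node."""
    return bool(sizes) and working_set_mb <= max(sizes.values())


if __name__ == "__main__":
    sizes = parse_node_sizes(SAMPLE)
    print(sizes)                               # {0: 49151, 1: 49152}
    print(fits_in_one_node(50 * 1024, sizes))  # large data set (~50G): False
    print(fits_in_one_node(5 * 1024, sizes))   # small data set (~5G): True
```

On a real machine you would feed it `subprocess.run(["numactl", "--hardware"], capture_output=True, text=True).stdout` instead of the sample. Total size is an optimistic bound: part of each node is already taken by the kernel and page cache, which is what the `free` lines show.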

To check the actual state, use numastat. If you're on RHEL (or a rebuild such as CentOS), you can use numad to handle memory locality automatically. numactl --hardware gives you an overview of your hardware NUMA nodes. There is quite a nice how-to with examples:

http://fibrevillage.com/sysadmin/534-numactl-installation-and-examples
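The per-node counters that numastat summarizes are also exposed by the kernel in /sys/devices/system/node/node<N>/numastat. A rough locality check, sketched below in Python under the assumption that your kernel provides those files, is the share of `other_node` allocations (memory a process running on that node had to take from another node) relative to `local_node`. The counters are system-wide and cumulative since boot, so compare readings taken before and after a run. The function names are hypothetical.

```python
from __future__ import annotations

import glob
import os


def parse_numastat(text: str) -> dict[str, int]:
    """Parse 'name value' lines such as 'local_node 12345' into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def remote_fraction(counters: dict[str, int]) -> float:
    """Share of allocations served from another node (0.0 means fully local)."""
    local = counters.get("local_node", 0)
    other = counters.get("other_node", 0)
    total = local + other
    return other / total if total else 0.0


def read_nodes(root: str = "/sys/devices/system/node") -> dict[int, dict[str, int]]:
    """Read the numastat file of every nodeN directory under root."""
    result = {}
    for path in sorted(glob.glob(os.path.join(root, "node[0-9]*", "numastat"))):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        with open(path) as f:
            result[int(node_dir[len("node"):])] = parse_numastat(f.read())
    return result


if __name__ == "__main__":
    for node, counters in read_nodes().items():
        print(f"node {node}: {remote_fraction(counters):.1%} remote allocations")
```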

With numactl you can then bind your program to the desired CPUs (and to the memory of the same node).
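On the command line this is typically `numactl --cpunodebind=0 --membind=0 ./yourprogram` (`./yourprogram` is a placeholder). If you would rather pin from inside the program, the CPU half can be sketched as below: read node 0's CPU list from /sys/devices/system/node/node0/cpulist and pass it to `os.sched_setaffinity`. This is a Python illustration with hypothetical helper names, not the poster's code. Two caveats: it only sets CPU affinity, so memory stays local only through the kernel's default local-allocation policy and can still spill to the other node once node 0 is full (unlike `--membind`); and the affinity applies to the calling thread and to threads it creates afterwards, so it has to run before the worker threads are started.

```python
from __future__ import annotations

import os


def parse_cpulist(text: str) -> set[int]:
    """Expand a kernel CPU list such as '0-5,12-17' into a set of CPU ids."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def pin_to_node(node: int, root: str = "/sys/devices/system/node") -> set[int]:
    """Restrict the calling thread (and threads it starts later) to one node's CPUs."""
    with open(os.path.join(root, f"node{node}", "cpulist")) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # Linux-only; pid 0 means the calling thread
    return cpus


if __name__ == "__main__":
    path = "/sys/devices/system/node/node0/cpulist"
    if os.path.exists(path):
        with open(path) as f:
            print("node 0 CPUs:", sorted(parse_cpulist(f.read())))
        # In a real program, call pin_to_node(0) here, before creating worker threads.
```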

I'd also suggest checking whether the irqbalance daemon is running; otherwise one core may end up overloaded with interrupts.
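On CentOS 7 the daemon is normally managed by systemd, so `systemctl status irqbalance` is the quickest check. A distribution-independent alternative, sketched below in Python with a hypothetical helper name, scans /proc/<pid>/comm for the process name. comm holds at most 15 characters, which is enough for `irqbalance`.

```python
from __future__ import annotations

import os


def find_processes(name: str, proc_root: str = "/proc") -> list[int]:
    """Return the PIDs whose /proc/<pid>/comm equals name (empty if none)."""
    if not os.path.isdir(proc_root):
        return []
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited meanwhile or is not readable
    return sorted(pids)


if __name__ == "__main__":
    pids = find_processes("irqbalance")
    print(f"irqbalance running, PID(s): {pids}" if pids else "irqbalance is NOT running")
```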



Author: user416983

Updated on September 18, 2022

Comments

  • user416983, almost 2 years

    My server has 24 CPU cores, 96G memory, installed CentOS 7.2 x86_64.

    When I start my program with a large data set, it uses about 50G of memory, and Linux shows a high rate of system interrupts but a low context-switching rate. dstat shows between 500k and 1000k int/s. CPU usage is close to 100%: about 40% us, 60% sy.

    If the data set is small, the program uses about 5G of memory and everything is fine: CPU usage is 100%, about 99% us and 1% sy, as expected.

    I wrote the program myself; it's multithreaded. It doesn't do any network I/O and very little disk I/O; it's mostly memory operations and arithmetic. The thread model and the algorithm are the same regardless of the data set size.

    My question is: how can I find out exactly which interrupts my program triggers the most (and get rid of them to improve performance, if possible)?

    • Tim Lamballais, over 7 years
      It would be good to have a little more context (like what language/interpreter/runtime you are using), but it sounds like you may have a lot of page faults. You can inspect this by running sudo pidstat -r -p $PID 1; the last parameter is the interval in seconds.
    • Nils, about 7 years
      Arithmetic: What are your typical operations? On an old i486 I would have guessed that the coprocessor is causing those interrupts.
    • c4f4t0r, almost 7 years
      Run cat /proc/meminfo and look at the PageTables size. If your program uses a lot of memory, you can try huge pages to reduce the number of page faults. sar -B 1 100 can help you see whether you have many page faults: look at the fault/s and majflt/s columns (pgscank/s shows pages scanned by kswapd); see man sar.
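To compare the large and small runs on figures comparable to dstat's int and csw columns, the totals can be read straight from /proc/stat: the first number on the `intr` line counts all interrupts serviced since boot, and `ctxt` counts context switches. The Python sketch below, with hypothetical function names, samples both twice and turns the difference into per-second rates.

```python
from __future__ import annotations

import os
import time


def parse_proc_stat(text: str) -> dict[str, int]:
    """Pick the interrupt total ('intr', first column) and 'ctxt' from /proc/stat."""
    out = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("intr", "ctxt"):
            out[fields[0]] = int(fields[1])
    return out


def rates(before: dict[str, int], after: dict[str, int], seconds: float) -> dict[str, float]:
    """Per-second rate of every counter present in both samples."""
    return {k: (after[k] - before[k]) / seconds for k in before if k in after}


def sample(interval: float = 1.0, path: str = "/proc/stat") -> dict[str, float]:
    """Read /proc/stat twice, interval seconds apart, and return the rates."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return rates(before, after, time.monotonic() - start)


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        r = sample()
        print(f"{r['intr']:.0f} interrupts/s, {r['ctxt']:.0f} context switches/s")
```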
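As for which interrupts dominate, /proc/interrupts breaks the counts down per source and per CPU. Besides numbered device IRQs, on x86 it has rows for CPU-internal interrupts such as LOC (local timer), RES (rescheduling), CAL (function call) and TLB (TLB shootdowns); with no network and little disk I/O, those rows are the ones to watch. The Python sketch below (hypothetical names; the numbers in the check are made up) takes two snapshots and ranks the sources by how much they grew, keeping the per-CPU split so that a single overloaded core stands out.

```python
from __future__ import annotations

import os
import time


def parse_interrupts(text: str) -> dict[str, tuple[list[int], str]]:
    """Map each row label ('0', 'LOC', 'TLB', ...) to (per-CPU counts, description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        tokens = rest.split()
        counts = []
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break  # stop at the first non-numeric token (chip or device name)
            counts.append(int(tok))
        # Rows such as ERR and MIS carry a single total instead of per-CPU counts.
        table[label.strip()] = (counts, " ".join(tokens[len(counts):]))
    return table


def top_sources(before, after, n: int = 10):
    """Return up to n (total_delta, label, description, per_cpu_delta), largest first."""
    ranked = []
    for label, (counts, desc) in after.items():
        prev = before.get(label, ([], ""))[0]
        prev = prev + [0] * (len(counts) - len(prev))  # pad rows missing from the first snapshot
        per_cpu = [now - old for now, old in zip(counts, prev)]
        ranked.append((sum(per_cpu), label, desc, per_cpu))
    ranked.sort(key=lambda row: row[0], reverse=True)
    return ranked[:n]


def snapshot(path: str = "/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        first = snapshot()
        time.sleep(1.0)
        for delta, label, desc, per_cpu in top_sources(first, snapshot()):
            hot = max(per_cpu) if per_cpu else 0
            print(f"{label:>6} {delta:>10} in ~1s (max {hot} on one CPU)  {desc}")
```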
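The minor and major fault counts that `pidstat -r` reports are also available in /proc/<pid>/stat (fields 10 and 12, minflt and majflt, as documented in proc(5)). The Python sketch below, with hypothetical function names, samples them twice; a minor-fault rate far higher in the large run than in the small one would support the page-fault explanation.

```python
from __future__ import annotations

import os
import sys
import time


def parse_faults(stat_text: str) -> tuple[int, int]:
    """Return (minflt, majflt) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is in parentheses and may contain spaces, so split after the last ')'.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 10 (minflt) is rest[7] and field 12 (majflt) is rest[9].
    return int(rest[7]), int(rest[9])


def fault_rates(pid: int, interval: float = 1.0) -> tuple[float, float]:
    """Minor and major page faults per second for the whole process."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        min0, maj0 = parse_faults(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        min1, maj1 = parse_faults(f.read())
    elapsed = time.monotonic() - start
    return (min1 - min0) / elapsed, (maj1 - maj0) / elapsed


if __name__ == "__main__":
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    pid = int(arg) if arg.isdigit() else os.getpid()
    if os.path.exists(f"/proc/{pid}/stat"):
        minor, major = fault_rates(pid)
        print(f"PID {pid}: {minor:.0f} minor faults/s, {major:.0f} major faults/s")
```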
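Some arithmetic shows why page size matters at this scale: 50 GiB mapped with 4 KiB pages is 13,107,200 pages, which at 8 bytes per page-table entry is about 100 MiB of last-level page tables, while 2 MiB pages need only 25,600 mappings. The relevant /proc/meminfo fields are PageTables (memory used by page tables), AnonHugePages (anonymous memory backed by transparent huge pages) and HugePages_Total/Hugepagesize (the explicitly reserved pool). The Python sketch below, with hypothetical names, pulls them out.

```python
from __future__ import annotations

import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo fields to their numbers (kB for sizes, plain counts for HugePages_*)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def page_summary(info: dict[str, int]) -> dict[str, float]:
    """Condense the fields relevant to page-table size and huge-page use."""
    return {
        "PageTables_MiB": info.get("PageTables", 0) / 1024,
        "AnonHugePages_MiB": info.get("AnonHugePages", 0) / 1024,
        "HugePages_Total": info.get("HugePages_Total", 0),
        "Hugepagesize_kB": info.get("Hugepagesize", 0),
    }


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            for name, value in page_summary(parse_meminfo(f.read())).items():
                print(f"{name:>18}: {value:g}")
```

Whether transparent huge pages are enabled is shown in /sys/kernel/mm/transparent_hugepage/enabled.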