XFS: possible memory allocation deadlock in kmem_alloc

Solution 1

It's related to memory fragmentation and filesystem fragmentation; see https://bugzilla.kernel.org/show_bug.cgi?id=73831

You should check your filesystem fragmentation with xfs_db -r -c 'frag' <filesystem>. Keeping the filesystem not too full (80% or less) and running xfs_fsr for a while should help, too.
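
A rough sketch of those two steps, with /dev/sdXN standing in for your XFS block device (adjust to your setup):

    xfs_db -r -c frag /dev/sdXN     # read-only report of the file fragmentation factor
    xfs_fsr -v -t 3600 /dev/sdXN    # reorganize (defragment) the mounted filesystem for up to an hour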

Solution 2

I believe the current revision of CentOS 7 has kernel 3.10.0-693.2.2.el7 and newer XFS user space tools. Is there any reason you're not on a more current OS? The versions you specified date back to 2015.

Comments

  • Vince
    Vince almost 2 years

    I am performing a data analysis that entails loading a large data matrix of ~112 GB into a memory-mapped file using the R programming language, specifically the bigmemory package (see https://cran.r-project.org/web/packages/bigmemory/index.html). The matrix has 80664 columns and 356751 rows.

    Data storage consists of an NFS-mounted XFS filesystem.

    XFS mount options are:

    xfs noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k
    

    NFS is exporting the FS using the following options:

    rw,async,no_subtree_check,no_root_squash
    

    NFS client is mounting the FS using these options:

    defaults,async,_netdev
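
    For reference, here is roughly how these pieces fit together; the device name, export path, and hostname below are placeholders, not my actual values:

    # server /etc/fstab (XFS mount)
    /dev/sdb1  /export/data  xfs  noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k  0 0

    # server /etc/exports (NFS export)
    /export/data  *(rw,async,no_subtree_check,no_root_squash)

    # client /etc/fstab (NFS mount)
    fileserver:/export/data  /data  nfs  defaults,async,_netdev  0 0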
    

    After some time loading the file, the compute node becomes unresponsive (as do other nodes on the cluster), and the file server logs report the following error:

    XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
    

    I can resolve this temporarily by dropping the caches like so:

    echo 3 > /proc/sys/vm/drop_caches
    

    The file server has 16 GB of memory.

    I have already read through the following blog post:

    https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/

    However, the issue is not due to fragmentation, as the fragmentation reported is below 2% for the filesystem I am writing to.

    So, given the XFS error above, I assume that the file server is running out of memory because it cannot keep up with the number of I/O requests issued by the task at hand.

    Apart from dropping caches periodically (e.g. via a cron job like the sketch below), is there a more permanent solution to this?
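
    The periodic workaround I have in mind is roughly this root crontab entry (the hourly interval is an arbitrary choice):

    # drop page cache, dentries, and inodes every hour (workaround, not a fix)
    0 * * * * /bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches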

    Thanks in advance for the help.

    Edit: CentOS 7.2 on client and server.

    Edit #2: Kernel 3.10.0-229.14.1.el7.x86_64 on client and server.

    • ewwhite
      ewwhite over 6 years
      The most important detail you can provide is your operating system, distribution and version. Otherwise...
    • ewwhite
      ewwhite over 6 years
      Please include the kernel version.