XFS: possible memory allocation deadlock in kmem_alloc

Solution 1

It's related to memory fragmentation and filesystem fragmentation; see https://bugzilla.kernel.org/show_bug.cgi?id=73831

You should check your filesystem fragmentation with xfs_db -r -c 'frag' <filesystem>. Keeping the filesystem not too full (80% or less) and running xfs_fsr for a while should help, too.
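
A rough sketch of those two steps, with /dev/sdXN standing in for your XFS block device (adjust to your setup):

    xfs_db -r -c frag /dev/sdXN     # read-only report of the file fragmentation factor
    xfs_fsr -v -t 3600 /dev/sdXN    # reorganize (defragment) the mounted filesystem for up to an hour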

Solution 2

I believe the current revision of CentOS 7 has kernel 3.10.0-693.2.2.el7 and newer XFS user space tools. Is there any reason you're not on a more current OS? The versions you specified date back to 2015.

Comments

  • Vince
    Vince almost 2 years

    I am performing a data analysis that entails loading a large data matrix of ~112 GB into a memory-mapped file using the R programming language, specifically the bigmemory package (see https://cran.r-project.org/web/packages/bigmemory/index.html). The matrix has 80664 columns and 356751 rows.

    Data storage consists of an NFS-mounted XFS filesystem.

    XFS mount options are:

    xfs noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k
    

    NFS is exporting the FS using the following options:

    rw,async,no_subtree_check,no_root_squash
    

    NFS client is mounting the FS using these options:

    defaults,async,_netdev
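
    For reference, here is roughly how these pieces fit together; the device name, export path, and hostname below are placeholders, not my actual values:

    # server /etc/fstab (XFS mount)
    /dev/sdb1  /export/data  xfs  noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k  0 0

    # server /etc/exports (NFS export)
    /export/data  *(rw,async,no_subtree_check,no_root_squash)

    # client /etc/fstab (NFS mount)
    fileserver:/export/data  /data  nfs  defaults,async,_netdev  0 0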
    

    After some time loading the file, the compute node becomes unresponsive (as do other nodes on the cluster), and the file server logs report the following error:

    XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
    

    I can resolve this temporarily by dropping the caches like so:

    echo 3 > /proc/sys/vm/drop_caches
    

    The file server has 16 GB of memory.

    I have already read through the following blog post:

    https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/

    However, the issue is not due to fragmentation, as the fragmentation reported is below 2% for the filesystem I am writing to.

    So, given the XFS error above, I assume that the file server is running out of memory because it cannot keep up with the number of I/O requests issued by the task at hand.

    Apart from dropping caches periodically (e.g. via a cron job like the sketch below), is there a more permanent solution to this?
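
    The periodic workaround I have in mind is roughly this root crontab entry (the hourly interval is an arbitrary choice):

    # drop page cache, dentries, and inodes every hour (workaround, not a fix)
    0 * * * * /bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches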

    Thanks in advance for the help.

    Edit: CentOS 7.2 on client and server.

    Edit #2: Kernel 3.10.0-229.14.1.el7.x86_64 on client and server.

    • ewwhite
      ewwhite over 6 years
      The most important detail you can provide is your operating system, distribution and version. Otherwise...
    • ewwhite
      ewwhite over 6 years
      Please include the kernel version.