Libvirt / QEmu Machine Fails and Refuses Restart Because of Memory Allocation Errors

5,151

Based on the error message you pasted, you don't have enough free memory, though -- something's trying to allocate 3221225472 bytes (3.2GB or so), and free(1) is saying you've got at MOST (if you use no buffers or cache) about 1.3GB of available memory.

It looks like you're running without swap. That certainly won't be helping your memory allocation problems. If you normally run with swap (which I'd consider a very good idea on a virtualisation server), then something's gone a bit wrong, and you'll want to fix that.

The other possibility is that you've configured things to run with hugepages, and now that the system's been running for a while you don't have enough unfragmented memory to allocate all the hugepages your VM wants.

Share:
5,151

Related videos on Youtube

Elmar Weber
Author by

Elmar Weber

Updated on September 18, 2022

Comments

  • Elmar Weber
    Elmar Weber almost 2 years

    I'm having a problem with libvirt. On a system restart all virtual machines (VMs) are started without a problem and keep running. Then at some point in time a set of machines shuts down according to their log. When I try to restart the machine, I'm getting an error that the memory allocation failed, although more than enough memory is free.

    server ~ # free
                 total       used       free     shared    buffers     cached
    Mem:      16176648   16025476     151172          0     285432     950300
    -/+ buffers/cache:   14789744    1386904
    Swap:            0          0          0
    server ~ # virsh start zimbra
    error: Failed to start domain zimbra
    error: Unable to read from monitor: Connection reset by peer
    
    server ~ # tail -n 4 /var/log/libvirt/qemu/zimbra.log
    LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 3072 -smp 2,sockets=2,cores=1,threads=1 -name zimbra -uuid d05ddb7a-83c4-a77b-d8bc-a322648520cf -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/zimbra.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/var/lib/libvirt/images/zimbra.img,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=19,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:21:a9:ad,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 192.168.1.2:25 -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
    char device redirected to /dev/pts/2
    Failed to allocate 3221225472 B: Cannot allocate memory
    2012-07-06 08:42:56.076+0000: shutting down
    server ~ # uname -a
    Linux server 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    

    The system is a Ubuntu 12.04 server. The problem seems to occurs since the last restart, which was due to a number of package upgrades and a kernel upgrade. I tried booting with the previous kernel, the problem persists. The machines fail one after another. The buffers used by the kernel is always increasing. What I'm not sure of is whether this is the cause of the crash or is just a reaction to the available free space.

    Munin Memory by Day Munin Memory by Week

    Any suggestions on how to debug this?

    Addendum:

    Best regards, elm

  • Elmar Weber
    Elmar Weber almost 12 years
    I thought the buffers part of 'free' is only cached data, i.e. available when needed. Because when I take a look at 'top' and the memory usage of the top processes, at most 2 GB are used.
  • womble
    womble almost 12 years
    Yes, but you've only got about 280MB of buffers and 950MB of cache, which isn't anywhere near enough to be able to accomodate an extra 3.2GB of allocation.
  • Elmar Weber
    Elmar Weber almost 12 years
    I added the munin graphs of the memory around the time of the crashes, maybe this helps. I'm unsure whether the space used for buffers is the cause of the crash or just happens because more memory is free.
  • womble
    womble almost 12 years
    Oooooh... I see what you're misunderstanding. That second line in the output of free isn't "buffers/cache used" and "buffers/cache free", it's "the amount of memory used/free once buffers and cache are considered free memory". The -/+ at the beginning of the line is a (subtle) hint in that direction. You can learn more at linuxatemyram.com
  • womble
    womble almost 12 years
    Based on those graphs, I'm going to re-emphasise your lack of swap space, and highly recommend you fix that.
  • Elmar Weber
    Elmar Weber almost 12 years
    Thanks, I'll try it with swap space. What I still do not understand is where the 13G of used buffers are, i.e. who is using them for what. There is no application that uses the memory and kernel disk cache should be freed on demand.
  • womble
    womble almost 12 years
    YOU DON'T HAVE 13GB OF USED BUFFERS. You have 285MB of used buffers.
  • Elmar Weber
    Elmar Weber almost 12 years
    Yes, I got that, I misformulated the question. I did not mean buffers, I meant the space used by the kernel, showing in the graphs as slab_cache. That space does not show up in top or ps as memory used by a process, so I assume it is used by the kernel as a cache / buffer for some kind of thing? I'm currently trying to get behind the question why it is not freed.
  • Elmar Weber
    Elmar Weber almost 12 years
    I tried it with swap space, the problem is then just prolonged. Today it happened again.
  • womble
    womble almost 12 years
    Oh well, time to buy more memory or run less VMs.