CentOS 6: strange page allocation failure messages
Solution 1
Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=713546
vm.min_free_kbytes = 512000
vm.zone_reclaim_mode = 1
The same settings were also suggested as a potential workaround in this CentOS mailing-list thread: http://lists.centos.org/pipermail/centos/2012-October/129844.html.
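Both values are kernel sysctl settings. Persistent values normally go in /etc/sysctl.conf and are loaded with sysctl -p, and the live values are exposed as files under /proc/sys. The sketch below is a hypothetical helper, not something from the thread: it compares the live values with the suggested ones and renders the lines for /etc/sysctl.conf. It assumes a Linux host and only reads from /proc/sys; it never writes.

```python
from pathlib import Path

# The workaround values from the answer above.
WORKAROUND = {
    "vm.min_free_kbytes": 512000,
    "vm.zone_reclaim_mode": 1,
}


def proc_path(key):
    """Map a sysctl key to its /proc/sys file.

    vm.min_free_kbytes -> /proc/sys/vm/min_free_kbytes
    """
    return Path("/proc/sys") / key.replace(".", "/")


def current_value(key):
    """Return the live value as a string, or None if it cannot be read
    (for example, the key does not exist or this is not Linux)."""
    try:
        return proc_path(key).read_text().strip()
    except OSError:
        return None


def sysctl_conf_lines(settings):
    """Render settings in /etc/sysctl.conf syntax ('key = value')."""
    return [f"{key} = {value}" for key, value in settings.items()]


if __name__ == "__main__":
    for key, wanted in WORKAROUND.items():
        print(f"{key}: current={current_value(key)} suggested={wanted}")
    # min_free_kbytes is in KiB: 512000 KiB is 500 MiB.
    print("min_free_kbytes =", WORKAROUND["vm.min_free_kbytes"] / 1024, "MiB")
    print("\n".join(sysctl_conf_lines(WORKAROUND)))
```

On the 12 GB server described later in the thread, 500 MiB is roughly 4% of RAM that the kernel will try to keep free.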
Solution 2
Upgrade to kernel-2.6.32-358.el6 or the CentOS equivalent; the bug is fixed in that release.
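A rough way to check whether the running kernel is at least the 2.6.32-358 release mentioned above is sketched below. The helper names are hypothetical, and the comparison is a simplification: it compares only the numeric fields before the `.el` dist tag and is not rpm's full version-comparison algorithm.

```python
import platform
import re


def release_key(release):
    """Turn a kernel release string into a tuple of integers.

    '2.6.32-358.0.1.el6.x86_64' -> (2, 6, 32, 358, 0, 1)

    Simplification: everything from the '.el' dist tag onward is dropped and
    non-numeric fields are ignored, so this is NOT rpm's full comparison.
    """
    base = release.split(".el", 1)[0]
    return tuple(int(part) for part in re.split(r"[.-]", base) if part.isdigit())


def at_least(release, minimum):
    """True if `release` sorts at or after `minimum` under release_key()."""
    return release_key(release) >= release_key(minimum)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "at least 2.6.32-358:", at_least(running, "2.6.32-358.el6"))
```

By this comparison, the 2.6.32-358.0.1.el6.x86_64 kernel reported in the question already passes the 358 threshold.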
Essentially, this is about memory allocation in interrupt context. If you want, look at include/linux/gfp.h. Mode 0x20 is GFP_ATOMIC, which on this kernel consists of __GFP_HIGH alone: the allocation cannot wait because it is made in interrupt context, so the wait bit (__GFP_WAIT) is not set and the allocator cannot sleep to reclaim memory. Therefore, if the memory is not available immediately, the allocation fails. The fix is quite substantial.
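To see which flags a `mode:0x...` value contains, the bits can be decoded against gfp.h. The sketch below uses the flag values from include/linux/gfp.h in the 2.6.32 series as I understand them; later kernels renumbered these bits, so verify the table against the header for your exact kernel before relying on it. The helper names are hypothetical. For comparison, GFP_KERNEL (an allocation that may sleep) is 0xd0 on these kernels, while the failing mode 0x20 lacks __GFP_WAIT.

```python
# Bit values from include/linux/gfp.h in the 2.6.32 kernel series (the base
# of the CentOS 6 kernels). Later kernels renumbered these flags, so this
# table is only meant for decoding 2.6.32-era "mode:0x..." values.
GFP_FLAGS_2_6_32 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",    # allocator may sleep and reclaim memory
    0x20: "__GFP_HIGH",    # may access emergency pools
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
}

# The bits are distinct powers of two, so OR-ing them gives the known mask.
KNOWN_MASK = 0
for _bit in GFP_FLAGS_2_6_32:
    KNOWN_MASK |= _bit


def decode_gfp(mode):
    """Return the flag names set in a gfp mode value, lowest bit first.

    Any bits not in the table are appended as a single hex string.
    """
    names = [name for bit, name in sorted(GFP_FLAGS_2_6_32.items()) if mode & bit]
    unknown = mode & ~KNOWN_MASK
    if unknown:
        names.append(hex(unknown))
    return names


def may_sleep(mode):
    """True if __GFP_WAIT is set, i.e. the allocation is allowed to block."""
    return bool(mode & 0x10)


if __name__ == "__main__":
    mode = 0x20  # from "page allocation failure. order:1, mode:0x20"
    print(hex(mode), decode_gfp(mode), "may sleep:", may_sleep(mode))
```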
steveh80
Updated on September 18, 2022
-
steveh80 over 1 year
I set up a new server running CentOS 6.4 Final as the successor to an old MySQL server, and I'm facing some problems with it. From time to time, MySQL connections are dropped. In addition, transfers of large backup tar files to FTP storage sometimes fail. Neither problem is reproducible.
While investigating, I found some strange messages in /var/log/messages that I cannot interpret:
Mar 30 13:09:24 s16838172 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 30 13:09:24 s16838172 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.0.1.el6.x86_64 #1
Mar 30 13:09:24 s16838172 kernel: Call Trace:
Mar 30 13:09:24 s16838172 kernel: <IRQ> [<ffffffff8112c207>] ? __alloc_pages_nodemask+0x757/0x8d0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81166ab2>] ? kmem_getpages+0x62/0x170
Mar 30 13:09:24 s16838172 kernel: [<ffffffff811676ca>] ? fallback_alloc+0x1ba/0x270
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8116711f>] ? cache_grow+0x2cf/0x320
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81167449>] ? ____cache_alloc_node+0x99/0x160
Mar 30 13:09:24 s16838172 kernel: [<ffffffff811683cb>] ? kmem_cache_alloc+0x11b/0x190
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81439c18>] ? sk_prot_alloc+0x48/0x1c0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8143acf2>] ? sk_clone+0x22/0x2e0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81489bc6>] ? inet_csk_clone+0x16/0xd0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff814a2ad3>] ? tcp_create_openreq_child+0x23/0x450
Mar 30 13:09:24 s16838172 kernel: [<ffffffff814a02cd>] ? tcp_v4_syn_recv_sock+0x4d/0x310
Mar 30 13:09:24 s16838172 kernel: [<ffffffff814a2876>] ? tcp_check_req+0x226/0x460
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8149fd6b>] ? tcp_v4_do_rcv+0x35b/0x430
Mar 30 13:09:24 s16838172 kernel: [<ffffffffa03b4557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
Mar 30 13:09:24 s16838172 kernel: [<ffffffff814a157e>] ? tcp_v4_rcv+0x4fe/0x8d0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8147f290>] ? ip_local_deliver_finish+0x0/0x2d0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8147f36d>] ? ip_local_deliver_finish+0xdd/0x2d0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8147f5f8>] ? ip_local_deliver+0x98/0xa0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8147eabd>] ? ip_rcv_finish+0x12d/0x440
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8147f045>] ? ip_rcv+0x275/0x350
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8144827b>] ? __netif_receive_skb+0x4ab/0x750
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8144a658>] ? netif_receive_skb+0x58/0x60
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8144a760>] ? napi_skb_finish+0x50/0x70
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8144cd09>] ? napi_gro_receive+0x39/0x50
Mar 30 13:09:24 s16838172 kernel: [<ffffffffa00f933b>] ? e1000_receive_skb+0x5b/0x90 [e1000e]
Mar 30 13:09:24 s16838172 kernel: [<ffffffffa00fc601>] ? e1000_clean_rx_irq+0x241/0x4c0 [e1000e]
Mar 30 13:09:24 s16838172 kernel: [<ffffffffa0103bbd>] ? e1000e_poll+0xbd/0x380 [e1000e]
Mar 30 13:09:24 s16838172 kernel: [<ffffffffa00f9eca>] ? e1000_put_txbuf+0x6a/0xa0 [e1000e]
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8144ce23>] ? net_rx_action+0x103/0x2f0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8109b153>] ? hrtimer_get_next_event+0xc3/0x100
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81516d75>] ? do_IRQ+0x75/0xf0
Mar 30 13:09:24 s16838172 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Mar 30 13:09:24 s16838172 kernel: <EOI> [<ffffffff812d388e>] ? intel_idle+0xde/0x170
Mar 30 13:09:24 s16838172 kernel: [<ffffffff812d3871>] ? intel_idle+0xc1/0x170
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81414fd7>] ? cpuidle_idle_call+0xa7/0x140
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Mar 30 13:09:24 s16838172 kernel: [<ffffffff814f300a>] ? rest_init+0x7a/0x80
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81c27f7b>] ? start_kernel+0x424/0x430
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81c2733a>] ? x86_64_start_reservations+0x125/0x129
Mar 30 13:09:24 s16838172 kernel: [<ffffffff81c27438>] ? x86_64_start_kernel+0xfa/0x109
Blocks of messages like this appear about 2 to 10 times within 5 minutes; after that, they are gone for a few hours.
Can somebody help me with this? I hope it's not a hardware problem.
Update: The problem seems to be reproducible by transferring large files over the network (backups to FTP storage). The FTP upload aborts after a few GB, and the messages above appear in /var/log/messages.
Thanks!
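To check the frequency described in the question (2 to 10 failure blocks within 5 minutes), the first line of each report can be extracted from /var/log/messages and tallied per 5-minute window. The sketch below is a hypothetical helper. It assumes the default CentOS 6 syslog line format shown above (no printk timestamps) and 4 KiB base pages, as on x86_64; `order:N` is a request for 2^N contiguous pages, so `order:1` is 8 KiB.

```python
import re
from collections import Counter

# Matches the first line of each failure report, e.g.
# "Mar 30 13:09:24 host kernel: swapper: page allocation failure. order:1, mode:0x20"
FAILURE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<hh>\d{2}):(?P<mm>\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<comm>\S+): page allocation failure\. order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86_64


def parse_failures(lines):
    """Yield (window, comm, order, mode, bytes_requested) per failure line.

    `window` is (month, day, hour, minute rounded down to a multiple of 5).
    """
    for line in lines:
        m = FAILURE_RE.match(line)
        if not m:
            continue  # call-trace lines and unrelated messages
        order = int(m["order"])
        window = (m["mon"], int(m["day"]), int(m["hh"]), int(m["mm"]) // 5 * 5)
        yield window, m["comm"], order, int(m["mode"], 16), PAGE_SIZE << order


def failures_per_window(lines):
    """Count failure reports per 5-minute window."""
    return Counter(window for window, *_ in parse_failures(lines))


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as log:
            for window, count in sorted(failures_per_window(log).items()):
                print(window, count)
    except OSError as exc:  # file missing or not readable without root
        print("could not read log:", exc)
```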
-
steveh80 about 11 years
You are not authorized to access bug #713546. :-( Can you share more information about what is being discussed there? I have also read that zone_reclaim_mode=1 causes performance problems on database servers. Is that true?
-
steveh80 about 11 years
OK, thanks for this information. Do you know whether this kernel upgrade will be available through the standard CentOS repos? Yum tells me there is nothing to update...
-
steveh80 about 11 years
I see, I am already on 2.6.32-358.0.1.el6.x86_64. The bug does not seem to be fixed in this version...
-
steveh80 about 11 years
I added these settings to /etc/sysctl.conf and reloaded them with sysctl -p. That didn't solve the problem.
-
steveh80 about 11 years
OK: it's a dedicated server running CentOS 6.4, with everything updated to the latest versions from the official CentOS repos. Hardware: Intel Xeon E3-1220, 12 GB DDR3 ECC RAM, 1 TB software RAID. The only thing I can say is that the error shows up under heavy network traffic (transferring large backup files over the network via FTP). What else do you need?
-
slm about 11 years
What hardware are we dealing with here? A custom box, a Dell server, or something else? I'm afraid you're going to have to go through the box piece by piece and check whether any of its components have open issues.
-
steveh80 about 11 years
I don't know. It's a dedicated root server from 1und1.de with a preinstalled and preconfigured minimal CentOS system. It should be pretty standard, nothing special.
-
slm about 11 years
It probably wouldn't hurt to enlist 1und1.de's help here. At this point, without more information about the hardware, it's a guessing game for any of us trying to help. A number of patches have addressed specific issues with Linux kernels under heavy network traffic, but they depend on specific hardware, like this one or this one.
-
Soham Chakraborty about 11 years
Oh, hold on a day. Let me search a bit more.