Making Linux read swap back into memory

Solution 1

Based on the memdump program originally found here, I've created a script, remember, that selectively reads the memory of the specified applications back into RAM:

#!/bin/bash
# remember: read the memory of the given PIDs back in from swap,
# dumping each distinct executable only once.
declare -A Q    # executable path -> PID already dumped
for i in "$@"; do
    E=$(readlink "/proc/$i/exe")
    if [ -z "$E" ]; then
        # kernel thread, or the process is already gone; nothing to dump
        continue
    fi
    if echo "$E" | grep -qF memdump; then
        # don't dump memdump itself
        continue
    fi
    if [ -n "${Q[$E]}" ]; then
        # already dumped a process running this executable
        continue
    fi
    echo "$i $E" >&2
    memdump "$i" 2> /dev/null
    Q[$E]=$i
done | pv -c -i 2 > /dev/null

Usage: something like

# ./remember $(< /mnt/cgroup/tasks )
1 /sbin/init
882 /bin/bash
1301 /usr/bin/hexchat
...
2.21GiB 0:00:02 [ 1.1GiB/s] [  <=>     ]
...
6838 /sbin/agetty
11.6GiB 0:00:10 [1.16GiB/s] [      <=> ]
...
23.7GiB 0:00:38 [ 637MiB/s] [   <=>    ]
# 

It quickly skips over non-swapped memory (gigabytes per second) and slows down when swap is needed.
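
The memdump source isn't shown here, but the underlying idea is simply to read every readable mapping of a process through /proc/$pid/mem, which faults any swapped-out pages back into RAM. Below is a minimal sketch of that idea (a hypothetical stand-in, not the actual memdump code; it assumes a modern kernel where root may read /proc/$pid/mem without ptrace-attaching, and 4 KiB pages):

#!/bin/bash
# Hypothetical swap-in helper: force-read the readable mappings of a
# single PID so that its swapped-out pages are faulted back into memory.
pid=$1
while read -r range perms offset dev inode path; do
    [ "${perms:0:1}" = "r" ] || continue    # skip unreadable mappings
    [ "$path" = "[vsyscall]" ] && continue  # not readable via /proc/$pid/mem
    start=$((16#${range%-*}))
    end=$((16#${range#*-}))
    # Every read through /proc/$pid/mem faults the target page in.
    dd if=/proc/$pid/mem bs=4096 skip=$((start / 4096)) \
       count=$(((end - start) / 4096)) 2>/dev/null
done < /proc/$pid/maps > /dev/null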

Solution 2

It might help to raise /proc/sys/vm/page-cluster (default: 3).

From the kernel documentation (sysctl/vm.txt):

page-cluster

page-cluster controls the number of pages up to which consecutive pages are read in from swap in a single attempt. This is the swap counterpart to page cache readahead. The mentioned consecutivity is not in terms of virtual/physical addresses, but consecutive on swap space - that means they were swapped out together.

It is a logarithmic value - setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", etc. Zero disables swap readahead completely.

The default value is three (eight pages at a time). There may be some small benefits in tuning this to a different value if your workload is swap-intensive.

Lower values mean lower latencies for initial faults, but at the same time extra faults and I/O delays for subsequent faults on pages that a larger consecutive readahead would have brought in.

The documentation doesn't mention a limit, so possibly you could set this absurdly high to make all of swap be read back in really soon. And of course turn it back to a sane value afterwards.
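
For example, one might raise it temporarily while forcing the swapped pages back in, and then restore it. A sketch (run as root; the value 16, i.e. 2^16 = 65536 pages per readahead attempt, is an arbitrary "absurdly high" choice, not a documented limit):

old=$(cat /proc/sys/vm/page-cluster)
echo 16 > /proc/sys/vm/page-cluster
# ... touch the swapped memory here, e.g. with the remember script above ...
echo "$old" > /proc/sys/vm/page-cluster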

Solution 3

It seems to me that you can't magically "make the system responsive again". You either incur the penalty of reading pages back from swap space into memory now, or you incur it later, but one way or the other you incur it. Indeed, if you do something like swapoff -a && swapon -a then you may feel more pain rather than less, because you force some pages to be copied back into memory that would otherwise never have been needed again and would eventually have been dropped without being read (think: you quit an application while much of its heap is swapped out; those pages can be discarded altogether without ever being read back into memory).

As for "but this clears the pages from swap, so they need to be written again the next time I run the script":

Well, pretty much any page that gets copied back from swap into main memory is about to be modified anyway, so if it ever needed to be moved back out to swap again in the future, it would have to be written anew in swap anyway. Keep in mind that swap is mainly heap memory, not read-only pages (which are usually file-backed).

I think your swapoff -a && swapon -a trick is as good as anything you could come up with.

Solution 4

You may try adding the programs you care most about to a cgroup and tuning its swappiness so that, the next time the memory-hungry application runs, those programs are less likely to be candidates for swapping.

Some of their pages will likely still be swapped out, but it may get around your performance problems. A large part of it is probably just the "stop and start" behavior when a lot of a program's pages are in swap and the program has to continually pause in order to swap its pages into RAM, but only in 4k increments.

Alternatively, you may add the application that's running to a cgroup and tune swappiness so that the application is the one that tends to use the swap file most. It'll slow down the application but it'll spare the rest of the system.
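
A rough sketch using the cgroup-v1 memory controller (the mount point /sys/fs/cgroup/memory and the group name "protected" are assumptions; adjust to your system's layout):

# Create a memory cgroup whose members are unattractive swap candidates.
mkdir /sys/fs/cgroup/memory/protected
echo 1 > /sys/fs/cgroup/memory/protected/memory.swappiness
# Move the processes you care about into it (one PID per write).
echo $PID > /sys/fs/cgroup/memory/protected/tasks

For the alternative, put the memory-hungry application in its own group with a high memory.swappiness instead, so that it preferentially swaps against itself.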

Author: drrossum (updated on September 18, 2022)

Comments

  • drrossum
    drrossum over 1 year

    The Linux kernel swaps out most pages from memory when I run an application that uses most of the 16GB of physical memory. After the application finishes, every action (typing commands, switching workspaces, opening a new web page, etc.) takes very long to complete because the relevant pages first need to be read back in from swap.

    Is there a way to tell the Linux kernel to copy pages from swap back into physical memory without manually touching (and waiting for) each application? I run lots of applications so the wait is always painful.

    I often use swapoff -a && swapon -a to make the system responsive again, but this clears the pages from swap, so they need to be written again the next time I run the script.

    Is there a kernel interface, perhaps using sysfs, to instruct the kernel to read all pages from swap?

    Edit: I am indeed looking for a way to make all of swap swapcached. (Thanks derobert!)

    [P.S. serverfault.com/questions/153946/… and serverfault.com/questions/100448/… are related topics but do not address the question of how to get the Linux kernel to copy pages from swap back into memory without clearing swap.]

    • Admin
      Admin about 9 years
      You want all of swap to be a cache of the system's memory? So you want an image of the system's memory which you can reload at will? This is basically how hibernation works - the system images its memory to disk, powers off, and restores the image at power up. Is there any chance, do you think, that a query following that thread might be helpful to you? For example, if you were to image your memory, disable swap, complete a task, then restore the image and repeat - is that something which you might like to do?
    • Admin
      Admin about 9 years
      I don't think this would leave swap in a swapcached state. As such it seems to me that your suggestion is an alternative for the swapoff-swapon method.
    • Admin
      Admin almost 3 years
      Hey @drrossum, did you ever find out more useful information about this subject? I'm finding that certain applications unfortunately require swap to be used; it's not avoidable. Sadly, these swap mechanisms on Linux are mostly obsolete, since they were created with older, slower disks in mind and were never really updated. This is a field that requires increased attention. Not only is swapping as it currently stands super slow, but I suspect it causes disk thrashing as well: unnecessary premature death of disks due to the constant I/O reads and writes.
    • Admin
      Admin almost 3 years
      There's an "agreement" in the Linux community that says "Just trust the kernel developers that deal with the Out-Of-Memory Killer (OOM Killer), they know better." I don't think that's the case anymore; I don't particularly agree with that way of thinking. Linux doesn't need to settle for mediocrity. When even something as bloated as Windows 10 can use disk swapping in a much faster manner, I'm sorry, but I have to say something is deeply wrong with this.
  • goldilocks
    goldilocks about 9 years
    That's two of us saying exactly the same thing at the same time ;)
  • Celada
    Celada about 9 years
    @goldilocks yeah, I saw your answer appear before mine was ready but I was already ¾ done so I stuck with it :-)
  • drrossum
    drrossum about 9 years
    You and goldilocks say the same thing, but I don't believe this is how swap caching works. My understanding is that you can have pages in swap AND memory at the same time. The swap page only gets invalidated once the page in memory is updated.
  • Celada
    Celada about 9 years
    I trust that the answer you referred to by David Spillett is correct: you can indeed have a page in swap and RAM at the same time... but only until the RAM version gets modified. Then you have to discard the out-of-date copy in swap. When I said "pretty much any page that gets copied back from swap [...] is about to be modified anyway" what I mean is that I expect that this is what happens most of the time, so I don't expect pages in both places to be a significant fraction worth worrying about.
  • Celada
    Celada about 9 years
    Your usage scenario may be different: you may have a lot of applications with big heaps that frequently get read and not written. My feeling is that most people don't have such a scenario. But if you do, I guess you're right: swapoff -a && swapon -a won't be good for you. I guess in that case you'd need something that scans /proc/<each-process>/mem and reads each page of memory to make sure it exists in RAM. Don't know if that exists.
  • Celada
    Celada about 9 years
    I'm sorry for the misunderstanding. I didn't mean to suggest that every page that is loaded back from swap into RAM is guaranteed to be modified right away and therefore expunged from swap.
  • drrossum
    drrossum about 9 years
    This sounds like a useful workaround. The two manual interventions could be combined with a sleep command to make it a single user intervention. But, it may not necessarily make all swap swapcached very quickly as it only reads consecutively from the page that is accessed. It's the best solution so far, though. Thanks!
  • Bratchley
    Bratchley about 9 years
    tbh this is probably the best solution you're going to get. I haven't heard of it before, but it looks like it turns the swap-in into a series of large IOPs rather than a continuous stream of small IOPs, which is probably what's causing your performance issues. I would be legitimately surprised if there was something that perfectly addressed your individual situation.
  • Ilmari Karonen
    Ilmari Karonen about 9 years
    For that matter, if you're experiencing slowdowns due to lots of consecutive small swap-ins, even just permanently adjusting the page-cluster value up might improve performance.
  • slm
    slm about 9 years
    Whilst this may theoretically answer the question, it would be preferable to include the essential parts of the answer here, and provide the link for reference.
  • flaschenpost
    flaschenpost over 7 years
    The only problem with that solution is: swapoff is really really slow. Much slower than just reading 2 GB sequentially from disc.
  • drrossum
    drrossum over 7 years
    This tool does exactly what I was looking for. Thanks!
  • Giovanni Mascellani
    Giovanni Mascellani about 7 years
    One nice thing is that, from my observation, swapped pages get copied into RAM but are not deleted from the swap (at least, most of them are not, because swap usage only decreases slightly). My interpretation is that Linux keeps two copies of each page, one in RAM and one in swap. If this is handled correctly, it is even better than cancelling and adding swap again, because it means that, when a doubled page has to be swapped out again, there will be no need for another copy. Thanks to any kernel expert who can confirm.
  • ysalmon
    ysalmon almost 4 years
    I tried to use your script today. It reports syntactic errors (dots at the end of some lines). After correcting them it runs very quickly but does not seem to do anything (no change in swap usage at 3.2 GiB). Am I missing something ?
  • Vi.
    Vi. almost 4 years
    @ysalmon, Reading swap into memory does not mean removing it from swap. The page can be both in swap and in memory. This, however, should make programs ready to go; also, swapoff should be faster after that. "it runs very quickly but does not seem to do anything" -> Have you specified the list of PIDs as command line arguments?
  • Winampah
    Winampah almost 3 years
    Hey @GiovanniMascellani I'm finding out that there are very few people with real expertise and experience in this swapping subject. Unfortunately, there are several applications that rely on swapping to even startup (DotNet, C#, Java, MySQL, etc) Did you ever stumble upon the problem of the command Swapoff running terribly slow? I'm getting 1.90 MB/s when I run the swapoff command. There's something wrong with I/O settings but I'm having difficulty in diagnosing it. Maybe you can help giving some directions? I created a question with more details. This has been plaguing me for a few months.
  • Winampah
    Winampah almost 3 years
    But adjusting in what direction? Bigger value? Lower value? Can lower values mean that there's a danger of disk thrashing? Meaning: can lower values increase the likelihood of damage to the disk, or a reduction of its lifespan, due to too much I/O?
  • Winampah
    Winampah almost 3 years
    I'm doing research on this subject to help diagnose an I/O slowness problem I'm having. The weird part is that this slowness only happens in Debian, it does not happen if I reboot into Manjaro for example. But anyway... about this "4k increments" part... Is all disk Swapping done in 4K increments? Is that value fixed? Can that value be tuned? For several frequent small swapping operations, would a higher increment size help?
  • Giovanni Mascellani
    Giovanni Mascellani almost 3 years
    @Winampah Sorry, I don't really have deep expertise in this field, and I don't know how to fix your problem. Good luck!