How to properly set up zram and swap

Solution 1

swapon has a -p switch, which sets the priority. I can set up:

swapon -p 32767 /dev/zram0
swapon -p 0 /dev/my-lvm-volume/swap

Or in /etc/fstab:

/dev/zram0              none swap sw,pri=32767 0 0
/dev/my-lvm-volume/swap none swap sw,pri=0     0 0
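
Either way, you can check that both areas are active and that zram0 has won on priority (swapon --show needs a reasonably recent util-linux; /proc/swaps works everywhere):

swapon --show
cat /proc/swaps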

EDIT: Just for a full solution, a line like this may be helpful as a udev rule:

KERNEL=="zram0", ACTION=="add", ATTR{disksize}="1073741824", RUN="/sbin/mkswap /$root/$name"
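
Note that the rule can only fire once the zram module is loaded; one way to make sure that happens at boot is a modules-load.d drop-in (the file name below is only an example):

echo zram > /etc/modules-load.d/zram.conf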

Solution 2

For some reason there seems to be a lot of misinterpretation of https://www.kernel.org/doc/Documentation/blockdev/zram.txt

It clearly states:

2) Set max number of compression streams
Regardless the value passed to this attribute, ZRAM will always allocate multiple compression streams - one per online CPUs - thus allowing several concurrent compression operations. The number of allocated compression streams goes down when some of the CPUs become offline. There is no single-compression-stream mode anymore, unless you are running a UP system or has only 1 CPU online.

To find out how many streams are currently available:

cat /sys/block/zram0/max_comp_streams
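
The attribute is also writable, so if you want to pin it explicitly to the number of online CPUs, a one-liner like this should do (on recent kernels it is effectively a no-op, since multiple streams are always allocated anyway):

echo "$(nproc)" > /sys/block/zram0/max_comp_streams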

But there is a common, persistent urban myth that max streams is 1.

It's plainly not true.

The two OSes where zram has proven effective, Chrome OS and Android, use a single device. They also tweak page-cluster:

page-cluster controls the number of pages up to which consecutive pages are read in from swap in a single attempt. This is the swap counterpart to page cache readahead.
The mentioned consecutivity is not in terms of virtual/physical addresses, but consecutive on swap space – that means they were swapped out together.

It is a logarithmic value – setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", etc. Zero disables swap readahead completely.

The default value is three (eight pages at a time). There may be some small benefits in tuning this to a different value if your workload is swap-intensive.

Lower values mean lower latencies for initial faults, but at the same time extra faults and I/O delays for following faults if they would have been part of that consecutive pages readahead would have brought in.

                — from the kernel documentation for /proc/sys/vm/*

So use echo "0" > /proc/sys/vm/page-cluster to force single-page reads.
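
To make that survive a reboot, the usual sysctl mechanism works; the drop-in file name below is only an example:

sysctl -w vm.page-cluster=0
echo 'vm.page-cluster = 0' > /etc/sysctl.d/99-swap-page-cluster.conf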

Much of this seems to originate from zram-config, the Debian/Ubuntu package, which for some reason bears very little relation to the kernel documentation for zram and has bred a series of Chinese whispers that, in essence, could be completely wrong.

With file-based swap, do you create a swap device for each core? Probably not, which may answer your question. To back this up: Google's Chrome OS and Android, which employ zram successfully, use single devices together with the page-cluster setting above, because zram does not behave like a disk and latency can be improved.

Also, for a sysadmin, what is important: actual memory usage or virtual memory usage? Most examples show creation via disksize and totally ignore mem_limit. disksize is the uncompressed virtual size; mem_limit is the limit on the actual memory footprint.

That makes the choice of disksize somewhat confusing, as it is a virtual maximum that depends on the compression ratio of comp_algorithm, plus an overhead of 0.1% of the disk size when not in use; in practice it is a guesstimate of mem_limit * (roughly 2 to 4), frugal versus optimistic.
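
As a minimal sketch of what that looks like in practice (the numbers and the lz4 choice are only illustrative, with disksize set to roughly 4x mem_limit):

echo lz4 > /sys/block/zram0/comp_algorithm   # choose the compressor before sizing
echo 4G > /sys/block/zram0/disksize          # virtual (uncompressed) size
echo 1G > /sys/block/zram0/mem_limit         # hard cap on actual RAM used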

zram-config doesn't even check for previous service usage and overwrites existing devices, whereas a simple check of the zram sysfs class, as in the script below, will.

createZramSwaps () {
        # MEM_FACTOR, BIG_CORES, COMP_ALG_SWAP and SWAP_PRI are expected to be
        # provided by the caller / sourced configuration.
        totalmem=$(free | awk '/^Mem:/{print $2}')
        # Per-device size in bytes: MEM_FACTOR percent of RAM split across the cores
        mem=$(( ( totalmem * MEM_FACTOR / 100 / BIG_CORES ) * 1024 ))

        # Check whether the zram control class already exists
        ZRAM_SYS_DIR='/sys/class/zram-control'
        if [ ! -d "${ZRAM_SYS_DIR}" ]; then
                modprobe zram
                RAM_DEV='0'
        else
                # Module already loaded: hot-add a fresh device instead of
                # overwriting one that may already be in use
                RAM_DEV=$(cat /sys/class/zram-control/hot_add)
        fi
        echo "${COMP_ALG_SWAP}" > /sys/block/zram${RAM_DEV}/comp_algorithm
        echo "${mem}" > /sys/block/zram${RAM_DEV}/disksize
        mkswap /dev/zram${RAM_DEV}
        swapon -p "${SWAP_PRI}" /dev/zram${RAM_DEV}

        # One additional device per remaining core
        if [ "${BIG_CORES}" -gt 1 ]; then
                for i in $(seq $(( BIG_CORES - 1 ))); do
                        RAM_DEV=$(cat /sys/class/zram-control/hot_add)
                        echo "${COMP_ALG_SWAP}" > /sys/block/zram${RAM_DEV}/comp_algorithm
                        echo "${mem}" > /sys/block/zram${RAM_DEV}/disksize
                        mkswap /dev/zram${RAM_DEV}
                        swapon -p "${SWAP_PRI}" /dev/zram${RAM_DEV}
                done
        fi
}
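
For reference, the function expects its configuration variables to be set by the caller; an illustrative invocation (the values are assumptions, not recommendations) might look like:

MEM_FACTOR=50        # percentage of RAM to devote to zram swap in total
BIG_CORES=$(nproc)   # one zram device per core
COMP_ALG_SWAP=lz4    # compressor used for the swap devices
SWAP_PRI=75          # priority passed to swapon -p
createZramSwaps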

Solution 3

Sidenote: because of per-CPU locking, it is important to have as many zram swap devices as CPUs (modprobe zram num_devices=n) instead of a single big one. RTFM!
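
For example, to get one device per online CPU (the count is just the obvious choice under that assumption):

modprobe zram num_devices=$(nproc)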

Solution 4

Select how many zram devices you want by passing

zram.num_devices=4

as a kernel parameter, or pass num_devices=4 directly to the module (i.e. without the zram. prefix). The default value is 1. zram will be able to utilize as many CPUs as the number of devices you create.
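
If zram is built as a module and you want the option to persist, a modprobe.d drop-in also works (the file name is just an example):

echo 'options zram num_devices=4' > /etc/modprobe.d/zram.conf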

Prepare the zram0 device and set its size to 1 GB:

echo 1000000000 > /sys/block/zram0/disksize

Alternatively you can do this with a udev rule (see the example in Solution 1). Create swap on zram0:

mkswap /dev/zram0

and enable it with a higher priority than other swap devices:

swapon -p 32767 /dev/zram0
Author: Louis Gerbarg

Updated on September 18, 2022

Comments

  • Louis Gerbarg
    Louis Gerbarg over 1 year

    I'm configuring & compiling the new 3.0 kernel. One of the goodies I have planned to use for some time (by patching) that was merged into 3.0 is zram.

    Is it possible to set both hdd swap and zram swap so the zram is used first and only spilled pages are put into actual swap?

    • Winampah
      Winampah about 3 years
      Hello, I'm finding that it's very hard to find people that are "experts" with REAL Experience in this subject of disk swapping, Zram, etc. Did you end up finding an optimal method of doing this over the years? I stumbled upon a problem where disk swapping is very slow on Debian, but it's super fast on Manjaro, in the same machine. I'm trying to figure out why. But that's a secondary subject. What's your Zram/Zswap procedure these days? Is one better than the other?
    • Winampah
      Winampah about 3 years
      There is this game I play that is very heavy and it depends on swap at startup, called Cities Skylines. There are others that report similar problems with Java applications (lack of heap_size limitation in Java) and also people who have to deal with MySQL servers (databases consume more and more memory over the span of a few hours of operation, and reboots are not really a possibility) For all these use-cases, the current Swapping methods in Linux are obsolete, they were designed for old disks and were never updated. DotNet and C# applications are notorious for memory usage.
    • Winampah
      Winampah about 3 years
      I have experimented with Zram in the past, haven't tried Zswap yet. But the problem is that I found out later on that Zram ends up using double the space in RAM, since BOTH the compressed and the uncompressed swap pages sit on RAM? Or maybe I misread this. I notice no particular improvement by using Zram, so I ended up not using it anymore. Maybe I just needed to find out how to tune it better.
  • Louis Gerbarg
    Louis Gerbarg over 12 years
    [citation needed]? I cannot find the recommendation in the Linux documentation or via Google.
  • Louis Gerbarg
    Louis Gerbarg about 9 years
    By default max_comp_streams seems to be 1 according to the documentation. I haven't checked if that parameter existed in 2011, but it seems to be a better option than multiple swap files.
  • Elder Geek
    Elder Geek over 8 years
    Testing with the package zram-config under Ubuntu (and likely Debian), I noted that the configuration defaults to 1 device per core or thread (2 on an old E2140 dual-core and 4 on an i3-3220).
  • Cbhihe
    Cbhihe almost 8 years
    What kernel and distro versions are you writing for here?
  • Cbhihe
    Cbhihe almost 8 years
    For the record, proceeding as you advise on a 14.04.5 desktop Ubuntu fails. In particular, when I try to set the priorities of zram0 and zram1 to 10 on a dual-core Intel T9300 box, it fails with: swapon: /dev/zram0 or 1: swapon failed: Device or resource busy.
  • Aquarius Power
    Aquarius Power over 7 years
    Interesting, and this also seems quite clear from reading /etc/init/zram-config.conf
  • dma_k
    dma_k over 4 years
    This article suggests having a single zram partition.
  • Perkins
    Perkins over 4 years
    This sidenote is obsolete as of kernel version 3.15. You can now have multiple compression threads for a single, large zram.
  • Winampah
    Winampah about 3 years
    Interesting info, thanks for sharing. I have one concern though: Could forcing page cluster to use single page (setting it to 0) cause too much I/O on disk, potentially stressing the disk and leading to the reduction of its lifespan? Would that cause additional mechanical stress on spinning platter disks? And what would be the consequence of setting that value too high? System freezes?
  • Stuart Naylor
    Stuart Naylor about 2 years
    Your swap is no longer disk, it's RAM. You do still need a disk swap as a failover, since zram will sometimes hit something too large that is incompressible. page-cluster applies to swap only and doesn't alter other activity; for the occasional failover to disk it gives a mismatched batching scheme, but that is far better than having a mismatched batching scheme on your main swap, i.e. RAM.