Transparent compression filesystem in conjunction with ext4


Solution 1

I use ZFS on Linux as a volume manager and a means to provide additional protections and functionality to traditional filesystems. This includes bringing block-level snapshots, replication, deduplication, compression and advanced caching to the XFS or ext4 filesystems.

See: https://pthree.org/2012/12/21/zfs-administration-part-xiv-zvols/ for another explanation.

In my most common use case, I leverage the ZFS zvol feature to create a sparse volume on an existing zpool. That zvol's properties can be set just like a normal ZFS filesystem's. At this juncture, you can set properties like compression type, volume size, caching method, etc.

Creating this zvol presents a block device to Linux that can be formatted with the filesystem of your choice. Use fdisk or parted to create your partition and mkfs the finished volume.

Mount this, and you essentially have a filesystem backed by a zvol, inheriting all of the zvol's properties.


Here's my workflow...

Create a zpool comprised of four disks:
You'll want the ashift=12 directive for the type of disks you're using. The zpool name is "vol0" in this case.

zpool create -o ashift=12 -f vol0 mirror scsi-AccOW140403AS1322043 scsi-AccOW140403AS1322042 mirror scsi-AccOW140403AS1322013 scsi-AccOW140403AS1322044
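
One way to sanity-check the result (the zdb call reads the pool's cached configuration, so this assumes the default cachefile setup):

zpool status vol0              # confirm the two-mirror layout
zdb -C vol0 | grep ashift      # verify that ashift=12 was applied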

Set initial zpool settings:
I set autoexpand=on at the zpool level in case I ever replace the disks with larger drives or expand the pool in a ZFS mirror setup. I typically don't use ZFS raidz1/2/3 because of the poor performance and the inability to expand the zpool.

zpool set autoexpand=on vol0

Set initial zfs filesystem properties:
Please use the lz4 compression algorithm for new ZFS installations. It's okay to leave it on all the time.

zfs set compression=lz4 vol0
zfs set atime=off vol0
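
To confirm the properties took effect before creating any volumes, a quick check:

zfs get compression,atime vol0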

Create ZFS zvol:
For ZFS on Linux, it's very important that you use a large block size; -o volblocksize=128K is absolutely essential here. The -s option creates a sparse zvol that doesn't consume pool space until it's needed. You can overcommit here if you know your data well. In this case, I have about 444GB of usable disk space in the pool, but I'm presenting an 800GB volume to XFS.

zfs create -o volblocksize=128K -s -V 800G vol0/pprovol
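
A quick way to confirm the zvol came out sparse is to check that refreservation reads none (consistent with the full property listing further down):

zfs get volsize,volblocksize,refreservation vol0/pprovol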

Partition zvol device:
(should be /dev/zd0 for the first zvol; /dev/zd16, /dev/zd32, etc. for subsequent zvols)

fdisk /dev/zd0 # (create new aligned partition with the "c" and "u" parameters)
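
If you prefer parted (mentioned earlier), a roughly equivalent one-liner is sketched below; note that it lays down a GPT label with a single aligned partition spanning the device, rather than the MBR-style layout the fdisk example produces:

parted -s -a optimal /dev/zd0 mklabel gpt mkpart primary 0% 100%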

Create and mount the filesystem:
Run mkfs.xfs or mkfs.ext4 on the newly created partition, /dev/zd0p1.

mkfs.xfs -f -l size=256m,version=2 -s size=4096 /dev/zd0p1
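
Since the question is specifically about ext4, the same step with ext4 would look something like this (options illustrative, not tuned):

mkfs.ext4 -m1 /dev/zd0p1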

Grab the UUID with blkid and modify /etc/fstab.
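
For example, to read the UUID of the new partition (device path from the partitioning step above):

blkid /dev/zd0p1

Then add an entry along these lines to /etc/fstab: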

UUID=455cae52-89e0-4fb3-a896-8f597a1ea402 /ppro       xfs     noatime,logbufs=8,logbsize=256k 1 2

Mount the new filesystem.

mount /ppro/

Results...

[root@Testa ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde2        20G  8.9G  9.9G  48% /
tmpfs            32G     0   32G   0% /dev/shm
/dev/sde1       485M   63M  397M  14% /boot
/dev/sde7       2.0G   68M  1.9G   4% /tmp
/dev/sde3        12G  2.6G  8.7G  24% /usr
/dev/sde6       6.0G  907M  4.8G  16% /var
/dev/zd0p1      800G  398G  403G  50% /ppro  <-- Compressed ZFS-backed XFS filesystem.
vol0            110G  256K  110G   1% /vol0

ZFS filesystem listing.

[root@Testa ~]# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
vol0           328G   109G   272K  /vol0
vol0/pprovol   326G   109G   186G  -   <-- The actual zvol providing the backing for XFS.
vol1           183G   817G   136K  /vol1
vol1/images    183G   817G   183G  /images

ZFS zpool list.

[root@Testa ~]# zpool list -v
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vol0   444G   328G   116G    73%  1.00x  ONLINE  -
  mirror   222G   164G  58.1G         -
    scsi-AccOW140403AS1322043      -      -      -         -
    scsi-AccOW140403AS1322042      -      -      -         -
  mirror   222G   164G  58.1G         -
    scsi-AccOW140403AS1322013      -      -      -         -
    scsi-AccOW140403AS1322044      -      -      -         -

ZFS zvol properties (take note of referenced, compressratio and volsize).

[root@Testa ~]# zfs get all vol0/pprovol
NAME          PROPERTY               VALUE                  SOURCE
vol0/pprovol  type                   volume                 -
vol0/pprovol  creation               Sun May 11 15:27 2014  -
vol0/pprovol  used                   326G                   -
vol0/pprovol  available              109G                   -
vol0/pprovol  referenced             186G                   -
vol0/pprovol  compressratio          2.99x                  -
vol0/pprovol  reservation            none                   default
vol0/pprovol  volsize                800G                   local
vol0/pprovol  volblocksize           128K                   -
vol0/pprovol  checksum               on                     default
vol0/pprovol  compression            lz4                    inherited from vol0
vol0/pprovol  readonly               off                    default
vol0/pprovol  copies                 1                      default
vol0/pprovol  refreservation         none                   default
vol0/pprovol  primarycache           all                    default
vol0/pprovol  secondarycache         all                    default
vol0/pprovol  usedbysnapshots        140G                   -
vol0/pprovol  usedbydataset          186G                   -
vol0/pprovol  usedbychildren         0                      -
vol0/pprovol  usedbyrefreservation   0                      -
vol0/pprovol  logbias                latency                default
vol0/pprovol  dedup                  off                    default
vol0/pprovol  mlslabel               none                   default
vol0/pprovol  sync                   standard               default
vol0/pprovol  refcompressratio       3.32x                  -
vol0/pprovol  written                210M                   -
vol0/pprovol  snapdev                hidden                 default

Solution 2

You also need to enable discard on the ext4 filesystem. Without discard, ZFS does not reclaim the space when files are removed. This can lead to large discrepancies between the space the ext4 filesystem reports and what the ZFS volume reports.
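
A minimal sketch of the two usual approaches (device and mount point are illustrative):

# Option 1: continuous TRIM via the discard mount option (an /etc/fstab entry)
UUID=<ext4-uuid>   /mnt/test   ext4   defaults,discard   0 2

# Option 2: periodic TRIM from the shell (or a cron job) instead of mounting with discard
fstrim -v /mnt/test

See also the comment below recommending periodic fstrim over the discard mount option because of its performance impact.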


Comments

  • user235918
    user235918 over 1 year

    I am trying to test a project that needs compressed storage with the ext4 filesystem, since the application I use relies on ext4 features.

    Are there any production/stable solutions out there for transparent compression on ext4?

    What I have tried:

    Ext4 over a ZFS volume with compression enabled. This actually had an adverse effect. I tried creating a ZFS volume with lz4 compression enabled and making an ext4 filesystem on /dev/zvol/..., but the ZFS volume showed double the actual usage and the compression did not seem to have any effect.

    # du -hs /mnt/test
    **1.1T**    /mnt/test
    # zfs list
    NAME        USED  AVAIL  REFER  MOUNTPOINT
    pool       15.2T  2.70G   290K  /pool
    pool/test  15.2T  13.1T  **2.14T**  -
    

    ZFS Creation Commands

    zpool create pool raidz2 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde2 /dev/sdf1 /dev/sdg1 /dev/sdh2 /dev/sdi1
    zfs set recordsize=128k pool
    zfs create -p -V15100GB pool/test
    zfs set compression=lz4 pool/test
    mkfs.ext4 -m1 -O 64bit,has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink /dev/zvol/pool/test
    

    Fusecompress: Seemed to work but not 100% stable. Looking for alternatives.

    LessFS: Is it possible to use Lessfs in conjunction with ext4? I have not yet tried but would be interested in user insight.

    One major problem: not true transparency

    An issue I saw with fusecompress was quotas. For example, if I enabled compression on the filesystem, I would want my system to benefit from the compression, not necessarily the end user. If I enabled a quota of 1GB for a user with a compression ratio of 1.5, they would be able to upload 1.5GB of data, rather than 1GB of data with the system benefiting from the compression. This also appeared to be reflected in df -h. Is there a solution to make compression transparent to quotas?

    • ewwhite
      ewwhite almost 10 years
      Sure. Can you please list the OS/distro/version and details about the nature of the data you intend to store?
    • ewwhite
      ewwhite almost 10 years
      Also hardware details.
    • user235918
      user235918 almost 10 years
      @ewwhite 8x3TB in a Software RAID6. Data will be rsynced backups from other servers so mixed data types and various end users, documents, etc. CentOS 6.5 x64.
    • Andrew Schulman
      Andrew Schulman almost 10 years
      Are you sure you need this? Do you have many large, sparse files? Disk space is cheap these days.
    • user235918
      user235918 almost 10 years
      @AndrewSchulman: Taking advantage of compression is the better method from my calculation. The cost of extra disks and controllers that support them are more than the cost of CPU.
    • ewwhite
      ewwhite almost 10 years
      @user235918 Also, to show the pool's compression ratio, can you give the output of zfs get compressratio pool/test ?
    • user235918
      user235918 almost 10 years
      @ewwhite Yes I had already tested that. The compressratio is only 1.06X.
    • user235918
      user235918 almost 10 years
      @ewwhite I tested with a .tar file that was previously compressed from 633M to 216MB with gzip. It did not compress at all on the ext4 filesystem. It only compressed to 529M on a ZFS mount with lz4. So something with ext4 isn't working correctly, but I haven't tried volblocksize or anything from your post yet.
    • ewwhite
      ewwhite almost 10 years
      @user235918 You're not going to be able to compress a file like a .gzip archive any further, assuming that's how you moved it. With ext4 on top of a ZFS zvol, you won't see ANY compression at the file level. The file sizes will be their native sizes, but their space on disk will be smaller. It's transparent to the OS/applications. You'd only be able to see it in the zpool/zfs compressratio figures.
    • user235918
      user235918 almost 10 years
      @ewwhite Well, I'm not sure what did it, whether it was the sparse volume or the volblocksize, but it appears to be working as expected now. I just created a new sparse volume and tested it by moving a bunch of files over. It is showing a compression ratio above 2 now, and the size in zfs list is actually half of what du reports. I appreciate your help!
    • user235918
      user235918 almost 10 years
      @ewwhite Yeah, I think the sparse volume fixed the ext4 issue and the volblocksize increased the compression, because even before, the tar file was only 529M on ZFS/LZ4. Now it is 334M, which is about what I expected compared to the 216M of GZIP6.
    • ewwhite
      ewwhite almost 10 years
      @user235918 Glad this helped. There are also some settings you'll want in /etc/modprobe.d/zfs.conf - Which version of ZFS on Linux did you download? 6.2 or 6.3?
    • user235918
      user235918 almost 10 years
      @ewwhite I am using 6.3. On CentOS those files usually live in /etc/modprobe.d, but I didn't have a zfs.conf there.
    • ewwhite
      ewwhite almost 10 years
      Right. There are some values you'll want to put there.
    • user235918
      user235918 almost 10 years
      @ewwhite I did some searching online and came up with zfs_arc_max. Is this what you're talking about? If so, I did some reading up on it, so thanks for pointing me in the right direction.
  • Michael Hampton
    Michael Hampton almost 10 years
    Why partition the zvol? Can't it just be used directly?
  • ewwhite
    ewwhite almost 10 years
    @MichaelHampton Mainly for alignment and consistency. Also, I want flexibility if I expand the underlying volume. There are several layers of abstraction here. It's similar to the argument of using /dev/sdb versus /dev/sdb1.
  • user235918
    user235918 almost 10 years
    Thanks for your information. A lot of good advice in here. I'm going to test it out.
  • ewwhite
    ewwhite almost 10 years
    Red Hat doesn't recommend doing this online with the discard mount option (with ext4 or xfs), as there's a performance impact. It's cleaner to periodically run the fstrim command.
  • ewwhite
    ewwhite over 8 years
    @MichaelHampton BTW, these days, I don't partition anymore... especially with virtual machines.
  • Stoat
    Stoat over 8 years
    wrt the comment about discard mounts impacting performance: This is true with old, low quality SSDs. It's not true with newer ones.
  • shodanshok
    shodanshok over 7 years
    Do you continue to use 128K volblocksize? With such a large block size, each small (i.e. 4K) write will trigger a read-modify-checksum-write operation, wasting I/O bandwidth. Are there any specific reasons to use such a large block size?
  • ewwhite
    ewwhite over 7 years
    For my workloads, testing at various block sizes down to 8k showed no appreciable differences in performance. Compression rate was much better at 64k and above.
  • Sz.
    Sz. over 6 years
    Can you please tell about the additional resource costs for the ZFS layer in this setup (RAM, CPU)?
  • Znik
    Znik over 6 years
    You forgot something about deduplication: it uses a lot of RAM to store information about deduplicated blocks, because ZFS uses an online deduplication algorithm. It is not possible to write files first and then run an offline deduplication pass when the system is idle. The only option is to switch the deduplication flag off during the day, then, when the system is idle, switch deduplication on and completely rewrite all the fresh files. When you switch deduplication off again, the already-deduplicated data stays deduplicated. You can use an SSD as a cache to reduce the heavy RAM usage, but that is a workaround.
  • jimp
    jimp over 5 years
    Many guides exist that say ZFS needs 8GB of RAM just for its base level requirements. I'd like to use this with VMs, but most only need 2-4GB of RAM. Can this be used in low RAM setups without a serious performance penalty?
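
For reference, the ARC size cap mentioned in the comments above (zfs_arc_max) is set through a ZFS module option; the 4 GiB value below is purely illustrative, e.g. for a low-RAM setup:

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=4294967296

The file is read when the zfs module loads, so a reboot or module reload is needed for the change to take effect.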