Transparent compression filesystem in conjunction with ext4
Solution 1
I use ZFS on Linux as a volume manager and a means to provide additional protections and functionality to traditional filesystems. This includes bringing block-level snapshots, replication, deduplication, compression and advanced caching to the XFS or ext4 filesystems.
See: https://pthree.org/2012/12/21/zfs-administration-part-xiv-zvols/ for another explanation.
In my most common use case, I leverage the ZFS zvol feature to create a sparse volume on an existing zpool. That zvol's properties can be set just like a normal ZFS filesystem's. At this juncture, you can set properties like compression type, volume size, caching method, etc.
Creating this zvol presents a block device to Linux that can be formatted with the filesystem of your choice. Use fdisk or parted to create your partition and mkfs the finished volume.
Mount this and you essentially have a filesystem backed by a zvol and with all of its properties.
Here's my workflow...
Create a zpool comprised of four disks:
You'll want the ashift=12 directive for the type of disks you're using. The zpool name is "vol0" in this case.
zpool create -o ashift=12 -f vol0 mirror scsi-AccOW140403AS1322043 scsi-AccOW140403AS1322042 mirror scsi-AccOW140403AS1322013 scsi-AccOW140403AS1322044
Set initial zpool settings:
I set autoexpand=on at the zpool level in case I ever replace the disks with larger drives or expand the pool in a ZFS mirror setup. I typically don't use ZFS raidz1/2/3 because of the poor performance and the inability to expand the zpool.
zpool set autoexpand=on vol0
Set initial zfs filesystem properties:
Please use the lz4 compression algorithm for new ZFS installations. It's okay to leave it on all the time.
zfs set compression=lz4 vol0
zfs set atime=off vol0
Create ZFS zvol:
For ZFS on Linux, it's very important that you use a large block size. -o volblocksize=128k is absolutely essential here. The -s option creates a sparse zvol and doesn't consume pool space until it's needed. You can overcommit here, if you know your data well. In this case, I have about 444GB of usable disk space in the pool, but I'm presenting an 800GB volume to XFS.
zfs create -o volblocksize=128K -s -V 800G vol0/pprovol
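The overcommit described above can be quantified with a quick calculation; the figures are the ones from this example, not a general rule:

```shell
# Back-of-the-envelope overcommit factor: 444G usable in the pool,
# 800G presented to the filesystem (figures from this example).
usable=444
presented=800
awk -v u="$usable" -v p="$presented" 'BEGIN { printf "%.2fx overcommit\n", p/u }'
# prints "1.80x overcommit"
```

This only stays safe if the data actually compresses; monitor pool free space so the sparse zvol never outgrows the real disks.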
Partition zvol device:
(should be /dev/zd0 for the first zvol; /dev/zd16, /dev/zd32, etc. for subsequent zvols)
fdisk /dev/zd0 # (create new aligned partition with the "c" and "u" parameters)
Create and mount the filesystem:
Run mkfs.xfs or mkfs.ext4 on the newly created partition, /dev/zd0p1.
mkfs.xfs -f -l size=256m,version=2 -s size=4096 /dev/zd0p1
Grab the UUID with blkid and modify /etc/fstab.
UUID=455cae52-89e0-4fb3-a896-8f597a1ea402 /ppro xfs noatime,logbufs=8,logbsize=256k 1 2
Mount the new filesystem.
mount /ppro
Results...
[root@Testa ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sde2 20G 8.9G 9.9G 48% /
tmpfs 32G 0 32G 0% /dev/shm
/dev/sde1 485M 63M 397M 14% /boot
/dev/sde7 2.0G 68M 1.9G 4% /tmp
/dev/sde3 12G 2.6G 8.7G 24% /usr
/dev/sde6 6.0G 907M 4.8G 16% /var
/dev/zd0p1 800G 398G 403G 50% /ppro <-- Compressed ZFS-backed XFS filesystem.
vol0 110G 256K 110G 1% /vol0
ZFS filesystem listing.
[root@Testa ~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
vol0 328G 109G 272K /vol0
vol0/pprovol 326G 109G 186G - <-- The actual zvol providing the backing for XFS.
vol1 183G 817G 136K /vol1
vol1/images 183G 817G 183G /images
ZFS zpool list.
[root@Testa ~]# zpool list -v
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
vol0 444G 328G 116G 73% 1.00x ONLINE -
mirror 222G 164G 58.1G -
scsi-AccOW140403AS1322043 - - - -
scsi-AccOW140403AS1322042 - - - -
mirror 222G 164G 58.1G -
scsi-AccOW140403AS1322013 - - - -
scsi-AccOW140403AS1322044 - - - -
ZFS zvol properties (take note of referenced
, compressratio
and volsize
).
[root@Testa ~]# zfs get all vol0/pprovol
NAME PROPERTY VALUE SOURCE
vol0/pprovol type volume -
vol0/pprovol creation Sun May 11 15:27 2014 -
vol0/pprovol used 326G -
vol0/pprovol available 109G -
vol0/pprovol referenced 186G -
vol0/pprovol compressratio 2.99x -
vol0/pprovol reservation none default
vol0/pprovol volsize 800G local
vol0/pprovol volblocksize 128K -
vol0/pprovol checksum on default
vol0/pprovol compression lz4 inherited from vol0
vol0/pprovol readonly off default
vol0/pprovol copies 1 default
vol0/pprovol refreservation none default
vol0/pprovol primarycache all default
vol0/pprovol secondarycache all default
vol0/pprovol usedbysnapshots 140G -
vol0/pprovol usedbydataset 186G -
vol0/pprovol usedbychildren 0 -
vol0/pprovol usedbyrefreservation 0 -
vol0/pprovol logbias latency default
vol0/pprovol dedup off default
vol0/pprovol mlslabel none default
vol0/pprovol sync standard default
vol0/pprovol refcompressratio 3.32x -
vol0/pprovol written 210M -
vol0/pprovol snapdev hidden default
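As a sanity check on the figures above, multiplying referenced (space on disk) by compressratio approximates the logical, uncompressed data size:

```shell
# Sanity check on the 'zfs get' output above: referenced (on-disk)
# multiplied by compressratio approximates the logical data size.
awk -v ref=186 -v ratio=2.99 'BEGIN { printf "%.0fG of logical data\n", ref * ratio }'
# prints "556G of logical data"
```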
Solution 2
You also need to enable discard on the ext4 filesystem. Without discard, ZFS does not reclaim space when files are removed, which can lead to large discrepancies between what the ext4 filesystem reports and what the ZFS volume reports.
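Two common ways to pass TRIM down to the zvol, sketched here with a placeholder UUID and the /ppro mount point from the example above (the thread below notes that Red Hat prefers periodic fstrim over the online discard mount option):

```
# Option 1: online discard via the mount options in /etc/fstab
UUID=<uuid-of-your-ext4-volume> /ppro ext4 noatime,discard 1 2

# Option 2 (often preferred): periodic TRIM, e.g. from a weekly cron job
# /etc/cron.weekly/fstrim
fstrim -v /ppro
```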
user235918
Updated on September 18, 2022

Comments
-
user235918, over 1 year
I am trying to test a project that needs compressed storage with use of the ext4 file system since the application I use relies on ext4 features.
Are there any production/stable solutions out there for transparent compression on ext4?
What I have tried:
Ext4 over a ZFS volume with compression enabled. This actually had an adverse effect. I tried creating a ZFS volume with lz4 compression enabled and making an ext4 filesystem on /dev/zvol/..., but the zfs volume showed double the actual usage and the compression did not seem to have any effect.
# du -hs /mnt/test
1.1T    /mnt/test
# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
pool       15.2T  2.70G   290K  /pool
pool/test  15.2T  13.1T  2.14T  -
ZFS Creation Commands
zpool create pool raidz2 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde2 /dev/sdf1 /dev/sdg1 /dev/sdh2 /dev/sdi1
zfs set recordsize=128k pool
zfs create -p -V15100GB pool/test
zfs set compression=lz4 pool/test
mkfs.ext4 -m1 -O 64bit,has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink /dev/zvol/pool/test
Fusecompress: Seemed to work but not 100% stable. Looking for alternatives.
LessFS: Is it possible to use Lessfs in conjunction with ext4? I have not yet tried but would be interested in user insight.
One major problem: not true transparency
An issue I saw with fusecompress was quotas. For example, if I enabled compression on the filesystem, I would want my system to benefit from the compression, not necessarily the end user. If I enabled a quota of 1GB for a user, with a compression ratio of 1.5, they would be able to upload 1.5GB of data, rather than 1GB of data and the system benefiting from the compression. This also appeared to show on df -h. Is there a solution to have compression transparent to quotas?
-
ewwhite, almost 10 years: Sure. Can you please list the OS/distro/version and details about the nature of the data you intend to store?
-
ewwhite, almost 10 years: Also hardware details.
-
user235918, almost 10 years: @ewwhite 8x3TB in a software RAID6. Data will be rsynced backups from other servers, so mixed data types and various end users, documents, etc. CentOS 6.5 x64.
-
Andrew Schulman, almost 10 years: Are you sure you need this? Do you have many large, sparse files? Disk space is cheap these days.
-
user235918, almost 10 years: @AndrewSchulman: Taking advantage of compression is the better method by my calculation. The cost of extra disks and the controllers that support them is more than the cost of CPU.
-
ewwhite, almost 10 years: @user235918 Also, to show the pool's compression ratio, can you give the output of zfs get compressratio pool/test?
-
user235918, almost 10 years: @ewwhite Yes, I had already tested that. The compressratio is only 1.06x.
-
user235918, almost 10 years: @ewwhite I tested with a .tar file that was previously compressed from 633M to 216M with gzip. It did not compress at all on the ext4 filesystem. It only compressed to 529M on a ZFS mount with lz4. So something with ext4 isn't working correctly, but I haven't tried volblocksize or anything else from your post yet.
-
ewwhite, almost 10 years: @user235918 You're not going to be able to compress a file like a gzip archive any further, assuming that's how you moved it. On ext4 on top of a ZFS zvol, you won't see ANY compression at the file level. The file sizes will be their native sizes, but their space on disk will be smaller. It's transparent to the OS/applications. You'd only be able to see it in the zpool/zfs compressratio figures.
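The "native file size vs. smaller on-disk footprint" distinction can be demonstrated on any Linux filesystem with a sparse file, which behaves analogously (this is a generic illustration, not ZFS-specific):

```shell
# A sparse file shows the same effect: the logical size applications see
# is larger than the blocks actually allocated on disk.
f=$(mktemp)
truncate -s 100M "$f"                      # logical size: 100 MiB
apparent=$(stat -c %s "$f")                # bytes the applications see
allocated=$(( $(stat -c %b "$f") * 512 ))  # bytes actually on disk
echo "apparent=$apparent allocated=$allocated"
rm -f "$f"
```

With ZFS compression underneath ext4, `du` of the mounted filesystem keeps reporting native sizes while `zfs list` shows the smaller, compressed usage.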
-
user235918, almost 10 years: @ewwhite Well, I'm not sure what did it, whether it was the sparse volume or the volblocksize, but it appears to be working as expected now. I just created a new sparse volume and tested it by moving a bunch of files over. It is showing a compression ratio above 2 now, and the size in zfs list is actually half the size of what du reports. I appreciate your help!
-
user235918, almost 10 years: @ewwhite Yeah, I think the sparse volume fixed the ext4 issue and the volblocksize increased the compression, because even before, the tar file was only 529M on ZFS/lz4. Now it is 334M, which is about what I expected compared to the 216M from gzip.
-
ewwhite, almost 10 years: @user235918 Glad this helped. There are also some settings you'll want in /etc/modprobe/zfs.conf. Which version of ZFS on Linux did you download? 6.2 or 6.3?
-
user235918, almost 10 years: @ewwhite I am using 6.3. With CentOS it is usually modprobe.d where the files are stored, but I didn't have any zfs.conf there.
-
ewwhite, almost 10 years: Right. There are some values you'll want to put there.
-
user235918, almost 10 years: @ewwhite I did some searching online and came up with zfs_arc_max. Is this what you're talking about? If so, I did some reading up on it, so thanks for pointing me in the right direction.
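For reference, the ARC cap discussed here lives in /etc/modprobe.d/zfs.conf on CentOS; the 4 GiB value below is only an illustrative assumption, not a recommendation from the thread:

```
# /etc/modprobe.d/zfs.conf
# Cap the ZFS ARC at 4 GiB (value in bytes; tune to your RAM).
options zfs zfs_arc_max=4294967296
```

The module must be reloaded (or the host rebooted) for the setting to take effect.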
-
-
Michael Hampton, almost 10 years: Why partition the zvol? Can't it just be used directly?
-
ewwhite, almost 10 years: @MichaelHampton Mainly for alignment and consistency. Also, I want flexibility if I expand the underlying volume. There are several layers of abstraction here. It's similar to the argument of using /dev/sdb versus /dev/sdb1.
-
user235918, almost 10 years: Thanks for your information. A lot of good advice in here. I'm going to test it out.
-
ewwhite, almost 10 years: Red Hat doesn't recommend doing this online with the discard mount option (with ext4 or xfs), as there's a performance impact. It's cleaner to periodically run the fstrim command.
-
ewwhite, over 8 years: @MichaelHampton BTW, these days I don't partition anymore... especially with virtual machines.
-
Stoat, over 8 years: Regarding the comment about discard mounts impacting performance: this is true with old, low-quality SSDs. It's not true with newer ones.
-
shodanshok, over 7 years: Do you continue to use a 128K volblocksize? With such a large block size, each small (i.e. 4K) write will trigger a read-modify-checksum-write operation, wasting I/O bandwidth. Are there any specific reasons to use such a large block size?
-
ewwhite, over 7 years: For my workloads, testing at various block sizes down to 8k showed no appreciable differences in performance. The compression rate was much better at 64k and above.
-
Sz., over 6 years: Can you tell us about the additional resource costs of the ZFS layer in this setup (RAM, CPU)?
-
Znik, over 6 years: You forgot something about deduplication. It uses a lot of RAM to store information about deduplicated blocks, because ZFS uses an online deduplication algorithm. It is not possible to write files first and then run an offline deduplication pass when the system is idle. The only workaround is to switch the deduplication flag off during the day, switch it back on when the system is idle, and completely rewrite all fresh files. When you switch deduplication off again, already-deduplicated data stays deduplicated. You can use an SSD as a write cache to limit RAM usage, but this is a workaround.
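The RAM cost of dedup can be roughed out with the commonly cited rule of thumb of roughly 320 bytes of dedup-table (DDT) entry per unique block; the data size below is purely illustrative:

```shell
# Rough dedup-table RAM estimate, assuming ~320 bytes of DDT per unique
# block (a common rule of thumb, not an exact figure).
data_bytes=1099511627776   # 1 TiB of unique data, for illustration
block=131072               # 128K volblocksize
awk -v d="$data_bytes" -v b="$block" \
    'BEGIN { printf "%.1f GiB of DDT\n", d / b * 320 / (1024 ^ 3) }'
# prints "2.5 GiB of DDT"
```

Smaller block sizes multiply the entry count, which is why dedup on small-block pools is so RAM-hungry.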
-
jimp, over 5 years: Many guides say ZFS needs 8GB of RAM just for its base-level requirements. I'd like to use this with VMs, but most only need 2-4GB of RAM. Can this be used in low-RAM setups without a serious performance penalty?