How can I tell if ZFS (zfs-fuse) dedup/compression is applied to a particular file?
You can get overall deduplication statistics for a pool with the zdb -D poolname command.
Per-file compression status is less straightforward, but you can use this:
zdb dataset | grep plain
This outputs lines like these:
8 2 16K 128K 3.03M 5.00M 100.00 ZFS plain file
9 2 16K 128K 3.03M 5.00M 100.00 ZFS plain file
10 2 16K 128K 5.00M 5.00M 100.00 ZFS plain file
11 2 16K 128K 3.03M 6.00M 83.33 ZFS plain file
The first column is the object number, which is also the file's inode number (as shown by ls -i). Columns 5 and 6 are the size on disk and the file size, and column 7 is the percentage of the file that really exists. Any file with different values in columns 5 and 6 and 100% in column 7 is compressed.
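The rule above can be applied mechanically. The following is a minimal sketch, assuming the column layout shown in the sample lines; the names parse_size and classify are made up for this illustration and are not part of zdb. It treats any line below 100% in column 7 as sparse, because the size comparison alone cannot tell compression from holes in that case.

```python
import re

# Multipliers for the size suffixes seen in the sample output.
_UNITS = {"": 1, "B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '3.03M', '16K' or '0' to a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)([BKMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def classify(line):
    """Return (object number, status) for one 'ZFS plain file' line."""
    fields = line.split()
    obj = int(fields[0])
    dsize = parse_size(fields[4])   # column 5: size on disk
    lsize = parse_size(fields[5])   # column 6: file size
    full = float(fields[6])         # column 7: percentage that exists
    if full < 100.0:
        status = "sparse"           # holes; compression cannot be judged here
    elif dsize < lsize:
        status = "compressed"
    else:
        status = "uncompressed"
    return obj, status


# The four sample lines from the answer above.
SAMPLE_LINES = [
    "8 2 16K 128K 3.03M 5.00M 100.00 ZFS plain file",
    "9 2 16K 128K 3.03M 5.00M 100.00 ZFS plain file",
    "10 2 16K 128K 5.00M 5.00M 100.00 ZFS plain file",
    "11 2 16K 128K 3.03M 6.00M 83.33 ZFS plain file",
]

for sample in SAMPLE_LINES:
    print(*classify(sample))
```

To use it on real data, feed it the lines produced by zdb dataset | grep plain instead of SAMPLE_LINES.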
asari
Updated on September 17, 2022
Comments
-
asari over 1 year
I have a ZFS-formatted partition, using zfs-fuse on Linux (Ubuntu).
I had used it for a while, and then enabled dedup and compression on it (zfs set compression=on/dedup=on). Now I think I have some files that are dedup'ed and compressed, and some files that are not yet.
That was OK, but sometimes it confused me. For example, the following command would consume almost 4GB of my ZFS storage:
cp oldfile.4GB newfile.4GB
.. and this would consume almost zero:
cp newfile.4GB newfile.4GB.2
I think this is because the old file is not yet compressed, so dedup did not happen.
My idea is this: if I can find the old files that are not yet dedup'ed/compressed, I can batch copy/rename/remove them to eliminate duplication and redundancy. But how can I check that?
I know that re-copying the whole contents of my storage should work (even better with a check of each file's time stamp), but I'd be happier if I had a zfsstat-like tool that shows some file properties.
EDIT: Verified jlliagre's tip on my environment.
First, I made a dataset and some directories:
$ sudo zfs create zfs/test
$ sudo install -d -m 1777 /zfs/test/orig /zfs/test/copy
Created some files:
$ yes > /zfs/test/orig/yes.1s & sleep 1; kill %1
$ dd if=/dev/zero of=/zfs/test/orig/zero.1M bs=1K count=1024
$ dd if=/dev/urandom of=/zfs/test/orig/rand.1M bs=1K count=1024
Turned compression on, and copied the above files:
$ sudo zfs set compress=on zfs/test
$ cp /zfs/test/orig/* /zfs/test/copy
Now the directories look like:
$ ls -hil /zfs/test/*
/zfs/test/copy:
total 1.5K
10 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:30 rand.1M
11 -rw-r--r-- 1 kimura kimura 40M Mar 2 01:30 yes.1s
12 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:30 zero.1M

/zfs/test/orig:
total 42M
9 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:29 rand.1M
7 -rw-r--r-- 1 kimura kimura 40M Mar 2 01:29 yes.1s
8 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:29 zero.1M
And the zdb tool shows some information:
kimura@kimura-desktop:~$ sudo zdb zfs/test
Dataset zfs/test [ZPL], ID 196, cr_txg 108306, 44.2M, 12 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K    16K    16K   37.50  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K    512     1K    512  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  ZFS delete queue
         3    1    16K    512     1K    512  100.00  ZFS directory
         4    1    16K    512     1K    512  100.00  ZFS directory
         5    1    16K    512     1K    512  100.00  ZFS directory
         6    1    16K    512     1K    512  100.00  ZFS directory
         7    3    16K   128K  39.8M  39.8M  100.00  ZFS plain file
         8    2    16K   128K  1.00M     1M  100.00  ZFS plain file
         9    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        10    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        11    3    16K   128K  1.41M  39.8M  100.00  ZFS plain file
        12    2    16K   128K      0   128K    0.00  ZFS plain file
I can see that the copies of "yes" (object 11) and "zero" (object 12) are well compressed.
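Matching zdb object numbers to file names by eye, as done above with ls -i, gets tedious on a larger dataset. Here is a minimal sketch of a lookup table, assuming (as the answer states) that the object number equals the inode number; inode_to_paths is a hypothetical helper name, and the walk stays on one file system because each one numbers its inodes independently.

```python
import os


def inode_to_paths(root):
    """Map inode number -> list of paths under root (hard links share one)."""
    root_dev = os.stat(root).st_dev
    mapping = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that live on another mounted file system.
        kept = []
        for name in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev:
                    kept.append(name)
            except OSError:
                pass  # vanished or unreadable directory: skip it
        dirnames[:] = kept

        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file disappeared while walking
            if info.st_dev == root_dev:
                mapping.setdefault(info.st_ino, []).append(path)
    return mapping
```

With the listing above, inode_to_paths("/zfs/test").get(11) should then name the copied yes.1s file, provided the inode numbers are still the same.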
-
asari about 13 years
Thank you for your answer! But I don't understand what you wrote about the 7th column (percentage). What is "the file that really exists"? And one more question, to be sure: isn't there any way to see how many blocks are shared in the pool, per file?
-
jlliagre about 13 years
That is actually "the percentage of the file that exists", not the file itself. I was referring to sparse files, i.e. files with part of their content not backed by any on-disk data. Analyzing files at the block level to compute how many blocks are shared would probably be doable with the undocumented zdb command, but it would take a huge amount of time, especially as blocks might be shared by non ZFS file objects.
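To make the sparse-file point concrete, here is a small sketch that uses plain stat rather than zdb. It assumes a Unix-like system where st_blocks counts 512-byte units; make_sparse and sizes are names invented for this example. Note that on a compressing file system the allocated figure can, as far as I know, be lower for either reason, so it does not separate holes from compression by itself.

```python
import os
import tempfile


def make_sparse(path, length):
    """Create a file of the given length without writing any data to it."""
    with open(path, "wb") as handle:
        handle.truncate(length)


def sizes(path):
    """Return (apparent size, allocated bytes) for a file."""
    info = os.stat(path)
    # st_blocks is reported in 512-byte units on Linux.
    return info.st_size, info.st_blocks * 512


with tempfile.TemporaryDirectory() as workdir:
    demo = os.path.join(workdir, "sparse.bin")
    make_sparse(demo, 1024 * 1024)
    apparent, allocated = sizes(demo)
    print("apparent:", apparent, "allocated:", allocated)
```

On a file system that supports holes, the allocated value is typically far below the apparent size for such a file.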