How can I tell if ZFS (zfs-fuse) dedup/compression is applied to a particular file?

linux ubuntu filesystems zfs

5,843

You can get overall deduplication statistics with the zdb -D poolname command.

Per-file compression status is less straightforward, but you can use this:

zdb dataset | grep plain

This outputs lines like these:

     8    2    16K   128K  3.03M  5.00M  100.00  ZFS plain file
     9    2    16K   128K  3.03M  5.00M  100.00  ZFS plain file
    10    2    16K   128K  5.00M  5.00M  100.00  ZFS plain file
    11    2    16K   128K  3.03M  6.00M   83.33  ZFS plain file

The first column is the inode number, columns 5 and 6 are the size on disk and the file size (dsize and lsize in the zdb header), and column 7 is the percentage of the file that really exists (below 100% for sparse files). Any file with different values in columns 5 and 6 and 100% in column 7 is compressed.
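The column rule above can be sketched as a small filter. This is only an illustration, assuming the eight-column layout shown in the sample output; the function name zdb_compressed is made up for this example. It compares the on-disk size (column 5) with the file size (column 6) after converting the K/M/G/T suffixes to bytes. Because zdb rounds the sizes it prints, small differences can be hidden, so treat the result as a hint rather than a precise measurement.

```shell
# Classify "ZFS plain file" lines from zdb output read on stdin.
# Assumed column layout: Object lvl iblk dblk dsize lsize %full type
# Hypothetical usage: sudo zdb poolname/dataset | zdb_compressed
zdb_compressed() {
    awk '
    # Convert a human-readable size such as 512, 16K or 3.03M to bytes.
    function bytes(s,   n, u) {
        u = substr(s, length(s), 1)   # last character is the unit, if any
        n = s + 0                     # awk keeps the leading numeric part
        if (u == "K") return n * 1024
        if (u == "M") return n * 1024 * 1024
        if (u == "G") return n * 1024 * 1024 * 1024
        if (u == "T") return n * 1024 * 1024 * 1024 * 1024
        return n
    }
    /ZFS plain file/ {
        d = bytes($5); l = bytes($6)
        if ($7 + 0 < 100)  state = "sparse"        # part of the file has no blocks
        else if (d < l)    state = "compressed"    # less space on disk than file size
        else               state = "uncompressed"
        print $1, state
    }'
}
```

Each output line is the object (inode) number followed by one of sparse, compressed or uncompressed. Files reported as sparse include any file with unallocated ranges, so they need a closer look before drawing conclusions about compression.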


asari

Updated on September 17, 2022

Comments

asari over 1 year

I have a ZFS-formatted partition using zfs-fuse for Linux (Ubuntu).

I had used it for a while, and then enabled dedup and compression on it (zfs set compression=on/dedup=on). Now I think I have some files that are dedup'ed and compressed, and some files that are not yet.

It was OK, but sometimes I was confused. For example, the following command consumes almost 4GB of my ZFS storage:

cp oldfile.4GB newfile.4GB

... and this consumes almost none:

cp newfile.4GB newfile.4GB.2

I think this is because the old file is not yet compressed, so dedup did not happen.

My idea is this: if I can find the old files that are not yet dedup'ed or compressed, I can batch copy/rename/remove them to eliminate duplication and redundancy. But how can I check that?

I know that re-copying the whole contents of my storage should work (even better if I check the timestamp of each file), but I'd be happier with a zfsstat-like tool that shows some per-file properties.

EDIT: I verified jlliagre's tip in my environment.

First, I made a dataset and some directories:
$ sudo zfs create zfs/test
$ sudo install -d -m 1777 /zfs/test/orig /zfs/test/copy

Then I created some files:
$ yes > /zfs/test/orig/yes.1s & sleep 1; kill %1
$ dd if=/dev/zero of=/zfs/test/orig/zero.1M bs=1K count=1024
$ dd if=/dev/urandom of=/zfs/test/orig/rand.1M bs=1K count=1024

Then I turned compression on and copied the files above:
$ sudo zfs set compress=on  zfs/test
$ cp /zfs/test/orig/* /zfs/test/copy

Now the directories look like this:
$ ls -hil /zfs/test/*
/zfs/test/copy:
total 1.5K
10 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:30 rand.1M
11 -rw-r--r-- 1 kimura kimura  40M Mar  2 01:30 yes.1s
12 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:30 zero.1M

/zfs/test/orig:
total 42M
9 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:29 rand.1M
7 -rw-r--r-- 1 kimura kimura  40M Mar  2 01:29 yes.1s
8 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:29 zero.1M

And the zdb tool shows some information:
$ sudo zdb zfs/test
Dataset zfs/test [ZPL], ID 196, cr_txg 108306, 44.2M, 12 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K    16K    16K   37.50  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K    512     1K    512  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  ZFS delete queue
         3    1    16K    512     1K    512  100.00  ZFS directory
         4    1    16K    512     1K    512  100.00  ZFS directory
         5    1    16K    512     1K    512  100.00  ZFS directory
         6    1    16K    512     1K    512  100.00  ZFS directory
         7    3    16K   128K  39.8M  39.8M  100.00  ZFS plain file
         8    2    16K   128K  1.00M     1M  100.00  ZFS plain file
         9    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        10    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        11    3    16K   128K  1.41M  39.8M  100.00  ZFS plain file
        12    2    16K   128K      0   128K    0.00  ZFS plain file

I can see that the copies of "yes" and "zero" (objects 11 and 12) are well compressed, while the originals and both "rand" files are not.
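Since the object numbers in the zdb listing match the inode numbers printed by ls -i above, an object can be mapped back to its path with find. A minimal sketch, where the function name object_to_path is made up for this example:

```shell
# Print the path(s) under a mount point whose inode number equals a zdb object number.
# -xdev keeps find from descending into other mounted filesystems.
# Hypothetical usage: object_to_path /zfs/test 11
object_to_path() {
    find "$1" -xdev -inum "$2" -print
}
```

With the listing above, object_to_path /zfs/test 11 should print /zfs/test/copy/yes.1s. A hard-linked file produces one line per link.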

asari about 13 years

Thank you for your answer! But I don't understand what you wrote about the 7th column (percentage). What is "the file that really exists"? And one more question, to be sure: isn't there any way to see, per file, how many blocks are shared in the pool?

jlliagre about 13 years

That is actually "the percentage of the file that exists", not the file itself. I was referring to sparse files, i.e. files with part of their content not backed by any on-disk data. Analyzing files at the block level to compute how many blocks are shared would probably be doable with the undocumented zdb command, but it would take a huge amount of time, especially as blocks might be shared by objects that are not ZFS files.