Why does a gunzip to dd pipeline slow down at the end?

5,609

I figure there is a cache somewhere that is holding up the final 5MB but I thought fsync should make sure that doesn't happen

conv=fsync means to write back any caches by calling fsync - after dd has written all the data. Hanging at the end is exactly what it will do.

When the output file is slower than the input file, the data written by dd can pile up in caches. The kernel cache can sometimes fill a significant fraction of system RAM. This makes for very misleading progress information. Your "final 5MB" was just an artefact of how dd shows progress.

If your system was indeed caching about 8GB (i.e. half of the 16GB of written data), then I think you either must have about 32GB of RAM, or have been fiddling with certain kernel options. See the lwn.net link below. I agree that not getting any progress information for 15 minutes is pretty frustrating.

There are alternative dd commands you could use. If you want dd to show more accurate progress, you might have to accept more complexity. I expect the following would work without degrading your performance, though maybe reality has other ideas than I do.

gunzip -c serial2udp.image.gz |
dd iflag=fullblock bs=4M |
sudo dd iflag=fullblock oflag=direct conv=fsync status=progress bs=4M of=/dev/mmcblk0
  • oflag=direct iflag=fullblock avoids piling up kernel cache, because it bypasses it altogether.
  • iflag=fullblock is required in such a command AFAIK (e.g. because you are reading from a pipe and writing using direct IO). The effect of missing fullblock is another unfortunate complexity of dd. Some posts on this site use this to argue you should always prefer to use a different command. It's hard to find another way to do direct or sync IO though.
  • conv=fsync should still be used, to write back the device cache.
  • I added an extra dd after gunzip, to buffer the decompressed output in parallel with the disk write. This is one of the issues that makes the performance with oflag=direct or oflag=sync a bit complex. Normal IO (non-direct, non-sync) is not supposed to need this, as it is already buffered by the kernel cache. You also might not need the extra buffer if you were writing to a hard drive with 4M of writeback cache, but I don't assume an SD card has that much.

You could alternatively use oflag=direct,sync (and not need conv=fsync). This might be useful for good progress information if you had a weird output device with hundreds of megabytes of cache. But normally I think of oflag=sync as a potential barrier to performance.

There is a 2013 article https://lwn.net/Articles/572911/ which mentions minute-long delays like yours. Many people see this ability to cache minutes worth of writeback data as undesirable. The problem was that the limit on the cache size was applied indiscriminately, to both fast and slow devices. Note that it is non-trivial for the kernel to measure device speed, because it varies depending on the data locations. E.g. if the cached writes are scattered in random locations, a hard drive will take longer from repeatedly moving the write head.

why do the updates hang

The fsync() is a single system call that applies to the entire range of the file device. It does not return any status updates before it is done.

Share:
5,609
cts
Author by

cts

Updated on September 18, 2022

Comments

  • cts
    cts over 1 year

    my command:

    gunzip -c serial2udp.image.gz |
    sudo dd of=/dev/mmcblk0 conv=fsync,notrunc status=progress bs=4M
    

    my output:

    15930949632 bytes (16 GB, 15 GiB) copied, 1049 s, 15.2 MB/s    
    
    0+331128 records in
    0+331128 records out
    15931539456 bytes (16 GB, 15 GiB) copied, 1995.2 s, 8.0 MB/s
    

    the card: SanDisk Ultra 32GB MicroSDHC Class 10 UHS Memory Card Speed Up To 30MB/s
    distribution: 16.0.4 xenial with xfce
    kernel version: 4.13.0.37-generic

    i understand taking 17 minutes seems reasonable from what I've read. playing with block size doesn't really seem to make much of a difference (bs=100M still exhibits this behaviour with similar timestamps). why do the updates hang and it doesn't produce a finished report for another 16 minutes??

    iotop tells me that mmcqd/0 is still running in the background at this point (at 99% IO), so I figure there is a cache somewhere that is holding up the final 5MB but I thought fsync should make sure that doesn't happen iotop shows no traffic crossing at this time either for dd. ctrl-c is all but useless and i don't want to corrupt my drive after writing to it.

    • cts
      cts over 5 years
      updated 4.13.0.37-generic kernel. thanks
  • user2948306
    user2948306 over 5 years
    Very impressive if fsync() is not doing anything on the block device and the other issue is causing a 16 minute delay after writing 99.99% of data in 16 minutes. Anyway, if the answer doesn't assert fsync is a no-op then I have no downvote-worthy issue. I found a claim that device writes are synchronous on BSD, maybe there is a difference between OS's.
  • cts
    cts over 5 years
    just as a follow up, this code suggested by @sourcejedi worked as in it gave me a much more accurate representation of the progress and the system did not hang at the end. it did take longer to process though: 3798+1 records in 3798+1 records out 15931539456 bytes (16 GB, 15 GiB) copied, 2838.48 s, 5.6 MB/s identical file size and block size used. so this seems to be a trade off of realtime progress versus speed. that being said, I had tried this previously on 4 different attempts and got corrupted writes for the raspi.img, and this time it worked on the first write.
  • user2948306
    user2948306 over 5 years
    @cts Thank you for feedback! I can't think of any explanation why it would be that much slower for your device. So I don't have any other specific suggestion. In principle direct could want a bigger bs=, but on my system cat /sys/class/block/mmcblk0/queue/max_hw_sectors_kb already suggests it is capped at 512 KB. I have a 16GB Ultra bought in 2017, so I've tested it with 500MB of data. I got ~13.1 MB/s overall with your original command, and 13.5 MB/s with mine.
  • G-Man Says 'Reinstate Monica'
    G-Man Says 'Reinstate Monica' over 5 years
    Why do you run pv under sudo?  The OP runs dd under sudo because dd is opening /dev/mmcblk0 (with of=), but, in your command, the shell is opening /dev/mmcblk0.
  • agc
    agc over 5 years
    @G-Man, Thanks, that wasn't going to work... See revised answer for fix.