Why specify block size when copying devices of a finite size?

Solution 1

When is dd suitable for copying data? (or, when are read() and write() partial) points out an important caveat when using count: dd can copy partial blocks, so when given count it will stop after the given number of blocks, even if some of the blocks were incomplete. You may therefore end up with fewer than bs * count bytes copied, unless you specify iflag=fullblock.

The default block size for dd is 512 bytes. count is a limit; as your question hints, it isn't required when copying a device of finite size, and is really intended for copying only part of a device.
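
For instance, a hedged sketch of copying only the start of a disc (the block count here is purely illustrative; iflag=fullblock is a GNU dd flag):

    # Copy only the first 1000 blocks; iflag=fullblock re-reads short
    # reads so that each counted block really is 2048 bytes.
    dd if=/dev/dvd of=partial.iso bs=2048 count=1000 iflag=fullblock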

I think there are two aspects to consider here: performance and data recovery.

As far as performance is concerned, you ideally want the block size to be at least equal to, and a multiple of, the underlying physical block size (hence 2048 bytes when reading a CD-ROM). In fact nowadays you may as well specify larger block sizes to give the underlying caching systems a chance to buffer things for you. But increasing the block size means dd has to use that much more memory, and it could be counter-productive if you're copying over a network because of packet fragmentation.
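
For example, something along these lines keeps the block size a multiple of the 2048-byte CD-ROM sector while reading in larger chunks (the exact value is a judgement call, not a magic number):

    # 64 KiB = 32 x 2048 bytes: aligned with the medium's sector size,
    # but large enough to cut down on per-call overhead.
    dd if=/dev/dvd of=foobar.iso bs=64K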

As far as data recovery is concerned, you may retrieve more data from a failing hard disk if you use smaller block sizes; this is what programs such as dd-rescue do automatically: they read large blocks initially, but if a block fails they re-read it with smaller block sizes. dd won't do this; it will simply fail the whole block.
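
A rough sketch of the difference (device and file names are placeholders; ddrescue here means GNU ddrescue):

    # GNU ddrescue retries bad areas with progressively smaller reads and
    # keeps a map file so the rescue can be resumed later.
    ddrescue /dev/sdX rescued.img rescued.map

    # Plain dd can only skip unreadable blocks, padding them with zeros:
    dd if=/dev/sdX of=rescued.img bs=64K conv=noerror,sync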

Solution 2

There's a bit of a cargo cult around dd. Originally, there were two bugs in cp that caused problems: it would misdetect files as sparse when the filesystem reported a block size other than 512 (Linux used a block size of 1024), and it did not clear empty blocks on the destination when copying from a sparse file to a block device.

You can find some references to this in the early Linux mailing list archives.

So people got used to dd being the correct way to deal with disk images, and cp fell by the wayside. And since dd uses a default block size of 512, it's slow (slower than cp on modern systems). But it's not obvious what block size you should use. Probably in your case someone has read that 2048 is the "natural" block size for a CD-ROM (it is: CD-ROMs are divided into 2,352-byte sectors containing 2,048 bytes of data along with error-correction information) and has decided that this is the "right" size to use with dd, when in fact you would probably get faster results with a (moderately) larger block size. In fact, GNU cp uses a default block size of 64k for this reason.

tl;dr: cp /dev/dvd foobar.iso should work fine. dd's default block size is 512 bytes, and in most modern circumstances the only likely effect of leaving it at that default is a slower copy.
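
If you want to see the effect on your own hardware, a quick and rough comparison along these lines works (the output names are made up, and caching will blur the numbers somewhat):

    time cp /dev/dvd a.iso                 # cp picks its own buffer size
    time dd if=/dev/dvd of=b.iso           # dd's 512-byte default
    time dd if=/dev/dvd of=c.iso bs=1M     # a larger block size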

Solution 3

Changing the block size is a good way to change how much gets buffered or is read/written at a time.

It doesn't really relate to whether the source is a real block device or an infinite/virtual one. It's about how much you want stored in memory before dd goes to write it out. bs= sets both ibs= (how much data is read in at a time) and obs= (how much data is written out at a time). The larger obs= is, the more ibs=-sized reads are needed before dd has enough data to start writing to the destination.
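
For example (a sketch; the sizes are arbitrary):

    # Read 2 KiB at a time, but accumulate 1 MiB before each write,
    # so many input blocks are gathered per output block.
    dd if=/dev/dvd of=foobar.iso ibs=2048 obs=1M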

count= likewise depends only on what you want to do: it controls how many input blocks (as measured by ibs=) dd will copy before it considers its job done.
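
As a small illustration (the output name is made up), count is measured in input blocks:

    # 1024 input blocks of 1 KiB each: roughly 1 MiB copied in total,
    # assuming every read returns a full block.
    dd if=/dev/dvd of=sample.bin ibs=1K obs=64K count=1024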

Solution 4

Using the block-size option with dd effectively specifies how much data will be copied into memory from the input I/O subsystem before dd attempts to write it back out to the output I/O subsystem. The output is the same (since the whole disk is being copied); the data is simply read in chunks of whatever size you specify (most dd implementations default to a block size of 512 bytes).

If you have a large amount of spare memory and increase the block size, then larger chunks of data can be read in succession, buffered, and flushed to the output destination. A smaller block size means more overhead from each individual read, lseek, buffer setup and so on.
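
A crude way to see that overhead in isolation (GNU dd syntax; this never touches a real disk):

    # Same 1 GiB of data, wildly different numbers of system calls:
    dd if=/dev/zero of=/dev/null bs=512 count=2097152   # ~2 million read/write pairs
    dd if=/dev/zero of=/dev/null bs=1M count=1024       # ~1 thousand read/write pairs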

Your mileage may vary depending on what your if= and of= point to, what hardware you are going through, whether memory is tight, and so forth.

Solution 5

bs= sets the block size used for reads and writes. Leaving it out may seem to do the same copying job, but the choice does have hidden effects. Consider, for example:

  • copying an enormous number of files of only 1-10 kB each, versus
  • copying a single 10 GB file.

In the first case, a smaller block size has been found to increase copying speed. In the latter, a larger block size is the better option, since it transfers more data per request and so issues fewer separate I/O commands, which usually results in faster I/O.
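
If you want to experiment, a reasonably recent GNU dd can report throughput while it runs (the paths are placeholders):

    # Watch the transfer rate while trying different block sizes.
    dd if=big.img of=/mnt/backup/big.img bs=4M status=progress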

Comments

  • dotancohen

    In online tutorials it is often suggested to use the following command to copy a CDROM to an iso image:

    $ dd if=/dev/dvd of=foobar.iso bs=2048
    

    Why must the block size be specified? I notice that 2048 is indeed the standard block size for CD-ROM images, but it seems that dd works just as well without specifying bs= or count=.

    Under what circumstances would it be problematic to not specify bs= or count= when copying from a device of finite size?

  • Drav Sloan
    Note Stephen's point about dd copying partial blocks - it's not always bs * count.
  • wurtel
    Note that on some Unix systems you must read a multiple of the native block size; dd without bs=2048 or some multiple thereof would give an error when reading from a block-device CD-ROM drive.
  • Jason C
    Performance especially; write a partition image to an SD card, for example, using dd bs=4m iflag=fullblock vs dd bs=1111 and notice the substantially higher data rates the former will give you. This is because the former aligns with natural block sizes on the SD card, while the latter requires the SD controller to do much reading, copying and reflashing to write partial physical blocks. The importance of fullblock should not be underestimated, by the way, as without it, bs is only a maximum and partial reads could lead to persistent subsequent misalignments.
  • apurkrt
    It might have changed; anyhow, GNU cp uses a 128k block size by default (not 64k), see eklitzke.org/efficient-file-copying-on-linux