Is it better to use cat, dd, pv or another procedure to copy a CD/DVD?
Solution 1
All of the following commands are equivalent. They read the bytes of the CD /dev/sr0 and write them to a file called image.iso.
cat /dev/sr0 >image.iso
cat </dev/sr0 >image.iso
tee </dev/sr0 >image.iso
dd </dev/sr0 >image.iso
dd if=/dev/cdrom of=image.iso
pv </dev/sr0 >image.iso
cp /dev/sr0 image.iso
tail -c +1 /dev/sr0 >image.iso
Why would you use one over the other?
Simplicity. For example, if you already know cat or cp, you don't need to learn yet another command.
Robustness. This one is a bit of a variant of simplicity. How much risk is there that changing the command is going to change what it does? Let's see a few examples:
- Anything with redirection: you might accidentally put a redirection the wrong way round, or forget it. Since the destination is supposed to be a non-existing file, set -o noclobber should ensure that you don't overwrite anything; however you might overwrite a device if you accidentally write >/dev/sda (for a CD, which is read-only, there's no risk, of course). This speaks in favor of cat /dev/sr0 >image.iso (hard to get wrong in a damaging way) over alternatives such as tee </dev/sr0 >image.iso (if you invert the redirections or forget the input one, tee will write to /dev/sr0).
- cat: you might accidentally concatenate two files. That leaves the data easily salvageable.
- dd: i and o are close on the keyboard, and somewhat unusual. There's no equivalent of noclobber; of= will happily overwrite anything. The redirection syntax is less error-prone.
- cp: if you accidentally swap the source and the target, the device will be overwritten (again, assuming a non read-only device). If cp is invoked with some options such as -R or -a which some people add via an alias, it will copy the device node rather than the device content.
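The difference between a redirection under noclobber and dd's of= can be tried out on scratch files. What follows is a minimal sketch, not part of the original answer: the temporary directory and the file named fake-sr0 are hypothetical stand-ins chosen so that nothing touches a real device.

```shell
# Scratch area; fake-sr0 is a hypothetical stand-in for /dev/sr0.
workdir=$(mktemp -d)
printf 'disc contents\n' > "$workdir/fake-sr0"
printf 'precious data\n' > "$workdir/image.iso"   # pre-existing file we want to keep

set -o noclobber

# With noclobber set, '>' refuses to overwrite an existing regular file,
# so this command fails and image.iso is left alone.
{ cat "$workdir/fake-sr0" > "$workdir/image.iso"; } 2>/dev/null || true
after_redirect=$(cat "$workdir/image.iso")

# dd opens of= itself, so the shell option does not apply and the
# existing file is overwritten.
dd if="$workdir/fake-sr0" of="$workdir/image.iso" 2>/dev/null
after_dd=$(cat "$workdir/image.iso")

set +o noclobber
printf 'after redirection: %s\nafter dd: %s\n' "$after_redirect" "$after_dd"
```

Note that noclobber only protects existing regular files; as stated above, a redirection to a device such as /dev/sda is still carried out.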
Additional functionality. The one tool here that has useful additional functionality is pv, with its powerful reporting options.
But here you can check how much has been copied by looking at the size of the output file anyway.
Performance. This is an I/O-bound process; the main influence in performance is the buffer size: the tool reads a chunk from the source, writes the chunk to the destination, repeats. If the chunk is too small, the computer spends its time switching between tasks. If the chunk is too large, the read and write operations can't be parallelized. The optimal chunk size on a PC is typically around a few megabytes but this is obviously very dependent on the OS, on the hardware, and on what else the computer is doing. I made benchmarks for hard disk to hard disk copies a while ago, on Linux, which showed that for copies within the same disk, dd with a large buffer size has the advantage, but for cross-disk copies, cat won over any dd buffer size.
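To get a feel for the effect of the buffer size on your own machine, something like the following can be used. It is only a rough sketch under several assumptions: the source is a 32 MiB scratch file rather than a disc (so the page cache will flatter the numbers), the list of block sizes is arbitrary, and it relies on dd printing a summary line with the transfer rate on stderr, which GNU and BSD implementations do.

```shell
# Sketch: compare dd block sizes on a scratch file (not a real disc).
workdir=$(mktemp -d)
src="$workdir/source.img"
dst="$workdir/copy.img"

# 32 MiB of zeros as a stand-in source (hypothetical size, pick your own).
dd if=/dev/zero of="$src" bs=1048576 count=32 2>/dev/null

for bs in 512 4096 65536 1048576 4194304; do
    # dd writes its statistics to stderr; on GNU and BSD dd the last
    # line holds the byte count, elapsed time and rate.
    summary=$(dd if="$src" of="$dst" bs="$bs" 2>&1 | tail -n 1)
    printf 'bs=%-8s %s\n' "$bs" "$summary"
done
```

To compare against a real drive, src would be replaced by the device and the run repeated a few times, since a single measurement says little.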
There are a few reasons why you find dd mentioned so often. Apart from performance, they aren't particularly good reasons.
- In very old Unix systems, some text processing tools couldn't cope with binary data (they used null-terminated strings internally, so they tended to have problems with null bytes; some tools also assumed that characters used only 7 bits and didn't process 8-bit character sets properly). I'm not sure if this ever was a problem with cat (it was with more line-oriented tools such as head, sed, etc.), but people tended to avoid it on binary data because of its association with text processing. This is not a problem on modern systems such as Linux, OSX, *BSD, or anything that's POSIX-compliant.
- There's a sort of myth that dd is somewhat “lower level” than other tools such as cat and accesses devices directly. This is completely false: dd and cat and tee and the others all read bytes from their input and write the bytes to their output. The real magic is in /dev/sr0.
- dd has an unusual command line syntax, so explaining how it works gives more of an opportunity to shine by explaining something than just writing cat /dev/sr0.
- Using dd with a large buffer size can have better performance, but it is not always the case (see some benchmarks on Linux).
A major risk with dd is that it can silently skip some data. I think dd is safe as long as skip or count are not passed but I'm not sure whether this is the case on all platforms. But it has no advantage except for performance.
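One way this can happen with count is a short read: dd counts read calls, not full blocks. The sketch below illustrates that on a pipe rather than a disc; slow_source is a hypothetical helper that delivers 8 bytes in two bursts, and the one-second pause is an assumption that is long enough for dd to see the first burst on its own.

```shell
# Hypothetical writer that delivers 8 bytes in two bursts,
# like a slow pipe or device.
slow_source() { printf 'AAAA'; sleep 1; printf 'BBBB'; }

# count=1 means "one read", not "one full block": the first read returns
# only the bytes available so far, and dd stops after it.
short=$(slow_source | dd bs=8 count=1 2>/dev/null)

# cat keeps reading until end of input, so nothing is lost.
full=$(slow_source | cat)

printf 'dd copied %s bytes, cat copied %s bytes\n' "${#short}" "${#full}"
```

GNU dd documents iflag=fullblock for accumulating full input blocks in this situation; whether other platforms' dd accept it is something to check locally.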
So just use pv if you want its fancy progress report, or cat if you don't.
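Whichever tool is used, the result can be checked byte for byte afterwards. Here is a minimal sketch of copy-then-verify, assuming a scratch file of random data named fake-sr0 in place of /dev/sr0 so that it runs without a drive; with a real disc, the device path would go where fake-sr0 is.

```shell
# Hypothetical stand-in for /dev/sr0: 1 MiB of random data in a scratch file.
workdir=$(mktemp -d)
device="$workdir/fake-sr0"
head -c 1048576 /dev/urandom > "$device"

# Copy with plain cat.
cat "$device" > "$workdir/image.iso"

# cmp exits with status 0 only if both byte streams are identical,
# which also implies that the sizes match.
if cmp -s "$device" "$workdir/image.iso"; then
    verdict=identical
else
    verdict=different
fi
size=$(wc -c < "$workdir/image.iso")
printf 'copy is %s (%s bytes)\n' "$verdict" "$size"
```

Comparing checksums of the device and the image gives the same answer as cmp, at the cost of reading both in full either way.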
Solution 2
Instead of using generic tools like cat or dd, one should prefer tools which are more reliable on read errors like
In addition, their default settings are more suitable than e.g. dd's.
Admin
Updated on September 18, 2022
Comments
-
Admin almost 2 years
Background
I'm copying some data CDs/DVDs to ISO files so that I can use them later without needing the discs in the drive.
I've been looking on the Net for procedures and found several:
- Use of cat to copy a medium: http://www.yolinux.com/TUTORIALS/LinuxTutorialCDBurn.html
  cat /dev/sr0 > image.iso
- Use of dd to do so (apparently the most widely used): http://www.linuxjournal.com/content/archiving-cds-iso-commandline
  dd if=/dev/cdrom bs=blocksize count=count of=/path/to/isoimage.iso
- Use of just pv to accomplish this: See man pv for more information, although here's an excerpt of it:
  Taking an image of a disk, skipping errors:
  pv -EE /dev/sda > disk-image.img
  Writing an image back to a disk:
  pv disk-image.img > /dev/sda
  Zeroing a disk:
  pv < /dev/zero > /dev/sda
I don't know if all of them should be equivalent, although I tested some of them (using the md5sum tool) and, at least, dd and pv are not equivalent. Here's the md5sum of both the drive and generated files using each procedure:
md5 of dd procedure:
71b676875b0194495060b38f35237c3c
md5 of pv procedure:
f3524d81fdeeef962b01e1d86e6acc04
EDIT: That output was from another CD than the output given. In fact, I realized there are some interesting facts I provide as an answer.
In fact, the generated files also differ from each other in size.
So, is there a best procedure to copy a CD/DVD or am I just using the commands incorrectly?
More information about the situation
Here is more information about the test case I'm using to check the procedures I've found so far:
- isoinfo -d -i /dev/sr0
  Output: https://gist.github.com/JBFWP286/7f50f069dc5d1593ba62#file-isoinfo-output-19-aug-2015
- dd to copy the media, with output checksums and file information
  Output: https://gist.github.com/JBFWP286/75decda0a67605590d32#file-dd-output-with-md5-and-sha256-19-aug-2015
- pv to copy the media, with output checksums and file information
  Output: https://gist.github.com/JBFWP286/700a13fe0a2f06ce5e7a#file-pv-output-with-md5-and-sha256-19-aug-2015
Any help will be appreciated!
-
Admin almost 9 years
are the file sizes identical? result of cmp file1 file2? did you use dd with the wrong count= (or really any count at all which is not necessary if you want the whole thing?). Read errors in dmesg?
-
Admin almost 9 years
It goes without saying that files of different sizes are (with 99.9999999999+% probability) going to have different checksums. As long as you've done the tests, it would be nice if you would post all the results, to include (1) the exact dd command that you used (what blocksize? what count?), (2) the sizes and checksums of all outputs, and (3) any independent information you have regarding the amount of data on the source optical disc.
P.S. Why are you using count= on dd? You want to copy the entire disk image, don't you? count= says "copy this many and then stop".
-
Admin almost 9 years
@frostschutz In the first case the sizes weren't identical, but surprisingly, I tried again and got different results. See the answer I provided for more details.
-
-
Admin almost 9 years
Thanks so much for your time writing this response! =) Now I understand the differences between them. Just a question: Is pv < /dev/sr0 > image.iso the same as pv /dev/sr0 > image.iso (the latter is found in the manual pages of pv)?
-
Gilles 'SO- stop being evil' almost 9 years
@JBFWP286 They copy the same thing, but pv /dev/sr0 … can include the file name in progress reports whereas pv </dev/sr0 can't.
-
Admin almost 9 years
@marcelm Thanks for your comment. Just a question: what does "copy the device node" mean? Thanks in advance!
-
Gilles 'SO- stop being evil' almost 9 years
@JBFWP286 A device node is a file through which you access hardware or other special features provided by kernel drivers. Almost all files in /dev are device nodes. For example cp -R /dev/sr0 image.iso would make image.iso a file through which the CD drive is accessed, just like /dev/sr0, instead of a regular file containing a copy of the content of the CD which you get with cp /dev/sr0 image.iso.
-
Hashim Aziz over 6 years
I'm confused. You conclude by saying that dd has better performance, but that you should use pv or cat anyway. Also, it's worth noting that as of Ubuntu 12.0, dd comes with status=progress, so there doesn't seem to be any reason to use pv over it.
-
Gilles 'SO- stop being evil' over 6 years
@Hashim I don't conclude that it has better performance. I mention that it has better performance sometimes. I've linked to a benchmark I made; in the best case dd beat cat but only by a slight margin.
-
RichVel over 6 years
I would like to see some evidence for "In very old Unix systems, text processing tools such as cat couldn't cope with binary data." The cat command (without options) has always simply copied bytes from input to output, and this philosophy that "a file is just a bag of bytes" distinguished Unix from most contemporaneous operating systems.
-
RichVel over 6 years
@Gilles Thanks for the clarification and update, +1'ed
-
Louis St-Amour over 6 years
You can also check status by hitting CTRL+T: with cp, on OS X, it gives % of copy; with dd it shows the number of bytes copied and transfer rate.
-
jarno about 3 years
pv -pe works better in showing progress and remaining time, if you give the input file as an argument, instead of redirecting it to standard input like you do in your example.
-
jarno about 3 years
dd can have arguments such as oflag=dsync, oflag=sync, conv=fdatasync and conv=fsync. They can be useful when writing to a removable device, right?
-
Gilles 'SO- stop being evil' about 3 years
@jarno Not much. cat … && sync is typically faster (no need to sync every block) and just as useful.
-
jarno about 3 years
The info page says that conv=fdatasync will "Synchronize output data just before finishing." So I suppose it does not sync a block at a time.