Multithreaded cp on Linux?
Solution 1
As Celada mentioned, there would be little point in using multiple threads of execution, since a copy operation is IO-bound and doesn't really use the CPU. As ryekayo mentioned, you can run multiple instances of cp so that you end up with multiple concurrent IO streams, but even this is typically counterproductive. If you are copying files from one location to another on the same disk, trying to do more than one at a time makes the disk waste time seeking back and forth between the files, which slows things down. The only time it is really beneficial to copy multiple files at once is when you are, for instance, copying several files from several different slow, removable disks onto your fast hard disk, or vice versa.
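For readers who still want to experiment with concurrent copies (for example in the several-slow-disks case), here is a minimal sketch, not a recommendation. It assumes GNU findutils and GNU coreutils (`find -print0`, `xargs -0 -r -P`, `cp -t`); the function name `parallel_cp` and its three arguments are hypothetical and not part of any existing tool.

```shell
# Hypothetical helper: copy every regular file directly under SRC into DEST,
# running at most JOBS cp processes at a time (default 4).
# The body runs in a subshell so its variables do not leak into the caller.
parallel_cp() (
    src=$1
    dest=$2
    jobs=${3:-4}

    mkdir -p "$dest" || exit 1

    # -print0 / -0 keep file names with spaces or newlines intact.
    # -r skips running cp when there are no files to copy.
    # -n 1 hands each cp exactly one file; -P caps the concurrency.
    find "$src" -maxdepth 1 -type f -print0 |
        xargs -0 -r -n 1 -P "$jobs" cp -t "$dest" --
)
```

Calling `parallel_cp /mnt/usb1 /data/backup 2` would copy with two cp processes at once; whether that is faster than a plain `cp` depends entirely on the storage involved, so it is worth timing both.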
Solution 2
Well, I believe you could use GNU parallel to accomplish your task.
seq 70 | parallel -j70 cp filename
You can find a detailed explanation of using GNU parallel in my other answer here.
I just tested the above command on my system and confirmed that 70 copies of the file were made.
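When the command contains no replacement string, GNU parallel appends each input line to it, so the command above runs `cp filename 1` through `cp filename 70`: seventy copies of one file, named 1 to 70. To make the number of concurrent jobs a parameter, a rough equivalent can be sketched with xargs. This is only an illustration, assuming GNU xargs (`-P`, `-I`); the function name `copy_n_times` and the `FILE.N` naming scheme are hypothetical.

```shell
# Hypothetical helper: make COUNT copies of FILE named FILE.1 ... FILE.COUNT,
# running at most JOBS cp processes at once (default 4).
# The body runs in a subshell so its variables do not leak into the caller.
copy_n_times() (
    file=$1
    count=$2
    jobs=${3:-4}

    # seq emits 1..COUNT, one number per line; xargs substitutes each
    # number for {} and starts up to JOBS cp processes in parallel.
    seq "$count" | xargs -P "$jobs" -I{} cp -- "$file" "$file.{}"
)
```

For example, `copy_n_times filename 70 8` makes the same seventy copies but never runs more than eight cp processes at a time.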
leeand00
Updated on September 18, 2022
Comments
-
leeand00 almost 2 years
Is there a multi-threaded cp command on Linux? I know how to do this on Windows, but I don't know how this is approached in a Linux environment.
-
Celada over 9 years
Since cp is IO-bound, I'm not sure how much multithreading would help.
-
Jon Bringhurst over 9 years
Do you have a filesystem with multiple read-write heads? If you do, take a look at github.com/hpc/dcp
-
maxschlepzig over 9 years
I don't see how a question about cp could be a duplicate of a question about dd ...
-
Ciro Santilli Путлер Капут 六四事 almost 9 years
-
-
leeand00 over 9 years
I want to be able to specify the number of parallel threads used.
-
Matej Vrzala M4 over 9 years
You can probably put this in a function that takes an integer parameter specifying how many times you want the command run. That'll be a bit of coding on your end, though.
-
Matej Vrzala M4 over 9 years
Even easier way than what I got.
-
Thorbjørn Ravn Andersen over 9 years
Disk seeks are only relevant for non-SSD disks.
-
psusi over 9 years
@ThorbjørnRavnAndersen, the seek penalty that is severe on an HDD is almost nonexistent on an SSD, yet the point remains that there is no benefit to trying to read from or write to multiple parts of a single disk at the same time.
-
Thorbjørn Ravn Andersen over 9 years
@MatteoItalia if the IO channel is saturated, there is nothing caches can improve.
-
Thorbjørn Ravn Andersen over 9 years
@psusi The argument was that the disk wasted time seeking, not that the disk could not serve data any faster.
-
Paul Draper over 7 years
Not every setup is a single SSD. For example, right now I am waiting on a 1-hour copy with AWS EFS, which uses multiple disks and has high latency.
-
peterh over 7 years
Not true: reading from the source and writing to the destination should be done in parallel, but they are not. Yes, the GNU fileutils need some tuning. There are also other, less common cases where a parallel copy would pay off, for example on network drives or on RAID/LVM.
-
psusi over 7 years
@peterh, it isn't good on RAID either, for the same reason it isn't on a single drive: you're just making multiple drives seek their heads back and forth. Network drives are not going to benefit either unless your drive and network connection are both faster than at least one of the server drives, and this isn't likely to be the case.
-
peterh over 7 years
@psusi There are many RAID personalities. In linearly ordered RAID devices, for example, it is not an issue, and it is also not an issue if the RAID has a larger block structure than the blocks used by the worker threads of the cp tools. Furthermore, the actual order of disk block accesses is controlled not by cp but by the disk layer (OK, our most pathological case, ext4, writes out the write cache every 5 seconds by default...). For reading there is the readahead trick, although it is far less effective.
-
peterh over 7 years
@psusi But the focus of my comment was this: the current copy tool 1. reads a block from the input, 2. THEN writes this block to the output. While it reads, the write operation stalls, and while it writes, the read operation stalls. At the very least, reading and writing should be done on two different threads; that is my point.
-
psusi over 7 years
@peterh, cp can do one byte at a time and it doesn't matter (much). Read-ahead and write-behind make sure that the individual read and write calls do not stall, at least until there is plenty of data in the write-behind cache to keep the disk(s) busy, at which point the kernel starts letting the write calls block for a bit to avoid filling all of RAM with dirty pages.
-
John over 5 years
On Amazon you can get up to 10 times the sequential read speed when using multi-threaded access to the same sequential file. So the answer isn't really accurate anymore.
-
Paul Knopf over 5 years
I develop recording software that runs on embedded Linux devices. I need to archive my internal media to external thumb drives. I used to use .NET, and having read/write threads increased performance by around 40%. I'd like a similar approach, but with cp/native.
-
joker over 4 years
Running a command in the background has nothing at all to do with multithreading.
-
Szczepan Hołyszewski over 2 years
And that copies filename 70 times?
-
Jason Newton over 2 years
This answer is pretty misinformative; there are many cases where parallel copies have different aggregate performance and timings, often drastically so (easily 10x throughput). NAS/RAID setups, which are very common, and PCIe-based memory devices are just a few environments where I've observed this. Sometimes, for one reason or another, this is true of any tech with sockets in the loop as well. It also does not suggest an original solution to the problem; it just says to try what others suggest even though it'd be counterproductive.
-
psusi over 2 years
Normal RAIDs won't benefit from it either. I suppose if you are using JBOD/linear mode and get lucky and happen to have some of the files on one underlying disk, and some files on another, then you might see some improvement, but that's unusual and unlikely.