Which is faster, and why: transferring several small files or few large files?
Solution 1
It is faster to transfer a single large file instead of lots of little files because of the overhead of negotiating the transfer. The negotiation is done for each file, so with a single file it only has to happen once, while transferring n files means it has to happen n times.
You will save yourself a lot of time if you zip the files before the transfer.
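As a minimal sketch of the "zip first" idea using Python's standard `zipfile` module (the function name `zip_folder` is just illustrative, not from the question):

```python
import zipfile
from pathlib import Path

def zip_folder(src_dir: str, archive_path: str) -> int:
    """Bundle every file under src_dir into one archive so the
    per-file transfer negotiation happens only once.
    Returns the number of files added."""
    src = Path(src_dir)
    count = 0
    # ZIP_DEFLATED compresses; for already-compressed data
    # (JPEGs, ZIPs, ...) ZIP_STORED avoids wasted CPU time.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(src.rglob("*")):
            if f.is_file():
                zf.write(f, arcname=f.relative_to(src))
                count += 1
    return count
```

You would then copy the single archive across the network and extract it on the target share.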
Solution 2
Jon Cahill is very correct: a single file will be faster. However, it's worth keeping in mind that if there is any instability in the connection, individual files (or medium-sized groups in zip files) may be better, because if the transfer of one big file fails you have to start all over again, whereas with multiple files you only have to redo the file that was in flight.
Solution 3
Lots of little files will also be more expensive to write to the file system than a single large file. For each file, the file system needs to do things like:
- Check the file name is unique
- Write out the file table entry
As you get more and more files in a directory this can become quite costly. And each of these steps can add latency to the copy process and slow the whole thing down.
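The per-file cost described above can be made visible with a small timing sketch (an illustration, not a rigorous benchmark; the function names are hypothetical):

```python
import os
import time

def write_many_small(dirpath: str, n: int = 1000, size: int = 4096) -> float:
    """Write n small files; each one pays for a name-uniqueness
    check and a new file-table entry. Returns elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(n):
        with open(os.path.join(dirpath, f"f{i:05d}.bin"), "wb") as fh:
            fh.write(payload)
    return time.perf_counter() - start

def write_one_large(path: str, n: int = 1000, size: int = 4096) -> float:
    """Write the same bytes into one file; only a single
    directory entry is created. Returns elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    with open(path, "wb") as fh:
        for _ in range(n):
            fh.write(payload)
    return time.perf_counter() - start
```

On most file systems the many-small-files variant takes noticeably longer for the same total number of bytes, and the gap grows as the directory fills up.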
kestes
Updated on September 17, 2022

Comments
-
kestes over 1 year
I soon will have a folder with thousands of files, each file on the order of a few KB. I will need to transfer these across a Windows network from one UNC share to another. In general, is it faster to simply copy the files over en masse, or would it be faster to zip them up (e.g., using 7zip in fastest mode) and send one or a few large files? Or is there no difference in practice?
-
Matt Bettiol about 15 years
en.wikipedia.org/wiki/Slow-start also favours large files.
-
BlaM about 15 years
I guess he's still going to need all the small files on the target system, so he'll probably have to extract the zip later on, i.e. the filesystem will still have to do the work. Sending the large file and unzipping will still be much faster than transferring all the small files over the net, though.
-
Daniel Schneller about 15 years
Consider that compression will take time, too. If your data cannot be compressed (e.g. JPEGs, ZIPs, JARs and other already compressed formats) you should only TAR them (or ZIP without compression). This saves the CPU time otherwise wasted on a pointless attempt to compress your data further.
-
Unkwntech about 15 years
Unless the transfer protocol supports resume.
-
user2278 about 15 years
That many small files will cause you a lot of pain - between tiny packets and an SMB handshake for each one, zipping will probably shave a good 60% off your copy time.
-
Naveen about 15 years
@BlaM, as I said in the answer, it all comes down to latency. If network latency is added onto each CreateFile operation, the total time could be much longer. If the copy is smart enough to create files concurrently, perhaps it wouldn't impact the operation.
-
Cristian Vat almost 15 years
+1 for TAR since you can copy/extract a partial archive.
-
tbone about 12 years
This answer is correct, but on Windows 7 (at least) there is a known bug where copying the exact same set of files is much faster on XP than on Windows 7: social.technet.microsoft.com/Forums/en-US/w7itproperf/thread/…
-
nealmcb almost 12 years
Note that if the files are really small, a pkzip archive might be bigger than the raw files, since zip stores two copies of file metadata per file, which can add up to between perhaps 80 and 140 overhead bytes per file depending on which "extra" timestamps, uids and other metadata are included. So another archive format might be slightly more efficient. But overall, the networking overheads are probably the biggest issue, so any archive will help.
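The uncompressed-TAR suggestion from the comments above can be sketched with Python's standard `tarfile` module (the helper name `tar_without_compression` is hypothetical):

```python
import tarfile
from pathlib import Path

def tar_without_compression(src_dir: str, archive_path: str) -> None:
    """Mode 'w' writes a plain, uncompressed tar: no CPU is spent
    re-compressing data that is already compressed (JPEGs, ZIPs, JARs),
    but the transfer still negotiates only one file."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w") as tf:
        for f in sorted(src.rglob("*")):
            if f.is_file():
                tf.add(f, arcname=f.relative_to(src))
```

Using mode "w:gz" instead would gzip-compress the archive, which is only worthwhile when the contents are compressible.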