Which is faster, and why: transferring many small files or a few large files?

40,690

Solution 1

It is faster to transfer a single large file than lots of little files because of the overhead of negotiating the transfer. The negotiation is done for each file, so transferring a single file requires it once, while transferring n files requires it n times.

You will save yourself a lot of time if you zip the files before the transfer.
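
To put rough numbers on the per-file negotiation cost described above, here is a minimal sketch of a cost model. The function name and all figures (20 ms of overhead per file, a 10 MB/s link, 5,000 files of 4 KB) are illustrative assumptions, not measurements from a real Windows share.

```python
def estimated_transfer_seconds(file_sizes, per_file_overhead_s, bandwidth_bytes_per_s):
    """Rough model: each file pays a fixed negotiation cost, then its bytes move at full bandwidth."""
    negotiation = len(file_sizes) * per_file_overhead_s
    payload = sum(file_sizes) / bandwidth_bytes_per_s
    return negotiation + payload


# Assumed, illustrative numbers only: 5,000 files of 4 KB, 20 ms overhead per file, 10 MB/s link.
many_small = estimated_transfer_seconds([4_000] * 5_000, 0.02, 10_000_000)
one_large = estimated_transfer_seconds([4_000 * 5_000], 0.02, 10_000_000)
print(f"many small: {many_small:.2f} s, one large: {one_large:.2f} s")
# prints "many small: 102.00 s, one large: 2.02 s"
```

Under these assumptions the payload takes 2 seconds either way; the difference is entirely the overhead paid n times instead of once. The model ignores the time spent zipping and unzipping, which would need to be added to the single-file case.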

Solution 2

Jon Cahill is correct: a single file will be faster. However, it's worth keeping in mind that if the connection is unstable, individual files (or medium-sized groups in zip files) may be better. If the transfer of one large file fails, you'll have to start all over again, whereas with multiple files you only have to redo the last file started.
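
The "medium-sized groups in zip files" idea could be sketched like this in Python, using only the standard library. The function name `zip_in_batches`, the `batch_NNNN.zip` naming, and the default of 500 files per archive are assumptions made for illustration; pick a batch size that suits how unreliable the link is.

```python
import zipfile
from pathlib import Path


def zip_in_batches(source_dir, output_dir, files_per_archive=500):
    """Pack the files directly inside source_dir into numbered zip archives.

    Each archive holds at most files_per_archive files, so a failed transfer
    only costs one archive rather than the whole set.
    """
    source = Path(source_dir)
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)

    # Sort for a stable, repeatable assignment of files to archives.
    files = sorted(p for p in source.iterdir() if p.is_file())

    archives = []
    for start in range(0, len(files), files_per_archive):
        archive_path = output / f"batch_{start // files_per_archive:04d}.zip"
        with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in files[start:start + files_per_archive]:
                zf.write(path, arcname=path.name)
        archives.append(archive_path)
    return archives
```

This sketch does not recurse into subfolders and expects `output_dir` to be a different folder from `source_dir`.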

Solution 3

Lots of little files will also be more expensive to write to the file system than a single large file. For each file, the file system needs to do things like:

  • Check the file name is unique
  • Write out the file table entry

As you get more and more files in a directory this can become quite costly, and each of these steps can add latency to the copy process and slow the whole thing down.
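
If you want to see the file-creation cost on your own machine, a rough local timing sketch follows. The function name and the default counts are assumptions; it writes to a temporary local directory, so it shows only the file system part of the cost and not any network latency, and the results will vary with disk, OS, and caching.

```python
import os
import tempfile
import time


def time_small_vs_large(file_count=2000, file_size=4096):
    """Write the same bytes as many small files and as one large file; return both timings in seconds."""
    payload = os.urandom(file_size)
    with tempfile.TemporaryDirectory() as tmp:
        # Many small files: one create/open/close per file.
        start = time.perf_counter()
        for i in range(file_count):
            with open(os.path.join(tmp, f"small_{i}.bin"), "wb") as f:
                f.write(payload)
        small_s = time.perf_counter() - start

        # One large file: a single create, then the same total bytes.
        start = time.perf_counter()
        with open(os.path.join(tmp, "large.bin"), "wb") as f:
            for _ in range(file_count):
                f.write(payload)
        large_s = time.perf_counter() - start
    return small_s, large_s


if __name__ == "__main__":
    small_s, large_s = time_small_vs_large()
    print(f"many small files: {small_s:.3f} s, one large file: {large_s:.3f} s")
```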



Author: kestes

Updated on September 17, 2022

Comments

  • kestes
    kestes over 1 year

    I soon will have a folder with thousands of files, each file on the order of a few KB. I will need to transfer these across a Windows network from one UNC share to another. In general, is it faster to simply copy the files over en masse, or would it be faster to zip them up (e.g., using 7zip in fastest mode) and send one or a few large files? Or is there no difference in practice?

  • Matt Bettiol
    Matt Bettiol about 15 years
    en.wikipedia.org/wiki/Slow-start also favours large files.
  • BlaM
    BlaM about 15 years
    I guess he's still going to need all the small files in the target system, so he'll probably have to extract the zip later on, i.e. the filesystem will still have to do the work. Sending the large file and unzipping will still be much faster than transferring all the small files over net, though.
  • Daniel Schneller
    Daniel Schneller about 15 years
    Consider that compression will take time, too. If your data cannot be compressed (e. g. JPEGs, ZIPs, JARs and other already compressed formats) you should only TAR them (or ZIP without compression). This will save CPU time for the pointless attempt to further compress your data.
  • Unkwntech
    Unkwntech about 15 years
    Unless the transfer protocol has resume.
  • user2278
    user2278 about 15 years
    That many small files will cause you a lot of pain. Between the tiny packets and an SMB handshake for each file, zipping will probably shave a good 60% off your copy time.
  • Naveen
    Naveen about 15 years
    @BlaM, as I said in the answer it all comes down to latency. If network latency is added onto each CreateFile operation the total time could be much longer. If the copy is smart enough to concurrently create files perhaps it wouldn't impact the operation.
  • Cristian Vat
    Cristian Vat almost 15 years
    +1 for TAR since you can copy/extract partial archive.
  • tbone
    tbone about 12 years
    This answer is correct, but there is a known bug on Windows 7 (at least) where copying the exact same set of files is much slower than on XP: social.technet.microsoft.com/Forums/en-US/w7itproperf/thread/…
  • nealmcb
    nealmcb almost 12 years
    Note that if the files are really small, a pk zip archive might be bigger than the raw files, since zip stores two copies of file metadata per file, which can add up to between perhaps 80 and 140 overhead bytes per file depending on what "extra" timestamps, uids and other metadata are included. So another archive format might be slightly more efficient. But overall, the networking overheads are probably the biggest issues, so any archive will help.
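
Two of the comments above suggest archiving without compression (TAR, or ZIP in store mode) and point out that an archive of very small files can be larger than the raw data. A minimal sketch to check that on synthetic data, built in memory with Python's standard `zipfile` and `tarfile` modules, is below. The function name, file names, and sizes are made up for illustration.

```python
import io
import tarfile
import zipfile


def archive_sizes(file_count=1000, file_size=100):
    """Return (raw_bytes, zip_bytes, tar_bytes) for uncompressed archives of tiny synthetic files."""
    payload = b"x" * file_size

    # ZIP with no compression ("store" mode), as suggested for already-compressed data.
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for i in range(file_count):
            zf.writestr(f"file_{i:05d}.dat", payload)

    # Plain, uncompressed TAR.
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        for i in range(file_count):
            info = tarfile.TarInfo(name=f"file_{i:05d}.dat")
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))

    raw = file_count * file_size
    return raw, len(zip_buf.getvalue()), len(tar_buf.getvalue())


if __name__ == "__main__":
    raw, zip_size, tar_size = archive_sizes()
    print(f"raw: {raw} bytes, zip (stored): {zip_size} bytes, tar: {tar_size} bytes")
```

For files this small, both uncompressed archives come out larger than the raw payload because of per-file headers and padding. As the comments note, that size penalty is usually minor next to the per-file network overhead an archive avoids.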