How to Copy Files Fast

37,106

Solution 1

The fastest version w/o overoptimizing the code I've got with the following code:

class CTError(Exception):
    def __init__(self, errors):
        self.errors = errors

try:
    O_BINARY = os.O_BINARY
except:
    O_BINARY = 0
READ_FLAGS = os.O_RDONLY | O_BINARY
WRITE_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_BINARY
BUFFER_SIZE = 128*1024

def copyfile(src, dst):
    try:
        fin = os.open(src, READ_FLAGS)
        stat = os.fstat(fin)
        fout = os.open(dst, WRITE_FLAGS, stat.st_mode)
        for x in iter(lambda: os.read(fin, BUFFER_SIZE), ""):
            os.write(fout, x)
    finally:
        try: os.close(fin)
        except: pass
        try: os.close(fout)
        except: pass

def copytree(src, dst, symlinks=False, ignore=[]):
    names = os.listdir(src)

    if not os.path.exists(dst):
        os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignore:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                copyfile(srcname, dstname)
            # XXX What about devices, sockets etc.?
        except (IOError, os.error), why:
            errors.append((srcname, dstname, str(why)))
        except CTError, err:
            errors.extend(err.errors)
    if errors:
        raise CTError(errors)

This code runs a little bit slower than native linux "cp -rf".

Comparing to shutil the gain for the local storage to tmfps is around 2x-3x and around than 6x for NFS to local storage.

After profiling I've noticed that shutil.copy does lots of fstat syscals which are pretty heavyweight. If one want to optimize further I would suggest to do a single fstat for src and reuse the values. Honestly I didn't go further as I got almost the same figures as native linux copy tool and optimizing for several hundrends of milliseconds wasn't my goal.

Solution 2

You could simply just use the OS you are doing the copy on, for Windows:

from subprocess import call
call(["xcopy", "c:\\file.txt", "n:\\folder\\", "/K/O/X"])

/K - Copies attributes. Typically, Xcopy resets read-only attributes
/O - Copies file ownership and ACL information.
/X - Copies file audit settings (implies /O).

Solution 3

import sys
import subprocess

def copyWithSubprocess(cmd):        
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

cmd=None
if sys.platform.startswith("darwin"): cmd=['cp', source, dest]
elif sys.platform.startswith("win"): cmd=['xcopy', source, dest, '/K/O/X']

if cmd: copyWithSubprocess(cmd)
Share:
37,106

Related videos on Youtube

alphanumeric
Author by

alphanumeric

Updated on February 28, 2021

Comments

  • alphanumeric
    alphanumeric over 3 years

    It takes at least 3 times longer to copy files with shutil.copyfile() versus to a regular right-click-copy > right-click-paste using Windows File Explorer or Mac's Finder. Is there any faster alternative to shutil.copyfile() in Python? What could be done to speed up a file copying process? (The files destination is on the network drive... if it makes any difference...).

    EDITED LATER:

    Here is what I have ended up with:

    def copyWithSubprocess(cmd):        
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    
    win=mac=False
    if sys.platform.startswith("darwin"):mac=True
    elif sys.platform.startswith("win"):win=True
    
    cmd=None
    if mac: cmd=['cp', source, dest]
    elif win: cmd=['xcopy', source, dest, '/K/O/X']
    
    if cmd: copyWithSubprocess(cmd)
    
    • Ecno92
      Ecno92 over 10 years
      You can use the native command line options like cp for Linux & Mac and COPY for Windows. They should be as fast as when you use the GUI.
    • David Heffernan
      David Heffernan over 10 years
      On Windows SHFileOperation gives you the native shell file copy
    • moooeeeep
      moooeeeep over 10 years
      Depending on some factors not stated in the question it could be beneficial to pack the files into a compressed archive before transmission... Have you considered using something like rsync?
    • Michael Burns
      Michael Burns over 10 years
      If you are concerned with ownership and ACL don't use shutil for that reason alone: 'On Windows, file owners, ACLs and alternate data streams are not copied. '
    • alphanumeric
      alphanumeric over 10 years
      If I use the native operation system's commands (such as OSX cp) should I be then using subprocess? Is there any Python module to call cp directly without a need for a subprocess on Mac?
    • Morwenn
      Morwenn almost 6 years
      It's worth noting that in Python 3.8 functions that copy files and directories have been optimized to work faster on several major OS.
  • Sharif Mamun
    Sharif Mamun over 10 years
    I believe you would be a good instructor, nice!
  • alphanumeric
    alphanumeric over 10 years
    Good point. I should be more specific. Instead of right-click-copy and then paste: This schema: 1. Select files; 2. Drag files. 3 Drop files onto a destination folder.
  • Joran Beasley
    Joran Beasley over 10 years
    thats a move then ... which is much different ... try shutil.move instead
  • alphanumeric
    alphanumeric over 10 years
    Would "xcopy" on Windows work with a "regular" subprocess, such as: cmd = ['xcopy', source, dest, "/K/O/X"] subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  • Michael Burns
    Michael Burns over 10 years
    That will work as well.
  • alphanumeric
    alphanumeric over 10 years
    Great! Thanks for the help!
  • alphanumeric
    alphanumeric over 10 years
    In win it could be a move. In osx it could be a copy.
  • searchengine27
    searchengine27 over 7 years
    This solution does not scale. As the files get large this becomes a less usable solution. You would need to make multiple system calls to the OS to still read portions of the file into memory as files get large.
  • Joran Beasley
    Joran Beasley over 7 years
    this was a question about timings ... and it was basically pointing out his operations were not equivelent
  • muppetjones
    muppetjones over 7 years
    Not sure if this is specific to later versions of python (3.5+), but the sentinel in iter needs to be b'' in order to stop. (At least on OSX)
  • Robert Kelly
    Robert Kelly about 7 years
    I find it hard to believe that if you CTRL + C a 100 gigabyte file in Windows it attempts to load that in memory right then and there...
  • Joran Beasley
    Joran Beasley about 7 years
    theres more than a grain of truth to that im sure ... perhaps i oversimplified it
  • Admin
    Admin almost 7 years
    Invalid number of parameters error
  • Vasily
    Vasily over 6 years
    doesn't work with python 3.6 even after fixing 'except' syntax
  • Dmytro
    Dmytro over 6 years
    This is a python2 version. I haven't tested this with python3. Probably python3 native filetree copy is fast enough, one shall do a benchmark.
  • Soumyajit
    Soumyajit over 6 years
    This is way faster than anything else I tried.
  • Spencer
    Spencer almost 6 years
    FYI I would suggest adding shutil.copystat(src, dst) after the file is written out to bring over metadata.
  • Spencer
    Spencer over 5 years
    @Dmytro Just tested in py3.6, still 3x faster than shutil.copy2()!
  • Varlor
    Varlor over 5 years
    @Spencer What did you changed in the code to make it work with python 3.5+ ?, Best regards!
  • Spencer
    Spencer over 5 years
    @Varlor From what I remember I just changed this one line to the following: for x in iter(lambda: os.read(fin, BUFFER_SIZE), b""): (made the string into bytes by adding 'b')
  • Varlor
    Varlor over 5 years
    ah, because I get also a syntax error with the 'why' in the except line 'except (IOError, os.error), why:'
  • Vahid Pazirandeh
    Vahid Pazirandeh over 5 years
    To boost performance also see the pyfastcopy module which uses the system call sendfile(). This module works for Python 2 and 3. All you have to do is literally "import pyfastcopy" and shutils will automatically perform better. Like @Morwenn mentioned above, Python 3.8 will have sendfile() built into its impl.
  • Andry
    Andry over 4 years
    I've tried to replace shutil.copyfile by the copyfile from the example under python 3.8.0, but it seems hanged in the for x in iter loop.
  • Andry
    Andry over 4 years
    @VahidPazirandeh pyfastcopy does not work on Windows: The sendfile system call does not exist on Windows, so importing this module will have no effect.
  • Gustavo Gonçalves
    Gustavo Gonçalves about 4 years
    Economic on explanations, but this is a great answer.
  • Rexovas
    Rexovas over 3 years
    Note that the /O and /X flags require an elevated subprocess, else you will result in "Access denied"
  • Spencer
    Spencer about 2 years
    For those trying to get the best performance, buffer size is VERY important zabkat.com/blog/buffered-disk-access.htm
  • Spencer
    Spencer about 2 years
    This is a very fast option for single-file copies, but for anyone out there trying to thread large numbers of files it can run far slower (9x in a recent test of 4000 files). I was able to get better results modifying copy2 buffer size as in other answer.