Error Handling: Boto: [Error 104] Connection Reset by Peer

14,189

Solution 1

I had exactly the same problem. If you search the boto issues on GitHub, you will see that we are not alone.

There's also a known accepted issue: https://github.com/boto/boto/issues/2207

Reaching performance limits of AWS S3

The truth is that we have gotten so used to boto and the AWS S3 service that we have forgotten these are really distributed systems, which can break in some cases.

I was archiving (download, tar, upload) a huge number of files (about 3 years of data, with around 15 feeds, each having about 1440 versions a day) and using Celery to do this faster. I have to say that at times I was getting these errors more often, probably when reaching the performance limits of AWS S3. The errors often appeared in batches (in my case I was uploading at about 60 Mbps for a couple of hours).

Training S3 performance

When I measured performance, S3 appeared to get "trained": after an hour or so, the responsiveness of the S3 bucket jumped up. AWS had probably detected the higher load and spun up more instances to serve it.

Try latest stable version of boto

Another thing is that boto retries in many cases, so many failures are hidden from our calls. Sometimes things got a bit better for me after upgrading to the latest stable version.

My conclusions are:

  • try upgrading to the latest stable boto
  • when the error rate grows, lower the pressure
  • accept the fact that AWS S3 is a distributed service that has occasional performance problems

In your code, I would definitely recommend adding some sleep (at least 5 s, but 30 s would seem fine to me); otherwise you are just pushing harder and harder on a system that might be in a shaky state at the moment.
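The advice to sleep and lower the pressure can be sketched as a small retry helper. This is only an illustration under assumptions: `call_with_backoff`, its parameters and its defaults are hypothetical names chosen here, not part of boto, and the doubling delay with jitter is one common way to back off, not something the answer prescribes.

```python
import random
import socket
import time


def call_with_backoff(operation, max_attempts=5, base_delay=5.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call operation(); on socket.error wait and retry, doubling the delay.

    All names and defaults here are illustrative, not part of boto.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except socket.error:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
            delay = min(delay * 2, max_delay)
```

In a script like the one in the question, the call might be wrapped as `call_with_backoff(lambda: key.get_contents_to_filename(path))`, where `key` and `path` come from your own code.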

Solution 2

Well, it appeared that time.sleep() worked for a while, but now that the files are bigger, even that doesn't do the trick. It seems I need to restart the loop to get it working again. This modification seems to be working.

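import socket
import time

# bucket and downloadRoot are defined elsewhere in the script.
def download(filesToDownload):
    temp = []  # files fetched successfully in this pass
    for sFile in filesToDownload:
        for keys in bucket.list(prefix='<bucket>%s' % (sFile)):
            try:
                keys.get_contents_to_filename('%s%s' % (downloadRoot, sFile))
                temp.append(sFile)
            except socket.error:
                # Back off, then restart with only the files still missing.
                time.sleep(30)
                download(set(filesToDownload) - set(temp))
                return  # the recursive call handled everything that was left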
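The recursive version above has no upper bound on retries, so a file that fails every time would keep it going indefinitely. A minimal sketch of the error counter the question asks about follows. It is an assumption-laden illustration: `download_all` and its parameters are hypothetical names, and `fetch` stands in for whatever call actually downloads one file (for example a wrapper around `get_contents_to_filename`).

```python
import socket
import time


def download_all(files, fetch, max_attempts=5, wait_seconds=30,
                 sleep=time.sleep):
    """Fetch every file, retrying each one at most max_attempts times.

    fetch is a hypothetical callable that downloads one file and raises
    socket.error on a connection reset. Returns the files that never succeeded.
    """
    failed = []
    for name in files:
        for attempt in range(1, max_attempts + 1):
            try:
                fetch(name)
                break  # success: stop retrying this file
            except socket.error:
                if attempt < max_attempts:
                    sleep(wait_seconds)  # ease the pressure before retrying
        else:
            # The for loop ended without break: every attempt failed.
            failed.append(name)
    return failed
```

The returned list makes it possible to log or re-queue the files that were skipped instead of blocking the whole run on them.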
Author by

cloud36

Updated on June 17, 2022

Comments

  • cloud36
    cloud36 almost 2 years

    I have a script that downloads from Amazon S3. The script works 99.9% of the time. Occasionally I get the following error (socket.error: [Errno 104] Connection reset by peer). Once I restart the code, the error seems to go away. Since it's hard to recreate the error, I'm hoping the snippet of code below will fix it. Specifically, I'm hoping that if the error comes up, it will try to re-download the file. I'm wondering if this code will work, and if there is anything else I should add. I'm thinking an error counter might be good, so that if the error keeps coming up, it will eventually move on. (I'm not exactly sure how to add a counter.)

    files = []  # list of files to download
    
    for file in files:
        for keys in bucket.list(prefix=file):
            while True:
                try:
                    keys.get_contents_to_filename()
                except socket.error:
                    continue
                break
    
  • Nirojan Selvanathan
    Nirojan Selvanathan over 5 years
    The same issue was occurring for me. I had installed awscli via apt-get; removing it and installing the newer version via pip resolved the issue.