PyMySQL throws 'BrokenPipeError' after making frequent reads


Solution 1

MySQL Server Has Gone Away

This problem is described extensively at http://dev.mysql.com/doc/refman/5.7/en/gone-away.html. The usual cause is that the server has disconnected for some reason, and the usual remedy is to retry the query, or to reconnect and then retry.

But the reason this breaks your code is the way the code is written. See below.

Possibly because I have written my code poorly?

Since you asked.

rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))

This is a bad habit. The manual explicitly warns against building queries this way, because it exposes you to SQL injection. The correct way is

rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = %s", (file_["name"],))

The second problem with the above code is that you don't need to check whether a value exists before you try to update it. You can delete the above line and its associated if/else and jump straight to the update. Besides, your elif and else branches do exactly the same thing. So your code can just be

try:
    cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
    # No pymysql.escape_string() here: %s parameters are already
    # escaped by the driver, and escaping twice corrupts the data.
    cursor.execute(cmd, (result_text, file_["name"]))
    connection.commit()
except:  # <-- next problem.

And we come to the next problem. Never catch generic exceptions like this. You should always catch specific exceptions such as TypeError, AttributeError, etc. When catching a generic exception is unavoidable, you should at least log it.

For example, here you could catch connection errors and attempt to reconnect to the database. Then the code will not stop executing when the "server has gone away" problem occurs.

Solution 2

I've solved the same error, in the case of bulk inserts, by reducing the number of rows I tried to insert in one command.

Even though the server's stated limit on bulk insert size was much higher, I still got this kind of error.
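The fix can be sketched as splitting one huge executemany into smaller batches, so that no single statement exceeds the server's max_allowed_packet. Table and column names below are illustrative, not from the original answer.

```python
def insert_in_chunks(connection, rows, chunk_size=1000):
    """Insert `rows` (sequences of (name, text) tuples) in batches
    of `chunk_size`, committing after each batch."""
    sql = "INSERT INTO table (name, text) VALUES (%s, %s)"
    with connection.cursor() as cursor:
        for start in range(0, len(rows), chunk_size):
            cursor.executemany(sql, rows[start:start + chunk_size])
            connection.commit()
```

Tuning `chunk_size` down trades a few extra round trips for staying safely under the packet limit.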

Author by

TheNationalSarcasmSociety

Updated on July 26, 2022

Comments

  • TheNationalSarcasmSociety
    TheNationalSarcasmSociety over 1 year

    I have written a script to help me work with a database. Specifically, I am trying to work with files on disk and add the result of this work to my database. I have copied the code below, but removed most of the logic which isn't related to my database to try to keep this question broad and helpful.

    I used the code to operate on the files and add the result to the database, overwriting any files with the same identifier as the one I was working on. Later, I modified the script to ignore documents which have already been added to the database, and now whenever I run it I get an error:

    pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
    

It seems like the server is rejecting the requests, possibly because I have written my code poorly? I have noticed that the error always occurs at the same place in the list of files, which doesn't change. If I re-run the code, replacing the file list with a list containing only the file on which the program crashes, it works fine. This makes me think that after a certain number of requests, the database just bottoms out.

    I'm using Python 3 and MySQL Community Edition Version 14.14 on OS X.

    Code (stripped of stuff that doesn't have to do with the database):

    import pymysql
    
    # Stars for user-specific stuff
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='*******',
                                 db='*******',
                                 use_unicode=True, 
                                 charset="utf8mb4",
                                 )
    cursor = connection.cursor()
    
    f_arr = # An array of all of my data objects
    
    def convertF(file_):
        # General layout: Try to work with input and add it the result to DB. The work can raise an exception
        # If the record already exists in the DB, ignore it
        # Elif the work was already done and the result is on disk, put it on the database
        # Else do the work and put it on the database - this can raise exceptions
        # Except: Try another way to do the work, and put the result in the database. This can raise an error
        # Second (nested) except: Add the record to the database with indicator that the work failed
    
        # This worked before I added the initial check on whether or not the record already exists in the database. Now, for some reason, I get the error:
        # pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
    
        # I'm pretty sure that I have written code to work poorly with the database. I had hoped to finish this task quickly instead of efficiently.
        try:
            # Find record in DB, if text exists just ignore the record
            rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))
            length = cursor.fetchall()[0][0] # Gets the length
            if length != None and length > 4:
                pass
            elif ( "work already finished on disk" ): 
                # get "result_text" from disk
                cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
                cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
                connection.commit()
            else:
                # do work to get result_text
                cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
                cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
                connection.commit()
        except:
            try: 
                # Alternate method of work to get result_text
                cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
                cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
                connection.commit()
            except:
                # Since the job can't be done, tell the database
                cmd = "UPDATE table SET text = %s, hascontent = 0 WHERE name = %s"
                cursor.execute(cmd, ( "NO CONTENT", file_["name"]) )
                connection.commit()
    
    for file in f_arr:
        convertF(file)
    
  • TheNationalSarcasmSociety
    TheNationalSarcasmSociety over 7 years
    Thank you, I'll make those fixes. The first part of the if loop is necessary because I want to avoid re-doing the "work" to get the content for the database, not make sure that the record already exists before update.
  • e4c5
    e4c5 over 7 years
    Didn't understand your comment
  • TheNationalSarcasmSociety
    TheNationalSarcasmSociety over 7 years
I just meant that the if loop isn't only for checking if the value exists before I update - I've got other, necessary logic in there that I cleaned out when I tried to get rid of code that wasn't important to the question. Sorry about that.
  • TheNationalSarcasmSociety
    TheNationalSarcasmSociety over 7 years
I have updated my code by making the connection object an argument of the convertF method, and making and closing the cursor inside the method. In the for loop, I have enclosed convertF in a try block so that I can catch the OperationalError, close the old connection, make a new connection, and retry convertF. When I retry the SQL statement where the pipe broke, it breaks again. Could it be the statement itself? Any ideas? Also, I think that removing most of my code has made it hard for you to answer this question, so I put the whole thing here
  • e4c5
    e4c5 over 7 years
    can you actually post that as a new question please
  • TheNationalSarcasmSociety
    TheNationalSarcasmSociety over 7 years
    Yes. Do you recommend that I use stripped code, or full code?
  • e4c5
    e4c5 over 7 years
    Well it's hard to say but do include a bit more than you have included here
  • TheNationalSarcasmSociety
    TheNationalSarcasmSociety over 7 years
    I did more research and discovered my issue. One of the documents I'm working with is unbelievably large for what it is, and so my query exceeds the max_allowed_packet
  • Halee
    Halee about 5 years
Using the pd.to_sql() function and passing the chunksize parameter is one way to do this; it worked for me. From the docs: "chunksize : int, optional. Rows will be written in batches of this size at a time. By default, all rows will be written at once."
  • dlink
    dlink almost 3 years
"remedy is to retry the query or to reconnect and retry" Helpful. Thanks @e4c5. I did this by implementing a try/except on pymysql.err.OperationalError, which simply reconnects, gets a new cursor, and then tries the query again.