Continuing a transaction after primary key violation error

Solution 1

You can use SAVEPOINTs inside a transaction: set a savepoint before each insert, roll back to it on a primary key violation, and release it on success.

Python-ish pseudocode illustrates this from the application side:

error_count = 0
database.execute("BEGIN")
for data_row in input_data_dictionary:
    database.execute("SAVEPOINT bulk_savepoint")
    try:
        database.execute("INSERT", table, data_row)
    except:
        database.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
        log_error(data_row)
        error_count = error_count + 1
    else:
        database.execute("RELEASE SAVEPOINT bulk_savepoint")

if error_count > error_threshold:
    database.execute("ROLLBACK")
else:
    database.execute("COMMIT")
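The pseudocode above can be made runnable. The demo below is a minimal sketch using Python's built-in sqlite3 module, which also supports savepoints, so it runs without a PostgreSQL server; the table and sample rows are made up. With psycopg2 the SQL statements are identical and the exception to catch is psycopg2.IntegrityError.

```python
import sqlite3

# Throwaway demo schema and data; real code would read rows from the log file.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # take over transaction management
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (test_field INTEGER NOT NULL PRIMARY KEY)")

input_rows = [(1,), (2,), (1,), (3,)]   # the third row violates the PK
error_rows = []

cur.execute("BEGIN")
for row in input_rows:
    cur.execute("SAVEPOINT bulk_savepoint")
    try:
        cur.execute("INSERT INTO table1 VALUES (?)", row)
    except sqlite3.IntegrityError:
        # Undo only this row; the transaction as a whole stays alive.
        cur.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
        error_rows.append(row)
    else:
        cur.execute("RELEASE SAVEPOINT bulk_savepoint")
cur.execute("COMMIT")

print(error_rows)   # [(1,)]
print(cur.execute("SELECT test_field FROM table1 ORDER BY 1").fetchall())
# [(1,), (2,), (3,)]
```

The failing row is collected instead of aborting the whole load, and every other row is committed in a single transaction.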

Edit: Here's an actual example of this in action in psql, based on a slight variation of the example in the documentation (SQL statements are prefixed by ">"):

> CREATE TABLE table1 (test_field INTEGER NOT NULL PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "table1_pkey" for table "table1"
CREATE TABLE

> BEGIN;
BEGIN
> INSERT INTO table1 VALUES (1);
INSERT 0 1
> SAVEPOINT my_savepoint;
SAVEPOINT
> INSERT INTO table1 VALUES (1);
ERROR:  duplicate key value violates unique constraint "table1_pkey"
> ROLLBACK TO SAVEPOINT my_savepoint;
ROLLBACK
> INSERT INTO table1 VALUES (3);
INSERT 0 1
> COMMIT;
COMMIT
> SELECT * FROM table1;  
 test_field 
------------
          1
          3
(2 rows)

Note that the value 3 was inserted after the error, but still inside the same transaction!

The documentation for SAVEPOINT is at http://www.postgresql.org/docs/8.4/static/sql-savepoint.html.

Solution 2

I would use a stored function to catch the unique-violation exception inside the database. Example:

CREATE OR REPLACE FUNCTION my_insert(i_foo text, i_bar text)
  RETURNS boolean LANGUAGE plpgsql AS
$BODY$
begin
    insert into foo(x, y) values(i_foo, i_bar);
    return true;
exception
    when unique_violation then
        return false;  -- duplicate key: swallow the error and report it
end;
$BODY$;

SELECT my_insert('value 1','another value');

Solution 3

You can roll back the whole transaction, or roll back to a savepoint created just before the code that raises the exception (cr is the cursor):

import uuid

name = uuid.uuid1().hex
cr.execute('SAVEPOINT "%s"' % name)
try:
    ...  # your failing query goes here
except Exception:
    cr.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
    ...  # your alternative code goes here
else:
    cr.execute('RELEASE SAVEPOINT "%s"' % name)

This code assumes there is a running transaction; otherwise you would not receive that error message.

The Django PostgreSQL backend creates cursors directly from psycopg. Perhaps in the future it will provide a proxy class for the Django cursor, similar to the cursor in Odoo. Odoo extends its cursor with the following code (self is the cursor):

from contextlib import contextmanager
import uuid

@contextmanager
@check  # Odoo-internal decorator that validates the cursor state
def savepoint(self):
    """context manager entering a new savepoint"""
    name = uuid.uuid1().hex
    self.execute('SAVEPOINT "%s"' % name)
    try:
        yield
    except Exception:
        self.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
        raise
    else:
        self.execute('RELEASE SAVEPOINT "%s"' % name)

With that context manager in place, the calling code becomes simpler:

try:
    with cr.savepoint():
        # your failing query goes here
except Exception:
    # your alternative code goes here 

and it is more readable, because the savepoint bookkeeping is out of the way.
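Outside Odoo, the same context manager can be written as a standalone helper that takes the cursor as an argument. This is a minimal sketch demonstrated with sqlite3 so it runs as-is; with psycopg2 the statements are identical:

```python
import sqlite3
import uuid
from contextlib import contextmanager

@contextmanager
def savepoint(cur):
    """Run the enclosed statements inside a savepoint; roll back on error."""
    name = uuid.uuid1().hex
    cur.execute('SAVEPOINT "%s"' % name)
    try:
        yield
    except Exception:
        cur.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
        raise
    else:
        cur.execute('RELEASE SAVEPOINT "%s"' % name)

conn = sqlite3.connect(":memory:")
conn.isolation_level = None
cur = conn.cursor()
cur.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY)")
cur.execute("BEGIN")
cur.execute("INSERT INTO t VALUES (1)")
try:
    with savepoint(cur):
        cur.execute("INSERT INTO t VALUES (1)")   # duplicate key
except sqlite3.IntegrityError:
    pass                                          # alternative code goes here
cur.execute("INSERT INTO t VALUES (2)")           # transaction still usable
cur.execute("COMMIT")
print(cur.execute("SELECT pk FROM t ORDER BY pk").fetchall())  # [(1,), (2,)]
```

The duplicate insert is rolled back to the savepoint, and the surrounding transaction stays usable for further inserts.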

Author: John

Updated on July 18, 2022

Comments

  • John
    John almost 2 years

    I am doing a bulk insert of records into a database from a log file. Occasionally (~1 row out of every thousand) one of the rows violates the primary key and causes the transaction to fail. Currently, the user has to manually go through the file that caused the failure and remove the offending row before attempting to re-import. Given that there are hundreds of these files to import it is impractical.

    My question: How can I skip the insertion of records that will violate the primary key constraint, without having to do a SELECT statement before each row to see if it already exists?

    Note: I am aware of the very similar question #1054695, but it appears to be a SQL Server specific answer and I am using PostgreSQL (importing via Python/psycopg2).

  • John
    John about 14 years
    Can you elaborate on what you mean by SSIS?
  • dburges
    dburges about 14 years
SSIS is the data import tool that comes with SQL Server. I didn't catch that you are using Postgres. It can still do the job for Postgres, but I'm not sure how you would get it, as I don't think it comes with the free version of SQL Server.
  • Guru
    Guru about 14 years
It is always better to log your exceptions. You can modify the exception block to log it and still continue.
  • Frank Heikens
    Frank Heikens about 14 years
That's not going to work: when an error happens, the transaction is aborted and rolled back. You need an exception handler inside the database. Query failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
  • Frank Heikens
    Frank Heikens about 14 years
    You can let the function log the exceptions, no problem.
  • Matthew Wood
    Matthew Wood about 14 years
    Yes it will. That's the whole point of SAVEPOINTs. I've edited my answer in order to give a concrete example.
  • Frank Heikens
    Frank Heikens about 14 years
    ----edit---- Sorry, I was wrong... shame on me ;) It works fine, you're right.
  • John
    John about 14 years
Thanks. However, I am adding millions of rows in a single transaction - what are the performance implications of adding millions of savepoints during a transaction (even if the savepoint is overwritten on each successful row)?
  • Matthew Wood
    Matthew Wood about 14 years
    I'm not sure about the performance impact. If you are already running this without the savepoints, I would think it wouldn't be too difficult to add them in and test. Even if there is a performance impact per insert, you can load the data in "blocks" of records to minimize savepoints, only rolling back and handling in detail the blocks that fail. Also, the documentation states that the savepoints with the same name are not destroyed, only masked by the more recent one, so you might want to add a RELEASE SAVEPOINT <name> after the INSERT in the try block in my example.
  • Jeff
    Jeff almost 10 years
    @MatthewWood, What if you anticipate the number of errors to be more than the number of inserts?
  • Matthew Wood
    Matthew Wood almost 10 years
    @Jeff, If it were me, I would load the entire new file into a temporary table then do a single insert of all records from the temp table where the PK does not exist in the target table. There are several ways this can be done (NOT IN, LEFT JOIN, EXCEPT), so an efficient method should be possible depending on the volume of data.
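The staging-table approach in that last comment can be sketched as follows. This is a minimal sketch using Python's sqlite3 so it runs as-is; the table and column names are made up, and in PostgreSQL the same SQL works, with `EXCEPT` or `LEFT JOIN` as alternatives to `NOT IN`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE target (pk INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO target VALUES (?)", [(1,), (2,)])

# Stage the whole file, duplicates and all; no constraints on the staging table.
cur.execute("CREATE TEMP TABLE staging (pk INTEGER)")
cur.executemany("INSERT INTO staging VALUES (?)", [(2,), (3,), (4,)])

# One set-based insert of only the rows whose PK is not already present.
cur.execute("""
    INSERT INTO target
    SELECT DISTINCT pk FROM staging
    WHERE pk NOT IN (SELECT pk FROM target)
""")
conn.commit()
print(cur.execute("SELECT pk FROM target ORDER BY pk").fetchall())
# [(1,), (2,), (3,), (4,)]
```

Only the new keys are inserted, in a single statement, with no savepoints at all: the existing key 2 is silently skipped.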