How can I do a batch insert into an Oracle database using Python?

54,488

Solution 1

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

# build rows for each date and add to a list of rows we'll use to insert as a batch 
rows = [] 
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3' 
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()

Solution 2

Use Cursor.prepare() and Cursor.executemany().

From the cx_Oracle documentation:

Cursor.prepare(statement[, tag])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

Cursor.executemany(statement, parameters)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

Thus, using the above two functions, your code becomes:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be of the form:
# [{'1': value_a1, '2': value_a2}, {'1': value_b1, '2': value_b2}]
dict_sequence = [{'1': date_values[i], '2': temps[i], '3': precips[i]} for i in range(1, len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, :1, :2, :3)', station_id)
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.

Solution 3

fyi my test result:

I insert into 5000 rows. 3 columns per row.

  1. run insert 5000 times, it costs 1.24 minutes.
  2. run with executemany, it costs 0.125 seconds.
  3. run with a insert all code: it costs 4.08 minutes.

python code, which setup the sql like insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5: :6 from daul...

The python code to setup this long sql, it cost 0.145329 seconds.

I test my code on a very old sun machine. cpu: 1415 MH.

in the third case, I checked the database side, the wait event is "SQL*Net more data from client". which means the server is waiting for more data from client.

The result of the third method is unbelievable for me without the test.

so the short suggestion from me is just to use executemany.

Solution 4

As one of the comments says, consider using INSERT ALL. Supposedly it'll be significantly faster than using executemany().

For example:

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;

http://www.techonthenet.com/oracle/questions/insert_rows.php

Solution 5

I would create a large SQL insert statement using union:

insert into mytable(col1, col2, col3)
select a, b, c from dual union
select d, e, f from dual union
select g, h, i from dual

You can build the string in python and give it to oracle as one statement to execute.

Share:
54,488
James Adams
Author by

James Adams

Updated on May 26, 2020

Comments

  • James Adams
    James Adams almost 4 years

    I have some monthly weather data that I want to insert into an Oracle database table but I want to insert the corresponding records in a batch in order to be more efficient. Can anyone advise as to how I'd go about doing this in Python?

    For example let's say my table has four fields: a station ID, a date, and two value fields. The records are uniquely identified by the station ID and date fields (composite key). The values I'll have to insert for each station will be kept in a list with X number of full years worth of data, so for example if there are two years of values then the value lists will contain 24 values.

    I assume that below is the way I'd do this if I wanted to insert the records one at a time:

    connection_string = "scott/tiger@testdb"
    connection = cx_Oracle.Connection(connection_string)
    cursor = cx_Oracle.Cursor(connection)
    station_id = 'STATION_1'
    start_year = 2000
    
    temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
    precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
    number_of_years = len(temps) / 12
    for i in range(number_of_years):
        for j in range(12):
            # make a date for the first day of the month
            date_value = datetime.date(start_year + i, j + 1, 1)
            index = (i * 12) + j
            sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, %s, %s, %s)', (station_id, date_value, temps[index], precips[index]))
            cursor.execute(sql_insert)
    connection.commit()
    

    Is there a way to do what I'm doing above but in a way that performs a batch insert in order to increase efficiency? BTW my experience is with Java/JDBC/Hibernate so if someone can give an explanation/example which compares to the Java approach then it'd be especially helpful.

    EDIT: Perhaps I need to use cursor.executemany() as described here?

    Thanks in advance for any suggestions, comments, etc.

  • James Adams
    James Adams about 11 years
    Thanks for this. I've tried the above but it doesn't work due to the fact that executemany() only takes two arguments. And it seems that you still have to create a list of rows to use as the sequence parameters argument, as described in the section titled "Many at Once" here: oracle.com/technetwork/articles/dsl/… and the sections describing the use of executemany() here: oracle.com/technetwork/articles/dsl/…
  • James Adams
    James Adams about 11 years
    Also it's not clear how using a prepare() gains you anything since according to the cx_Oracle.Cursor.execute() documentation: "A reference to the statement will be retained by the cursor. If None or the same string object is passed in again, the cursor will execute that statement again without performing a prepare or rebinding and redefining." But I may be misunderstanding something here...
  • alldayremix
    alldayremix about 11 years
    Using prepare() doesn't gain you anything, it just makes your code clearer; the underlying C code for either method is the same. As for your first comment, I updated my answer to address that: executemany() takes a sequence of dictionaries as the second argument.
  • YoYo
    YoYo about 6 years
    This is a correct answer. Multiple inserts are handled with one roundtrip to the database. One issue - You probably want to do a UNION ALL instead of a regular UNION.
  • YoYo
    YoYo about 6 years
    This is a correct answer. Multiple inserts are handled with one roundtrip to the database.