Postgres: How to fire multiple queries at the same time?

Solution 1

I've got a sure answer for this question - IF you will share with us what your ab workout is!!! I'm getting fat by the minute and I need answers myself...

OK I'll answer anyway.

If you are updating one table, on one database server, in 40 minutes single-threaded and in 10 minutes with 4 threads, the bottleneck is not the database server; if it were, the extra threads would just get bogged down in I/O. If you are executing a bunch of UPDATEs, one call per record, the network round-trip time is what is killing you.

I'm pretty sure this is the case, and not that there's an I/O bottleneck on the DB or that procedure_name(paramns) itself is taking a long time. (If the procedure really took 2-10 seconds per record, 30K records at ~5 seconds each would take about 150,000 seconds, i.e. roughly 2,500 minutes, not 40.) The reason I'm sure is that starting 4 concurrent processes cuts the time to 1/4; in particular, that rules out an I/O issue on the DB server.

This might be the one excuse for putting business logic in an SP on the server. Optimization unfortunately means breaking the rules; the consequence is difficult maintenance. But, duh!!

However, the best solution would be to set this up to use 'bulk update' queries. That might mean you have to take several strange and unintuitive steps, such as these:

  • This will require a lot of modification if multiple users can run it concurrently.
  • Refactor the system so procedure_name(paramns) can get all the data it needs to process all records via a SELECT statement; you may need some creative joins. If it's an SP, of course, you are now moving the logic to the client.
  • Have the program create an XML or other importable flat-file format containing the PK of each record to update and the new field value or values. Write all the updates to this file instead of executing them on the DB.
  • Have a temp table on the database that matches the layout of this flat file.
  • Run an import on the database: clear the temp table and import the file.
  • Do an update of a join of the temp table and the table to be updated, e.g. UPDATE mytbl SET myval = mytemp.newval FROM mytemp WHERE mytbl.pk = mytemp.pk (Postgres's UPDATE ... FROM join syntax; see the sketch after this list).
  • You can try some of these steps 'by hand' first, before you bother coding, to see if the speed increase is worth it.
  • If possible, you can still put all of this in an SP!
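
Put together, the flat-file route might look like the sketch below. This is a minimal sketch, not a drop-in solution: the table and column names (mytbl, mytemp, id, myval, newval) and the file path are hypothetical placeholders, and COPY ... FROM reads a file on the server (from psql, \copy reads a client-side file instead).

BEGIN;

CREATE TEMP TABLE mytemp (id integer PRIMARY KEY, newval text);

-- Bulk-load the pre-computed values; one COPY replaces 30K round trips.
COPY mytemp (id, newval) FROM '/tmp/updates.csv' WITH (FORMAT csv);

-- One set-based UPDATE joining the temp table to the target table.
UPDATE mytbl
SET    myval = mytemp.newval
FROM   mytemp
WHERE  mytbl.id = mytemp.id;

COMMIT;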

I'm not making any guarantees, especially as I look down at my ever-fattening belly, but, this has the potential to melt your update job down to under a minute.

Solution 2

It is possible to update multiple rows at once. Below is an example in Postgres:

UPDATE
    table_name
SET
    column_name = temp.column_name 
FROM
    (VALUES
        (<id1>, <value1>),
        (<id2>, <value2>),
        (<id3>, <value3>)
    ) AS temp("id", "column_name")
WHERE
    table_name.id = temp.id;
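
For instance, with a hypothetical table users(id integer, name text), the placeholders would be filled in like this (add explicit casts such as ::integer inside the VALUES list if Postgres cannot infer the column types from the literals):

UPDATE
    users
SET
    name = temp.name
FROM
    (VALUES
        (1, 'alice'),
        (2, 'bob'),
        (3, 'carol')
    ) AS temp("id", "name")
WHERE
    users.id = temp.id;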

Comments

  • canni, almost 2 years ago

    I have one procedure which updates record values, and I want to fire it against all records in a table (over 30K records). Procedure execution time ranges from 2 up to 10 seconds, because it depends on network load.

    Now I'm doing UPDATE table SET field = procedure_name(paramns); but with that number of records it takes up to 40 minutes to process the whole table.

    Now I'm using 4 different connections which fork to the background and fire the query with a WHERE clause that splits the rows by the modulo of their ids (WHERE id_field % 4 = n, one remainder n per connection), and this works well and cuts the table update down to ~10 mins (see the sketch after this comment).

    But I want to avoid using cron, shell jobs and multiple connections for this. I know it can be done with libpq, but is there a way to fire a query (4 different non-blocking queries) without waiting for it to finish executing, within a single connection?

    Or can anyone point me to some clues on how to write such a function, using Postgres internals, or simply in C, and bind it as a stored procedure?

    Cheers Darius
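
    For reference, the modulo split described in the comment amounts to running one statement like the following per connection; mytable, field, id_field and procedure_name(paramns) stand in for the asker's actual names:

    -- Each of the 4 connections runs this with a different remainder
    -- (0, 1, 2 or 3), so together they cover every row exactly once.
    UPDATE mytable SET field = procedure_name(paramns) WHERE id_field % 4 = 0;
    UPDATE mytable SET field = procedure_name(paramns) WHERE id_field % 4 = 1;  -- connection 2
    UPDATE mytable SET field = procedure_name(paramns) WHERE id_field % 4 = 2;  -- connection 3
    UPDATE mytable SET field = procedure_name(paramns) WHERE id_field % 4 = 3;  -- connection 4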