Concurrent use of same JDBC connection by multiple threads

12,021

Solution 1

I ran the following set of tests using a AWS RDS Postgres database, and Java 11:

  1. Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string

  2. Pick a random 5 character string, and search for partial-matches of this string, in the above table

  3. Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because there are very few results returned, we can conclude that the majority of this 23 seconds is spent waiting for the DB to run the full-table-scan, and not in sending the request/response packets

  4. Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds. Ie, the queries are being efficiently parallelized

  5. Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds. The second result comes back in ~46 seconds. The third in ~1 minute. etc etc. All the results are functionally correct, in that they match the specific keyword queried by that thread

To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.

Solution 2

Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.

In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.

Share:
12,021
RvPr
Author by

RvPr

Updated on June 26, 2022

Comments

  • RvPr
    RvPr almost 2 years

    I'm trying to better understand what will happen if multiple threads try to execute different sql queries, using the same JDBC connection, concurrently.

    • Will the outcome be functionally correct?

    • What are the performance implications?

    • Will thread A have to wait for thread B to be completely done with its query?

    • Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?


    I see that the Apache DBCP uses synchronization protocols to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections, and distributing them in a round-robin manner.

    I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?

    • cool
      cool over 8 years
      connection pooling is much more than just a set of connections. You should really read about what a connection pool provide The parameters here can provide some hint commons.apache.org/proper/commons-dbcp/configuration.html
    • RvPr
      RvPr over 8 years
      I had browsed through the config and other API on the interface, and none of them interest me for my use-case. Hence why I ended up writing my own implementation, which came out to ~10 lines of code, does exactly what I need, minimizes synchronization overhead, and saves me the trouble of having to add connection.close() in a million places.
    • Mark Rotteveel
      Mark Rotteveel over 8 years
      Multiplexing multiple threads on a single connections is a really bad idea. Just use a connection pool, check out a connection and return it to the pool (close it) when you are done with it. Have you - for example - considered what happens if one thread wants to commit and the other wants to rollback? You either lose data or the wrong data has been committed. That is a real bad place to be. Also, but not using Connection.close(), you make it really unclear to others what you are doing.
    • Marko Zajc
      Marko Zajc over 5 years
      Tried doing that with PostgreSQL 42.2.5 JDBC drivers and both threads got blocked.
  • RvPr
    RvPr over 8 years
    The JDBC Connection spec is thread safe, with respect to issuing and running queries. I'm more concerned about the performance implications really.
  • Peter
    Peter over 8 years
    I assume thread safe in that context means all its methods are thread safe. So 2 calls by different threads won't interfere, but many calls that rely on a specific order might fail (like rollback() for example).
  • RvPr
    RvPr over 8 years
    I get your point about there being corner cases, if you're using the full range of functionality in Connection. However, I'm not. I have in fact set up a wrapper around Connection, that only allows executing queries, and that's it. I'm more concerned about the performance implications of executing multiple queries on the same connection.