Improve JDBC Performance


Solution 1

To improve JDBC performance in your scenario, you can apply several modifications.

As you will see, all of these modifications can significantly speed up your task.

1. Using batch operations.

You can read the results of your big query and store them in some kind of buffer, and only when the buffer is full run the subquery for all the data collected in it. This significantly reduces the number of SQL statements to execute.

static final int BATCH_SIZE = 1000;
List<MyData> buffer = new ArrayList<>(BATCH_SIZE);

while (rs.next()) {                  // ResultSet has next(), not hasNext()

  MyData record = new MyData( rs.getString(1), ..., rs.getString(4) );
  buffer.add( record );

  if (buffer.size() == BATCH_SIZE) {
    processBatch( buffer );
    buffer.clear();                  // reset the buffer for the next batch
  }
}
if (!buffer.isEmpty()) {
  processBatch( buffer );            // don't forget the last, partially filled batch
}

void processBatch( List<MyData> buffer ) {

  String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
  ResultSet rs1 = stat1.executeQuery(sql);  // one query for all IDs in the buffer
  while (rs1.next()) { ... }
  ...
}
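
The getIDs helper is not shown above; a minimal sketch, assuming MyData exposes an int getId() (the body is an assumption, only the name and parameter come from the snippet):

// Hypothetical helper: joins the buffered IDs into a comma-separated
// list for the IN (...) clause. This is safe from SQL injection only
// because the values are numeric IDs we fetched ourselves.
static String getIDs(List<MyData> buffer) {
    StringBuilder sb = new StringBuilder();
    for (MyData d : buffer) {
        if (sb.length() > 0) sb.append(',');
        sb.append(d.getId());
    }
    return sb.toString();
}

Note that Oracle limits an IN list to 1000 expressions (ORA-01795), so BATCH_SIZE should not exceed 1000.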

2. Using efficient maps to store content from many selects.

If your records are not too big, you can store them all at once, even for a 4-million-row table.

I have used this approach many times for nightly processes (with no regular users online). Such an approach may need a lot of heap memory (e.g. 100 MB - 1 GB), but it is much faster than approach 1.

To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap (etc.), which handles primitive int keys much better than the Java standard library maps (no boxing of keys, lower memory overhead).

// gnu.trove.map.TIntObjectMap / gnu.trove.map.hash.TIntObjectHashMap (trove4j)
final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1: load the master records, keyed by their int id
while (rs.next()) {
  MyData record = new MyData( rs.getInt(1), rs.getString(2), ..., rs.getString(4) );
  map.put(record.getId(), record);
}

// query 2: attach the details to the already loaded records
while (rs.next()) {
  int id = rs.getInt(1);   // my data id
  String x = rs.getString(...);
  int y = rs.getInt(...);

  MyData record = map.get(id);
  record.add( new MyDetail(x, y) );
}

// query 3
// same pattern as query 2 

After this you have a map filled with all the collected data, probably with a lot of memory allocated. This is why you can use this method only if you have such resources.

Another topic is how to write the MyData and MyDetail classes so they are as small as possible. You can use some tricks (see the sketch after this list):

  1. storing 3 integers (with limited range) in 1 long variable (using a util for bit shifting)
  2. storing Date objects as integers (yymmdd)
  3. calling str.intern() for each string fetched from the DB
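
A minimal sketch of tricks 1 and 2 (the class name, the 16-bit field ranges, and the accessors are assumptions for illustration):

// Hypothetical compact record: three small ints packed into one long
// (trick 1) and a date stored as a yymmdd int (trick 2).
class CompactDetail {

    private final long packed;       // assumes each value fits in 16 bits (0..65535)
    private final int dateYymmdd;    // e.g. 150630 for 2015-06-30

    CompactDetail(int a, int b, int c, int year, int month, int day) {
        this.packed = ((long) a << 32) | ((long) b << 16) | (long) c;
        this.dateYymmdd = (year % 100) * 10000 + month * 100 + day;
    }

    int getA() { return (int) (packed >>> 32) & 0xFFFF; }
    int getB() { return (int) (packed >>> 16) & 0xFFFF; }
    int getC() { return (int)  packed         & 0xFFFF; }
    int getDate() { return dateYymmdd; }
}

For trick 3, intern() makes repeated string values (status codes, city names, etc.) share a single instance instead of allocating millions of duplicates.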

3. Transactions

If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction; it is too much for most database configurations. Use approach 1 and commit the transaction for each batch. You can stamp each inserted record with something like a RUN_ID, and if everything went well, mark that RUN_ID as successful.
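
A minimal sketch of per-batch commits (insertBatch, markRunSuccessful, and the batches collection are hypothetical helpers for illustration):

connection.setAutoCommit(false);
long runId = System.currentTimeMillis();   // hypothetical marker for this run

for (List<MyData> batch : batches) {
    insertBatch(batch, runId);             // every inserted row stamped with RUN_ID
    connection.commit();                   // one small transaction per batch
}

markRunSuccessful(runId);                  // flag the whole run as complete
connection.commit();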

If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
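
In plain JDBC the read-only hint looks like this:

// Hint to the driver and database that this connection only reads.
// Must be called while no transaction is active.
connection.setReadOnly(true);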

4. JDBC fetch size.

When you load a lot of records from the database, it is very, very important to set a proper fetch size on your JDBC statement. This reduces the number of physical round trips to the database and speeds up your process. Oracle's JDBC driver defaults to fetching only 10 rows per round trip, so for millions of rows the default is painfully slow.

Example:

// jdbc
statement.setFetchSize(500);

// spring     
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);

Here you can find some benchmarks and patterns for using fetch size:

http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html

5. PreparedStatement

Use PreparedStatement rather than Statement. A prepared statement is parsed once and then reused with bind variables, which lets the database cache the execution plan instead of hard-parsing every query.
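
A minimal sketch (detail_table, attr5, and the ids collection are placeholders; connection is an open java.sql.Connection):

// The bind variable (?) lets the database parse the statement once
// and reuse the cached plan for every id in the loop.
try (PreparedStatement ps = connection.prepareStatement(
        "select attr5 from detail_table where id = ?")) {
    for (int id : ids) {
        ps.setInt(1, id);
        try (ResultSet rs1 = ps.executeQuery()) {
            while (rs1.next()) {
                String attr5 = rs1.getString(1);
                // ... use attr5
            }
        }
    }
}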

6. Number of SQL statements.

Always try to minimize the number of SQL statements you send to the database. In your case, 4.5 million rows times 14 subqueries is over 60 million statements per run; batching them as in approach 1, or folding them into the big query with joins, cuts that number by orders of magnitude.

Solution 2

Try this:

resultSet.setFetchSize(100);

while (resultSet.next()) { ... }

The parameter is the number of rows that should be retrieved from the database in each round trip.


Comments

  • user2133404 about 4 years ago

    I am executing the following set of statements in my Java application. It connects to an Oracle database.

    stat = connection.createStatement();
    stat1 = connection.createStatement();
    ResultSet rs = stat.executeQuery(BIGQUERY);
    while (rs.next()) {
        obj1.setAttr1(rs.getString(1));
        obj1.setAttr2(rs.getString(2));
        obj1.setAttr3(rs.getString(3));
        obj1.setAttr4(rs.getString(4));

        ResultSet rs1 = stat1.executeQuery(SMALLQ1);
        while (rs1.next()) {
           obj1.setAttr5(rs1.getString(1));
        }

        ResultSet rs2 = stat1.executeQuery(SMALLQ2);
        while (rs2.next()) {
           obj1.setAttr6(rs2.getString(1));
        }
            .
            .
            .
        linkedBlockingQueue.add(obj1);
    }
    // all statements and connections are closed
    

    The BIGQUERY returns around 4.5 million records, and for each record I have to execute the smaller queries, which are 14 in number. Each small query has 3 inner joins.

    My multithreaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records in 20 hours. I am using about 200 threads, which run the above code and store the records in the LinkedBlockingQueue.

    Does blindly increasing the thread count help performance, or is there some other way in which I can increase the performance of the result-set processing?

    PS: I am unable to post the queries here, but I am assured that all queries are optimized.

  • a_horse_with_no_name over 9 years ago
    I strongly disagree with #3 - in Oracle it is almost always faster to do everything in a single transaction rather than splitting it up into smaller chunks. Oracle is not SQL Server.
  • przemek hertel over 9 years ago
    @a_horse_with_no_name - yes, you're right, Oracle handles most tasks better in one TX, but there are cases where the task is too big for the current REDO LOG configuration, and I was talking about such a case. In those cases I usually split the task into smaller TXs with a logic marker. Anyway, I am an Oracle enthusiast :)
  • a_horse_with_no_name over 9 years ago
    Restrictions on the size of the redo log have nothing to do with performance. And as the whole problem is (only) about queries, not DML, redo logs will almost certainly not be a problem here.
  • przemek hertel over 9 years ago
    I was writing more generally: if it's only queries, then OK (maybe a read-only TX). But with DML, the size of the TX does matter, as all the TX's redo records have to be stored.
  • Adrian Shum almost 9 years ago
    I mostly agree with the optimizations in this answer. However, the very first thing we should do, before any of these optimizations, is some basic profiling to find out where the time actually goes. Is the query itself written ineffectively? Is it the fetching that takes most of the time? etc.
  • przemek hertel almost 9 years ago
    @AdrianShum you're right, profiling gives you the diagnosis, and after that you can apply the appropriate medicine, I agree.
  • pedram bashiri about 6 years ago
    Among all the suggestions, I think setting the right fetch size would have the biggest performance impact.