JPA / Hibernate is very slow when fetching data
In the comments you point out that your @Joincolumn
are or FetchType.EAGER
. This is very aggressive and your problem.
Eager
means Hibernate will JOIN FETCH
all said columns in your query, parse and load all the records to dispatch the data in new instances of your entities. You may know that, if you join table A and B, the result will be a huge A.*, B.*
with A x B
records and A
repeated many times. This is what's going on with Eager
. That can escalate very quickly, especially with many Eager
columns.
Switching to Lazy
tells Hibernate to not load the data. It just prepares a Proxy
object which will call a separate SELECT
only when required (if your transaction is still open).
You can always force a FETCH
manually in your HQL with JOIN FETCH table
. This is the preferable way.
Sayak Banerjee
Updated on June 20, 2022Comments
-
Sayak Banerjee almost 2 years
I have a DAO that uses JPA/Hibernate to fetch data from SQL Server 2008 R2. In my specific usecase, i am doing a simple SELECT query that returns about 100000 records, but takes more than 35 minutes to do so.
I created a basic JDBC connection by manually loading the sql server driver and the same query returned 100k records in 15 seconds. So something is very wrong in my configuration.
Here is my springDataContext.xml:
<?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:jpa="http://www.springframework.org/schema/data/jpa" xmlns:context="http://www.springframework.org/schema/context" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tx="http://www.springframework.org/schema/tx" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.springframework.org/schema/data/jpa http://www.springframework.org/schema/data/jpa/spring-jpa-1.3.xsd http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.1.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd"> <bean id="jpaVendorAdapter" class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter"/> <bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean"> <property name="dataSource" ref="myDataSource"/> <property name="jpaVendorAdapter" ref="jpaVendorAdapter"/> <property name="persistenceXmlLocation" value="classpath:persistence.xml"/> <property name="jpaProperties"> <props> <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2008Dialect</prop> <prop key="hibernate.ejb.naming_strategy">org.hibernate.cfg.ImprovedNamingStrategy</prop> <prop key="hibernate.format_sql">true</prop> <prop key="hibernate.hbm2ddl.auto">none</prop> <prop key="hibernate.use_sql_comments">false</prop> <prop key="hibernate.show_sql">${hibernate.show_sql:false}</prop> <prop key="jadira.usertype.autoRegisterUserTypes">true</prop> <prop key="jadira.usertype.javaZone">America/Chicago</prop> <prop key="jadira.usertype.databaseZone">America/Chicago</prop> <prop key="jadira.usertype.useJdbc42Apis">false</prop> </props> </property> </bean> <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager"> <property name="entityManagerFactory" ref="entityManagerFactory"/> </bean> <tx:annotation-driven/> <jpa:repositories base-package="com.mycompany.foo"/> <context:component-scan base-package="com.mycompany.foo.impl" /> </beans>
The bean
myDataSource
is provided by whatever app consumes the DAO, which is defined like so:<bean id="myDataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource"> <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/> <property name="url" value="jdbc:sqlserver://mysqlhost.mycompany.com:1433;database=MYDB"/> <property name="username" value="username"/> <property name="password" value="chucknorris"/> </bean>
I have a complex query wherein I am setting
fetchSize
:package com.mycompany.foo.impl; import com.mycompany.foo.RecurringLoanPaymentAccountsDao; import org.hibernate.Query; import org.hibernate.ScrollMode; import org.hibernate.ScrollableResults; import org.hibernate.Session; import org.joda.time.DateTime; import javax.inject.Named; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; @Named public class FooDaoImpl implements FooDao { @PersistenceContext private EntityManager entityManager; public ScrollableResults getData(int chunkSize, DateTime tomorrow, DateTime anotherDate) { Session session = entityManager.unwrap(Session.class); Query query = session.createQuery( "from MyAccountTable as account " + // bunch of left joins (I am using @JoinColumns in my models) "where account.some_date >= :tomorrow " + "and account.some_other_date < :anotherDate" // <snip snip> ); query.setParameter("tomorrow", tomorrow) .setParameter("anotherDate", anotherDate) .setFetchSize(chunkSize); return query.scroll(ScrollMode.FORWARD_ONLY); } }
I also switched to vanilla JPA and did
jpaQuery.getResultList()
, but that was equally slow.I can provide other relevant code if required. Any clue on what I'm doing wrong here?
Update 1: Add schema details
Unfortunately, I work for a bank, so I cannot share the exact code. But let me try to represent the relevant bits.
Here is my main table that I am querying:
@Entity @Table(name = "MY_TABLE") public class MyTable { @EmbeddedId private Id id = new Id(); @Column(name = "SOME_COL") private String someColumn; // other columns @OneToMany(fetch = FetchType.LAZY) @JoinColumns({ @JoinColumn(name = "ACCT_NBR", referencedColumnName = "ACCT_NBR", insertable = false, updatable = false), @JoinColumn(name = "CUST_NBR", referencedColumnName = "CUST_NBR", insertable = false, updatable = false) }) private List<ForeignTable> foreignTable; // getter and setter for properties public static class Id implements Serializable { @Column(name = "ACCT_NBR") private short accountNumber; @Column(name = "CUST_NBR") private short customerNumber; } }
Basically, every table has
ACCT_NBR
andCUST_NBR
columns (including the foreign table) which are unique when clustered together. So my join condition includes both of these.The model for the foreign table looks exactly the same (with and embedded ID like the main table above), of course with its own set of columns in addition to the
ACCT_NBR
andCUST_NBR
.Now i only care about 2 other columns in the foreign table apart from the ID columns mentioned above:
TYPE_ID
andACCT_BALANCE
.TYPE_ID
is what I wish to use in myLEFT JOIN
condition, so that the final SQL looks like this:LEFT JOIN FOREIGN_TABLE FRN ON MAIN.ACCT_NBR = FRN.ACCT_NBR AND MAIN.CUST_NBR = FRN.CUST_NBR AND FRN.TYPE_ID = <some_static_id>
I want the LEFT JOIN to yield NULL data when there is no match for this specific
TYPE_ID
.And in my
SELECT
claus, I selectFOREIGN_TABLE.ACCT_BALANCE
, which of course will be null if the left join above has no matching row.This method is invoked from a spring batch application's Tasklet, and I have a feeling that it gave a null pointer exception once the tasklet finished processing and therefore, the readonly transaction was closed.
-
Guillaume F. over 8 yearsCan you make sure your statement is readonly ? Hibernate is very bad at handling a lots of object in its session. Also, check if your JDBC connection has cursors enabled, by default JDBC copies everything in memory instead of using the database's cursor
-
Guillaume F. over 8 yearsYou should maybe consider JTDS as an alternative driver, too. The one provided by Microsoft has a few violent bugs, and is a lot slower then JTDS.
-
Sayak Banerjee over 8 yearsI have a @Transactional(readOnly = true) annotation on the method that calls this DAO method. How can I mark this statement as read-only? Do I need to mark the entity as read only by removing the setters?
-
Guillaume F. over 8 years
@Transactional(readOnly = true)
is correct. Hibernate shouldn't do dirty checks on your session objects. That will speed-up the query a lot. -
Sayak Banerjee over 8 yearsWell, unfortunately, I already have that in there. And it still is painfully slow. Let me look at JTDS. Being a .net developer, all this is very daunting for me :)
-
Guillaume F. over 8 yearsAre the joined columns on
FetchType.Lazy
orEager
? Eager is not recommended unless you really know what you're doing. -
Sayak Banerjee over 8 yearsIt is a @OneToMany with fetchtype.Eager. Let me try our lazy as well. If that doesn't make a difference, I'll look into JTDS.
-
Sayak Banerjee over 8 years@GuillaumeF. whoa, changing to lazy made a HUGE difference!! I am going to run an end to end test and confirm.
-
-
Sayak Banerjee over 8 yearsNow i have a different issue, I get a NullPointerException when I try to access the joined entity. And i cannot use JOIN FETCH either as I have a "with" claus (since I have to join the foreign table with one of the columns = some static value). Any ideas?
-
Guillaume F. over 8 yearsIt really depends on your schema. I am blind without seeing the actual code. Accessing a Lazy getter, if the transaction is open, should call a SELECT. A NullPointerException is unlikely to be the direct cause. Can you post more code in your question as an addendum?
-
Sayak Banerjee over 8 yearsI've added some details. I apologize that I cannot post the exact code as my company has rules against it. I've tried my best to explain the schema, but please let me know if I missed out something.
-
Guillaume F. over 8 yearsYou can use
LEFT JOIN FETCH FOREIGN_TABLE FRN
but you might get the same problem as with Eager (It depends how many Eager Columns you had before). || If that foreign join returns only one record, it should be fast. Just make sure your foreign table has a composite index with the three reference columns. || Otherwise solving the Lazy loading is a bit more difficult (especially with Spring Batch). -
Sayak Banerjee over 8 yearsProblem is, to have that TYPE_ID = <static_int>, I was using WITH in my HQL, which i can no longer use with LEFT JOIN FETCH. Let me try to figure out if I can somehow keep the transaction active. Otherwise, I'd be back to square one :(
-
Guillaume F. over 8 yearsNot really to square one, you identified the problem why your query is slow. You just don't have the right solution yet.
-
Sayak Banerjee over 8 yearsI have eliminated the join altogether. Thanks a lot for your help on this issue! Have a great new year ahead :)