Spring Data JPA batch insert is very slow

I would point out one more thing. The problem could be not only Hibernate but the DB itself.

When you insert 700k objects in one transaction, they may all be kept in the DB's rollback segment until the transaction commits.

If possible, split the logic so that commits happen along the way.

Create 1k-sized sublists from the main list, save them one by one, and commit after each sublist is saved.
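
A minimal sketch of that chunking, assuming the appsRepo repository from the question below (the method name and the chunk size are illustrative, not part of the original code):

    // Sketch only: save the list in 1k-sized sublists. Each repository call
    // runs (and commits) in its own transaction, provided the calling method
    // is NOT itself annotated with @Transactional.
    public void storeAppsInChunks(List<WHTApps> whtAppsList) {
        final int chunkSize = 1000;                        // roughly match the JDBC batch size
        for (int from = 0; from < whtAppsList.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, whtAppsList.size());
            appsRepo.save(whtAppsList.subList(from, to));  // saveAll(...) in Spring Data 2.x
        }
    }

Keeping each chunk in its own transaction also keeps the rollback segment small, since at most one sublist is uncommitted at any time.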

Author: Akshay Lokur

Updated on June 04, 2022

Comments

  • Akshay Lokur almost 2 years

    I am trying to read Excel file with 700K+ records and batch insert those in MySQL database table.

    Please note, Excel parsing is fast and I can get my entity objects in an ArrayList within 50 seconds or so.

    I am using Spring Boot and Spring Data JPA.

    Below is my partial application.properties file:

    hibernate.jdbc.batch_size=1000
    spring.jpa.hibernate.use-new-id-generator-mappings=true
    

    and my partial Entity class:

    @Entity
    @Table(name = "WHT_APPS", schema = "TEST")
    public class WHTApps {
    
        @Id
        @TableGenerator(name = "whtAppsGen", table = "ID_GEN", pkColumnName = "GEN_KEY", valueColumnName = "GEN_VAL")
        @GeneratedValue(strategy = GenerationType.TABLE, generator = "whtAppsGen")
        private Long id;
    
        @Column(name = "VENDOR_CODE")
        private int vendorCode;
        .
        .
        .
        .
    

    Below is my DAO:

    @Repository
    @Transactional
    public class JapanWHTDaoImpl implements JapanWHTDao {
    
        @Autowired
        JapanWHTAppsRepository appsRepo;
    
        @Override
        public void storeApps(List<WHTApps> whtAppsList) {
            appsRepo.save(whtAppsList);
    
        }
    }

    and below is Repository class:

    @Transactional
    public interface JapanWHTAppsRepository extends JpaRepository<WHTApps, Long> {
    
    }
    

    Can someone please enlighten me as to what I am doing incorrect here?

    EDIT:

    The process does not finish and eventually throws an error:

    2017-08-15 15:15:24.516  WARN 14710 --- [tp1413491716-17] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: 08S01
    2017-08-15 15:15:24.516 ERROR 14710 --- [tp1413491716-17] o.h.engine.jdbc.spi.SqlExceptionHelper   : Communications link failure
    
    The last packet successfully received from the server was 107,472 milliseconds ago.  The last packet sent successfully to the server was 107,472 milliseconds ago.
    2017-08-15 15:15:24.518  INFO 14710 --- [tp1413491716-17] o.h.e.j.b.internal.AbstractBatchImpl     : HHH000010: On release of batch it still contained JDBC statements
    2017-08-15 15:15:24.525  WARN 14710 --- [tp1413491716-17] c.m.v.c3p0.impl.DefaultConnectionTester  : SQL State '08007' of Exception tested by statusOnException() implies that the database is invalid, and the pool should refill itself with fresh Connections.
    
    com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_131]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_131]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_131]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_131]
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) ~[mysql-connector-java-5.1.43.jar:5.1.43]
        .
        .
        .
        .
    2017-08-15 15:15:24.526  WARN 14710 --- [tp1413491716-17] c.m.v2.c3p0.impl.NewPooledConnection     : [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
    2017-08-15 15:15:24.527  WARN 14710 --- [tp1413491716-17] c.m.v2.c3p0.impl.NewPooledConnection     : [c3p0] Another error has occurred [ com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown. ] which will not be reported to listeners!
    
    com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_131]
    

    Thanks

    • JB Nizet over 6 years
      It's right there, in the TOC, under "batching" or "batch inserts". Ctrl-F is your friend. docs.jboss.org/hibernate/orm/5.2/userguide/html_single/…
    • M. Deinum over 6 years
      You are saving 700K records in a single transaction. What happens is 1 record is added to the first-level cache. Then another one, which first checks if there is a dirty record in the cache. Now with 1 or 10 records that is fast; now imagine it with 10000 records, or 100000, or 700000. You shouldn't store 700000 records at once with Hibernate. You should create chunks of data to persist, ideally the same size as your batch size. See vladmihalcea.com/2017/04/25/…
  • Akshay Lokur over 6 years
    I can see that the process does not finish and it throws an error eventually, please see my updated question. Maybe I will try out what you suggested...
  • StanislavL over 6 years
    Could be a connection timeout as well. Try first with a small list, e.g. 1k, to see whether it works.
  • Akshay Lokur over 6 years
    Storing around 1200 records in the database takes ~7 seconds.
  • StanislavL over 6 years
    That's 7*700 seconds. What is your default DB connection timeout? Anyway, it's better to split the process into chunked inserts/commits.
  • Akshay Lokur over 6 years
    It is really slow... I am using @GeneratedValue(strategy = GenerationType.TABLE), could that be the cause of the slowness?
  • Akshay Lokur over 6 years
    After splitting the records into chunks of 1500 records and having a batch_size of 1500, records are getting inserted into the DB in chunks, but it is still very slow. The whole 700K records might take ~50 to 60 mins to complete. Is that because I am using @GeneratedValue(strategy = GenerationType.TABLE)?
  • StanislavL over 6 years
    I can't say whether it depends on the strategy. Try to get rid of the @GeneratedValue and see what the result is. Do you have any triggers on the DB?
  • Akshay Lokur over 6 years
    Yeah, tried without @GeneratedValue, performance remains more or less the same. I don't have triggers in the DB. Now checking with 2 threads (not sure it will help).
  • Akshay Lokur over 6 years
    Tried with two threads, a little improvement in performance [1 thread = ~62 mins AND 2 threads = ~43 mins for 700K+ records]. However, more than 2 threads do not work and result in a weird "Communications link failure" error.
  • StanislavL over 6 years
    Two more ideas. Could you try with pure SQL? Measure importing 1k records to see whether it's a Hibernate problem or a DB problem (see the sketch below). And split into smaller chunks.