Spring Data JPA Hibernate batch insert is slower


That is because the EntityManager does not write data to the database immediately; the pending inserts are executed only when the persistence context is flushed. When you comment out those lines, the EntityManager decides when to flush based on the flush-mode setting. By calling flush() directly, you tell the EntityManager to execute the queries against the database at that point.
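To make the difference concrete, here is a minimal plain-Java sketch (no JPA dependency) that replays the loop from the question and counts how often flush() would be called explicitly. The `FlushPlan` class and its `plan` method are hypothetical names used only for illustration, and the counters merely stand in for `em.persist` and `em.flush`; it does not measure any database time.

```java
// Hypothetical helper, not part of JPA or Hibernate: it only counts
// how many explicit flushes the loop from the question would trigger.
final class FlushPlan {
    /** Number of explicit flush() calls made inside the loop. */
    final int explicitFlushes;
    /** Rows still pending after the loop; the EntityManager flushes these later. */
    final int rowsLeftForCommit;

    private FlushPlan(int explicitFlushes, int rowsLeftForCommit) {
        this.explicitFlushes = explicitFlushes;
        this.rowsLeftForCommit = rowsLeftForCommit;
    }

    /** Replays the question's loop: flush whenever i % batchSize == 0. */
    static FlushPlan plan(int totalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (int i = 1; i <= totalRows; i++) {
            pending++;                    // stands in for em.persist(entity)
            if (i % batchSize == 0) {
                flushes++;                // stands in for em.flush(); em.clear();
                pending = 0;
            }
        }
        return new FlushPlan(flushes, pending);
    }

    public static void main(String[] args) {
        FlushPlan p = plan(5000, 100);
        System.out.println(p.explicitFlushes + " explicit flushes, "
                + p.rowsLeftForCommit + " rows left for the final flush");
    }
}
```

With the values from the question (5000 rows, batch size 100) the loop forces 50 explicit flushes and leaves nothing pending. Without the `if` block there are no explicit flushes at all, and all 5000 rows stay pending until the EntityManager flushes on its own according to the flush mode.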

Author: Chrisma Daniel

Updated on June 04, 2022

Comments

  • Chrisma Daniel
    Chrisma Daniel almost 2 years

    I use Spring Data, Spring Boot, and Hibernate as the JPA provider, and I want to improve bulk insert performance.

    I followed this guide to set up batch processing:

    http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch15.html

    This is my code and my application.properties for the insert batching experiment.

    My service:

        @Value("${spring.jpa.properties.hibernate.jdbc.batch_size}")
        private int batchSize;
    
        @PersistenceContext
        private EntityManager em;
    
        @Override
        @Transactional(propagation = Propagation.REQUIRED)
        public SampleInfoJson getSampleInfoByCode(String code) {
            // SampleInfo newSampleInfo = new SampleInfo();
            // newSampleInfo.setId(5L);
            // newSampleInfo.setCode("SMP-5");
            // newSampleInfo.setSerialNumber(10L);
            // sampleInfoDao.save(newSampleInfo);
            log.info("starting... inserting...");
            for (int i = 1; i <= 5000; i++) {
                SampleInfo newSampleInfo = new SampleInfo();
                // Long id = (long)i + 4;
                // newSampleInfo.setId(id);
                newSampleInfo.setCode("SMPN-" + i);
                newSampleInfo.setSerialNumber(10L + i);
                // sampleInfoDao.save(newSampleInfo);
                em.persist(newSampleInfo);
                if (i % batchSize == 0) {
                    log.info("flushing...");
                    em.flush();
                    em.clear();
                }
            }
    

    The part of application.properties related to batching:

    spring.jpa.properties.hibernate.jdbc.batch_size=100
    spring.jpa.properties.hibernate.cache.use_second_level_cache=false
    spring.jpa.properties.hibernate.order_inserts=true
    spring.jpa.properties.hibernate.order_updates=true
    

    Entity class:

    @Entity
    @Table(name = "sample_info")
    public class SampleInfo implements Serializable{
    
        private Long id;
        private String code;
        private Long serialNumber;
    
        @Id
        @GeneratedValue(
                strategy = GenerationType.SEQUENCE,
                generator = "sample_info_seq_gen"
        )
        @SequenceGenerator(
                name = "sample_info_seq_gen",
                sequenceName = "sample_info_seq",
                allocationSize = 1
        )
        @Column(name = "id")
        public Long getId() {
            return id;
        }
    
        public void setId(Long id) {
            this.id = id;
        }
    
        @Column(name = "code", nullable = false)
        public String getCode() {
            return code;
        }
    
        public void setCode(String code) {
            this.code = code;
        }
    
        @Column(name = "serial_number")
        public Long getSerialNumber() {
            return serialNumber;
        }
    
        public void setSerialNumber(Long serialNumber) {
            this.serialNumber = serialNumber;
        }
    }
    

    Running the service above to insert 5000 rows in batches took 30 to 35 seconds to complete, but if I comment out these lines:

    if(i%batchSize == 0){
        log.info("flushing...");
        em.flush();
        em.clear();
    }
    

    inserting 5000 rows took only 5 to 7 seconds, faster than batch mode.

    Why is it slower when using batch mode?