Spring Data JPA Hibernate batch insert is slower
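That's because the EntityManager doesn't write data to the database immediately when you call persist(). The pending inserts are sent when flush() runs.
When you comment out those lines, the EntityManager decides when to flush based on its flush-mode setting (with the default mode, at commit or before a query that needs the pending changes). By calling flush() directly, you tell the EntityManager to execute the queries against the database right then.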
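The deferred-write behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `WriteBehindSketch` and `ToyPersistenceContext` are hypothetical names invented for illustration, the "database" is just an in-memory list, and none of this is Hibernate's actual implementation. It only shows that persist() queues work while flush() is what pushes it out.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBehindSketch {

    /** Toy stand-in for a persistence context (hypothetical, NOT Hibernate's implementation). */
    static class ToyPersistenceContext {
        private final List<String> pending = new ArrayList<>();  // queued, not yet written
        private final List<String> database = new ArrayList<>(); // stands in for the real table
        private int flushCount = 0;

        /** Queues the row; nothing reaches the "database" yet. */
        void persist(String row) {
            pending.add(row);
        }

        /** Writes every queued row to the "database" and empties the queue. */
        void flush() {
            database.addAll(pending);
            pending.clear();
            flushCount++;
        }

        int pendingCount()  { return pending.size(); }
        int databaseCount() { return database.size(); }
        int flushCount()    { return flushCount; }
    }

    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        int batchSize = 100; // same value as in the question's application.properties

        for (int i = 1; i <= 250; i++) {
            ctx.persist("SMPN-" + i);
            if (i % batchSize == 0) {
                ctx.flush(); // explicit flush, as in the question's loop
            }
        }
        System.out.println("written=" + ctx.databaseCount()
                + " pending=" + ctx.pendingCount()
                + " flushes=" + ctx.flushCount());

        ctx.flush(); // stands in for the flush that happens at commit
        System.out.println("written=" + ctx.databaseCount()
                + " pending=" + ctx.pendingCount()
                + " flushes=" + ctx.flushCount());
    }
}
```

In the real EntityManager the final flush is triggered by the transaction commit rather than by an explicit call, so removing the in-loop flush() does not skip the writes, it only moves them to a later point.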
Chrisma Daniel
Updated on June 04, 2022
Chrisma Daniel almost 2 years
I use Spring Data, Spring Boot, and Hibernate as the JPA provider, and I want to improve bulk-insert performance.
I referred to this link to set up batch processing:
http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch15.html
This is my code and my application.properties for the insert-batching experiment.
My service:
    @Value("${spring.jpa.properties.hibernate.jdbc.batch_size}")
    private int batchSize;

    @PersistenceContext
    private EntityManager em;

    @Override
    @Transactional(propagation = Propagation.REQUIRED)
    public SampleInfoJson getSampleInfoByCode(String code) {
        // SampleInfo newSampleInfo = new SampleInfo();
        // newSampleInfo.setId(5L);
        // newSampleInfo.setCode("SMP-5");
        // newSampleInfo.setSerialNumber(10L);
        // sampleInfoDao.save(newSampleInfo);

        log.info("starting... inserting...");

        for (int i = 1; i <= 5000; i++) {
            SampleInfo newSampleInfo = new SampleInfo();
            // Long id = (long)i + 4;
            // newSampleInfo.setId(id);
            newSampleInfo.setCode("SMPN-" + i);
            newSampleInfo.setSerialNumber(10L + i);
            // sampleInfoDao.save(newSampleInfo);
            em.persist(newSampleInfo);

            if (i % batchSize == 0) {
                log.info("flushing...");
                em.flush();
                em.clear();
            }
        }
The part of application.properties related to batching:
    spring.jpa.properties.hibernate.jdbc.batch_size=100
    spring.jpa.properties.hibernate.cache.use_second_level_cache=false
    spring.jpa.properties.hibernate.order_inserts=true
    spring.jpa.properties.hibernate.order_updates=true
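For context, hibernate.jdbc.batch_size sets how many statements Hibernate groups into one JDBC batch. At the plain JDBC level, batching relies on PreparedStatement.addBatch() and executeBatch(), roughly as in the sketch below. The class name `JdbcBatchSketch` and the method `insertInBatches` are hypothetical, the column names are taken from the entity mapping in this post, and the id is assumed to come straight from the loop counter instead of the sequence. It is an illustration of the mechanism, not what Hibernate literally executes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcBatchSketch {

    /**
     * Inserts `rows` rows, sending them to the database in groups of `batchSize`.
     * Returns how many times executeBatch() was called.
     * Assumption: ids are supplied by the caller (here the loop counter).
     */
    static int insertInBatches(Connection conn, int rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String sql = "INSERT INTO sample_info (id, code, serial_number) VALUES (?, ?, ?)";
        int executeCalls = 0;

        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 1; i <= rows; i++) {
                ps.setLong(1, i);
                ps.setString(2, "SMPN-" + i);
                ps.setLong(3, 10L + i);
                ps.addBatch();              // queue the row on the statement

                if (i % batchSize == 0) {
                    ps.executeBatch();      // send the queued rows together
                    executeCalls++;
                }
            }
            if (rows % batchSize != 0) {
                ps.executeBatch();          // send the final, partially filled batch
                executeCalls++;
            }
        }
        return executeCalls;
    }
}
```

With 5000 rows and a batch size of 100, this sketch calls executeBatch() 50 times instead of executing 5000 separate statements.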
Entity class:
    @Entity
    @Table(name = "sample_info")
    public class SampleInfo implements Serializable {

        private Long id;
        private String code;
        private Long serialNumber;

        @Id
        @GeneratedValue(
            strategy = GenerationType.SEQUENCE,
            generator = "sample_info_seq_gen"
        )
        @SequenceGenerator(
            name = "sample_info_seq_gen",
            sequenceName = "sample_info_seq",
            allocationSize = 1
        )
        @Column(name = "id")
        public Long getId() {
            return id;
        }

        public void setId(Long id) {
            this.id = id;
        }

        @Column(name = "code", nullable = false)
        public String getCode() {
            return code;
        }

        public void setCode(String code) {
            this.code = code;
        }

        @Column(name = "serial_number")
        public Long getSerialNumber() {
            return serialNumber;
        }

        public void setSerialNumber(Long serialNumber) {
            this.serialNumber = serialNumber;
        }
    }
Running the service above, batch-inserting 5000 rows took 30 to 35 seconds to complete, but if I comment out these lines:

    if (i % batchSize == 0) {
        log.info("flushing...");
        em.flush();
        em.clear();
    }

inserting 5000 rows takes only 5 to 7 seconds, which is faster than batch mode.
Why is it slower when using batch mode?