Spring Data JPA batch insert is very slow
I would point out one more thing. The problem may lie not only with Hibernate but also with the database.
When you insert 700k objects in one transaction, the database may have to keep all of them in its rollback segment until the transaction commits.
If possible, split the logic so that commits happen along the way.
Create sublists of 1k entries from the main list, save each sublist, and commit after each one.
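The splitting step can be sketched roughly as follows. This is a minimal illustration, not code from the thread: `ChunkedSaver`, `partition`, `saveInChunks`, and the `saveAndCommit` callback are hypothetical names. The callback stands in for whatever saves one sublist in its own transaction, for example a call to the DAO's `storeApps` made from a caller that is not itself transactional, so that each call commits separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedSaver {

    /** Splits items into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(items.subList(from, to)); // a view on the original list, no copying
        }
        return chunks;
    }

    /**
     * Hands each chunk to saveAndCommit (hypothetical: assumed to save the
     * chunk and commit its own transaction) and returns the number of chunks.
     */
    static <T> int saveInChunks(List<T> items, int chunkSize, Consumer<List<T>> saveAndCommit) {
        int processed = 0;
        for (List<T> chunk : partition(items, chunkSize)) {
            saveAndCommit.accept(chunk);
            processed++;
        }
        return processed;
    }
}
```

With the names from the question, a call might look like `ChunkedSaver.saveInChunks(whtAppsList, 1000, dao::storeApps)`, assuming `dao` is the injected `JapanWHTDao` and the calling method is not wrapped in a transaction of its own.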
Comments
-
Akshay Lokur almost 2 years
I am trying to read an Excel file with 700K+ records and batch insert them into a MySQL database table.
Please note, Excel parsing is fast and I can get my entity objects in an ArrayList within 50 seconds or so. I am using Spring Boot and Spring Data JPA.
Below is my partial application.properties file:

    hibernate.jdbc.batch_size=1000
    spring.jpa.hibernate.use-new-id-generator-mappings=true

and my partial Entity class:

    @Entity
    @Table(name = "WHT_APPS", schema = "TEST")
    public class WHTApps {

        @Id
        @TableGenerator(name = "whtAppsGen", table = "ID_GEN", pkColumnName = "GEN_KEY", valueColumnName = "GEN_VAL")
        @GeneratedValue(strategy = GenerationType.TABLE, generator = "whtAppsGen")
        private Long id;

        @Column(name = "VENDOR_CODE")
        private int vendorCode;
        . . . .

Below is my DAO:

    @Repository
    @Transactional
    public class JapanWHTDaoImpl implements JapanWHTDao {

        @Autowired
        JapanWHTAppsRepository appsRepo;

        @Override
        public void storeApps(List<WHTApps> whtAppsList) {
            appsRepo.save(whtAppsList);
        }

and below is Repository class:

    @Transactional
    public interface JapanWHTAppsRepository extends JpaRepository<WHTApps, Long> {
    }

Can someone please enlighten me as to what I am doing incorrectly here?
EDIT:
The process does not finish and eventually throws an error:

    2017-08-15 15:15:24.516 WARN 14710 --- [tp1413491716-17] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: 08S01
    2017-08-15 15:15:24.516 ERROR 14710 --- [tp1413491716-17] o.h.engine.jdbc.spi.SqlExceptionHelper : Communications link failure The last packet successfully received from the server was 107,472 milliseconds ago. The last packet sent successfully to the server was 107,472 milliseconds ago.
    2017-08-15 15:15:24.518 INFO 14710 --- [tp1413491716-17] o.h.e.j.b.internal.AbstractBatchImpl : HHH000010: On release of batch it still contained JDBC statements
    2017-08-15 15:15:24.525 WARN 14710 --- [tp1413491716-17] c.m.v.c3p0.impl.DefaultConnectionTester : SQL State '08007' of Exception tested by statusOnException() implies that the database is invalid, and the pool should refill itself with fresh Connections.
    com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_131]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_131]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_131]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_131]
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) ~[mysql-connector-java-5.1.43.jar:5.1.43]
        . . . .
    2017-08-15 15:15:24.526 WARN 14710 --- [tp1413491716-17] c.m.v2.c3p0.impl.NewPooledConnection : [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
    2017-08-15 15:15:24.527 WARN 14710 --- [tp1413491716-17] c.m.v2.c3p0.impl.NewPooledConnection : [c3p0] Another error has occurred [ com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown. ] which will not be reported to listeners!
    com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_131]

Thanks
-
JB Nizet over 6 years
It's right there, in the TOC, under "batching" or "batch inserts". Ctrl-F is your friend. docs.jboss.org/hibernate/orm/5.2/userguide/html_single/…
-
M. Deinum over 6 years
You are saving 700K records in a single transaction. What happens is that 1 record is added to the first-level cache. Then another one is added, which first checks whether there is a dirty record in the cache. With 1 or 10 records that is fast; now imagine it with 10000 records, or 100000, or 700000. You shouldn't store 700000 records at once with Hibernate. You should create chunks of data to persist, ideally the same size as your batch size. See vladmihalcea.com/2017/04/25/…
-
Akshay Lokur over 6 years
I can see that the process does not finish and eventually throws an error; please see my updated question. Maybe I will try out what you suggested...
-
StanislavL over 6 years
It could be a connection timeout as well. Try first with a small list, e.g. 1k, to see whether it works.
-
Akshay Lokur over 6 years
Storing around 1200 records in the database takes ~7 seconds.
-
StanislavL over 6 years
7*700 seconds. What is your default DB connection timeout? Anyway, it's better to split the process into chunked inserts/commits.
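The rough estimate in this comment can be reproduced from the figures given in the thread (about 1200 records per ~7 seconds, 700K records in total). The sketch below is only an illustration: `InsertTimeEstimate` and `estimateSeconds` are hypothetical names, and the calculation assumes the per-chunk time stays constant as the table grows.

```java
public class InsertTimeEstimate {

    /** Total time if every chunk of recordsPerChunk rows takes secondsPerChunk seconds. */
    static double estimateSeconds(long totalRecords, long recordsPerChunk, double secondsPerChunk) {
        // Round the chunk count up: a final partial chunk still costs one round trip.
        long chunks = (totalRecords + recordsPerChunk - 1) / recordsPerChunk;
        return chunks * secondsPerChunk;
    }

    public static void main(String[] args) {
        // Figures from the thread: 700K records, ~1200 records in ~7 seconds.
        double seconds = estimateSeconds(700_000, 1_200, 7.0);
        System.out.printf("%.0f s (~%.0f min)%n", seconds, seconds / 60); // 4088 s (~68 min)
    }
}
```

That works out to 584 chunks and a bit over an hour, which is in the same range as the 50 to 60 minutes reported further down the thread.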
-
Akshay Lokur over 6 years
It is really slow... I am using @GeneratedValue(strategy = GenerationType.TABLE); could that be the cause of the slowness?
-
Akshay Lokur over 6 years
After splitting the records into chunks of 1500 and setting batch_size to 1500, records are inserted into the DB in chunks, but it is still very slow. The whole 700K records might take ~50 to 60 mins to complete. Is that because I am using @GeneratedValue(strategy = GenerationType.TABLE)?
-
StanislavL over 6 years
I can't say whether it depends on the strategy. Try getting rid of @GeneratedValue and see what the result is. Do you have any triggers on the DB?
-
Akshay Lokur over 6 years
Yeah, I tried without @GeneratedValue; performance remains more or less the same. I don't have triggers in the DB. Now checking with 2 threads (not sure it will help).
-
Akshay Lokur over 6 years
Tried with two threads: a little improvement in performance [1 thread = ~62 mins, 2 threads = ~43 mins for 700K+ records]. However, more than 2 threads do not work and result in a weird "Communications link failure" error.
-
StanislavL over 6 years
2 more ideas. Could you try with pure SQL? Measure importing 1k records to see whether it's a Hibernate problem or a DB problem. Split into smaller chunks.
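
One way to run the "pure SQL" measurement suggested here is a plain JDBC batch that bypasses Hibernate entirely. The following is a minimal sketch under stated assumptions, not something from the thread: `JdbcBatchTimer` and `timeBatchInsert` are hypothetical names, the caller supplies an open `java.sql.Connection` and the INSERT statement, and each row is passed as an `Object[]` of parameter values.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class JdbcBatchTimer {

    /**
     * Inserts all rows as one PreparedStatement batch followed by one commit,
     * and returns the elapsed time in milliseconds.
     */
    static long timeBatchInsert(Connection conn, String sql, List<Object[]> rows) throws SQLException {
        long start = System.nanoTime();
        conn.setAutoCommit(false); // commit once at the end, not per row
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Comparing the time for 1k rows here against the ~7 seconds per ~1200 records seen through Spring Data JPA would indicate which side the time is spent on. If MySQL Connector/J is the driver, its `rewriteBatchedStatements=true` connection property is also worth checking, since it affects how such batches are sent to the server; the thread itself did not test that setting.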