Spring Data JPA: Batch insert for nested entities
Make sure to configure Hibernate batch-related properties properly:
<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
The point is that successive statements can be batched only if they manipulate the same table. When a statement that inserts into another table comes along, the batch under construction must be closed and executed before that statement. With the hibernate.order_inserts property you give Hibernate permission to reorder inserts before constructing batch statements (hibernate.order_updates has the same effect for update statements).
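To make the effect of reordering concrete, here is a minimal sketch of a simplified model (not Hibernate's actual algorithm): a batch is cut whenever the target table changes or the batch is full. The class and method names are hypothetical, and the grouping step ignores the parent-before-child ordering that real insert ordering has to respect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a simplified model of statement batching, not Hibernate code.
final class BatchCountSketch {

    // Counts batches: a new batch starts when the target table changes
    // or when the current batch already holds batchSize statements.
    static int countBatches(List<String> tables, int batchSize) {
        int batches = 0;
        int inCurrent = 0;
        String current = null;
        for (String table : tables) {
            if (!table.equals(current) || inCurrent == batchSize) {
                batches++;
                inCurrent = 0;
                current = table;
            }
            inCurrent++;
        }
        return batches;
    }

    // Groups statements by table, keeping tables in first-seen order.
    // Assumption: no foreign-key dependencies need to be considered.
    static List<String> groupByTable(List<String> tables) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tables) {
            counts.merge(t, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        counts.forEach((t, n) -> {
            for (int i = 0; i < n; i++) {
                out.add(t);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // Three parents, each followed by four children, as cascading would emit them.
        List<String> interleaved = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            interleaved.add("job_detail");
            for (int j = 0; j < 4; j++) {
                interleaved.add("job_envelope");
            }
        }
        System.out.println("interleaved: " + countBatches(interleaved, 100));
        System.out.println("grouped:     " + countBatches(groupByTable(interleaved), 100));
    }
}
```

In this model the interleaved sequence needs one batch per table switch, while the grouped sequence needs one batch per table (plus extra batches only when a table has more rows than the batch size).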
hibernate.jdbc.batch_size is the maximum batch size that Hibernate will use. Try different values and pick the one that performs best in your use cases.
Note that batching of insert statements is disabled if the IDENTITY id generator is used, because Hibernate then has to execute each insert immediately to obtain the generated id.
Specific to MySQL, you have to specify rewriteBatchedStatements=true as part of the connection URL. To make sure that batching is working as expected, add profileSQL=true to inspect the SQL the driver sends to the database. More details here.
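As a small illustration, the two driver properties are appended to the JDBC URL as query parameters. The helper name, host, port and database below are hypothetical placeholders; only the two property names come from the text above.

```java
// Hypothetical helper that builds a MySQL Connector/J URL with the two
// properties discussed above. Host, port and database are placeholders.
final class MySqlUrlSketch {

    static String batchFriendlyUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?rewriteBatchedStatements=true"
                + "&profileSQL=true"; // profiling is meant for verification, not production
    }

    public static void main(String[] args) {
        System.out.println(batchFriendlyUrl("localhost", 3306, "jobs"));
    }
}
```

Once batching has been confirmed in the driver log, profileSQL can be dropped again, since it makes the driver log every statement it sends.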
If your entities are versioned (for optimistic locking purposes), then to get batched updates (inserts are not affected) you also have to turn on:
<property name="hibernate.jdbc.batch_versioned_data">true</property>
With this property you tell Hibernate that the JDBC driver is capable of returning the correct count of affected rows when executing a batch update (needed to perform the version check). You have to check whether this works properly for your database and JDBC driver. For example, it does not work in Oracle 11 and older Oracle versions.
You may also want to flush and clear the persistence context after each batch to release memory; otherwise, all of the managed objects remain in the persistence context until it is closed.
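A minimal sketch of that flush-and-clear loop follows. To keep it self-contained, PersistenceOps is a hypothetical stand-in for the persist, flush and clear methods of a JPA EntityManager; in a real application you would call those methods on the injected EntityManager inside a transaction instead.

```java
import java.util.List;

// Sketch of the flush/clear pattern; not tied to a concrete JPA provider.
final class FlushClearSketch {

    // Hypothetical stand-in for the three EntityManager methods used here.
    interface PersistenceOps {
        void persist(Object entity);
        void flush();
        void clear();
    }

    // Persists all entities, flushing and clearing after every batchSize
    // persist calls. Returns the number of flushes performed.
    static int saveInBatches(PersistenceOps em, List<?> entities, int batchSize) {
        int flushes = 0;
        int pending = 0;
        for (Object entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == batchSize) {
                em.flush(); // send the queued inserts to the database
                em.clear(); // detach managed objects so they can be collected
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the final, partially filled batch
            em.flush();
            em.clear();
            flushes++;
        }
        return flushes;
    }
}
```

The counter tracks top-level persist calls only. With cascaded children, one persist call can schedule many inserts, so the flush interval may need to be much smaller than the JDBC batch size.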
Also, you may find this blog useful, as it nicely explains the details of the Hibernate batching mechanism.
Ahatius
Updated on October 22, 2022
Comments
-
Ahatius over 1 year
I have a test case where I need to persist 100'000 entity instances into the database. The code I'm currently using does this, but it takes up to 40 seconds until all the data is persisted in the database. The data is read from a JSON file which is about 15 MB in size.
Now I had already implemented a batch insert method in a custom repository before for another project. However, in that case I had a lot of top level entities to persist, with only a few nested entities.
In my current case I have 5 Job entities, each containing a List of about 30 JobDetail entities. One JobDetail contains between 850 and 1100 JobEnvelope entities.
When writing to the database I commit the List of Job entities with the default save(Iterable<Job> jobs) interface method. All nested entities have the CascadeType PERSIST. Each entity has its own table.
The usual way to enable batch inserts would be to implement a custom method like saveBatch that flushes every once in a while. But my problem in this case is the JobEnvelope entities. I don't persist them with a JobEnvelope repository; instead I let the repository of the Job entity handle it. I'm using MariaDB as database server.
So my question boils down to the following: How can I make the JobRepository insert its nested entities in batches?
These are my 3 entities in question:
Job
@Entity
public class Job {

    @Id
    @GeneratedValue
    private int jobId;

    @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "job")
    @JsonManagedReference
    private Collection<JobDetail> jobDetails;
}
JobDetail
@Entity
public class JobDetail {

    @Id
    @GeneratedValue
    private int jobDetailId;

    @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
    @JoinColumn(name = "jobId")
    @JsonBackReference
    private Job job;

    @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "jobDetail")
    @JsonManagedReference
    private List<JobEnvelope> jobEnvelopes;
}
JobEnvelope
@Entity
public class JobEnvelope {

    @Id
    @GeneratedValue
    private int jobEnvelopeId;

    @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
    @JoinColumn(name = "jobDetailId")
    private JobDetail jobDetail;

    private double weight;
}
-
Ahatius about 8 years
Thank you very much for your detailed response. So it's basically not possible to do batch inserts on entities that use the @GeneratedValue annotation?
-
Dragan Bozanovic about 8 years
It is possible; it is only not possible for the IDENTITY id generator. It works for any other id generator.
-
Ahatius about 8 years
Ah, I see. It was set to AUTO. SEQUENCE is not supported by MySQL, so I'm currently looking into TABLE generation. Guess the automatic mode selected the IDENTITY method since there was no table for sequences and the other one was not supported. Will report back.
-
Dragan Bozanovic about 8 years
Quite possible, since I think native is the default if you specify only @GeneratedValue, and it first checks whether IDENTITY is supported by the database.
-
Ahatius about 8 years
Holy moly, thanks a lot - did work wonders. It only takes 5 instead of 40 seconds to insert those 100'000 entries :)