JPA with Hibernate insert very slow

Solution 1

To enable JDBC batching, set the property hibernate.jdbc.batch_size to an integer between 10 and 50:

hibernate.jdbc.batch_size=50

If it's still not as fast as expected, I'd review the batch processing guide (linked in Solution 4 below), paying attention to the NOTEs and section 4.1. Especially the NOTE that says, "Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator."
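In a JPA setup this goes into persistence.xml as a provider property. A minimal sketch; the ordering properties are optional companions that help Hibernate group statements into batches:

```xml
<property name="hibernate.jdbc.batch_size" value="50" />
<!-- optional: order statements by entity so batches are not broken up -->
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
```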

Solution 2

The problem

One of the major performance hits when you use Hibernate as your ORM is the way its "dirty check" is implemented: without byte code enhancement (which is standard in all JDO-based ORMs and some others), dirty checking will always be an inefficient hack.

When flushing, a dirty check needs to be carried out on every object in the session to see if it is "dirty" i.e. one of its attributes has changed since it was loaded from the database. For all "dirty" (changed) objects Hibernate has to generate SQL updates to update the records that represent the dirty objects.

The Hibernate dirty check is notoriously slow on anything but a small number of objects because it performs a "field by field" comparison between each object in memory and a snapshot taken when the object was first loaded from the database. The more objects, say, an HTTP request loads to display a page, the more dirty checks are required when commit is called.
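Conceptually, the snapshot comparison looks something like this (a simplified plain-Java sketch for illustration, not Hibernate's actual implementation):

```java
import java.util.Arrays;

// Simplified sketch of snapshot-based dirty checking (illustration only,
// not Hibernate's real code).
class Entity {
    int num1;
    float num2;
    private Object[] snapshot; // field values captured when "loaded"

    void takeSnapshot() {
        snapshot = new Object[] { num1, num2 };
    }

    boolean isDirty() {
        // field-by-field comparison against the snapshot: O(fields) per entity,
        // repeated for every entity in the session at flush time
        return !Arrays.equals(snapshot, new Object[] { num1, num2 });
    }
}
```

At flush time a check like isDirty() runs for every object in the session, which is why the cost grows with the number of loaded objects.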

Technical details of Hibernate's dirty checking mechanism

You can read more about Hibernate's dirty check mechanism implemented as a "field by field" comparison here:

How does Hibernate detect dirty state of an entity object?

How the problem is solved in other ORMs

A much more efficient mechanism, used by some other ORMs, is an automatically generated "dirty flag" attribute instead of the "field by field" comparison. This has traditionally only been available in ORMs (typically JDO-based ones) that use and promote byte code enhancement, or byte code 'weaving' as it is sometimes called, e.g. http://datanucleus.org and others.

During byte code enhancement, by DataNucleus or any of the other ORMs supporting this feature, each entity class is enhanced to:

  • add an implicit dirty flag attribute
  • add the code to each of the setter methods in the class to automatically set the dirty flag when called

Then during a flush, only the dirty flag needs to be checked instead of performing a field by field comparison - which, as you can imagine, is orders of magnitude faster.
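A hypothetical sketch of what an enhancer conceptually generates (class and method names invented for illustration):

```java
// Hypothetical illustration of what byte code enhancement adds: a dirty flag
// plus setter instrumentation, replacing the field-by-field comparison.
class EnhancedEntity {
    private int num1;
    private boolean dirty; // flag added by the enhancer

    public void setNum1(int num1) {
        this.num1 = num1;
        this.dirty = true; // injected into every setter
    }

    public int getNum1() {
        return num1;
    }

    public boolean isDirty() {
        return dirty; // O(1) check at flush time
    }

    public void clearDirty() {
        dirty = false; // reset after a flush writes the changes
    }
}
```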

Other negative consequences of "field by field" dirty checking

The other inefficiency of the Hibernate dirty checking is the need to keep a snapshot of every loaded object in memory, to avoid having to reload from and compare against the database during dirty checking.

Each object snapshot is a collection of all its fields.

In addition to the performance hit of the Hibernate dirty checking mechanism at flush time, this mechanism also burdens your app with the extra memory consumption and CPU usage associated with instantiating and initializing these snapshots of every single object that is loaded from the database - which can run into the thousands or millions depending on your application.

Hibernate has introduced byte code enhancement to address this, but I have worked on many ORM-persisted projects (both Hibernate and non-Hibernate) and I have yet to see a Hibernate-persisted project that uses that feature, possibly for a number of reasons:

  • Hibernate has traditionally promoted its "no requirement for byte code enhancement" as a feature when people evaluate ORM technologies
  • Historical reliability issues with Hibernate's byte code enhancement implementation, which is possibly not as mature as in ORMs that have used and promoted byte code enhancement from the start
  • Some people are still wary of byte code enhancement because of the anti 'byte code enhancement' stance certain groups promoted, and the fear they instilled, in the early days of ORMs

These days byte code enhancement is used for many different things - not just persistence. It has almost become mainstream.

Solution 3

Old topic, but I came across it today while looking for something else. I had to post on this common problem, which is unfortunately not well understood or documented. For too long, Hibernate's documentation had only the brief note quoted above. Starting with version 5, there is a better, but still thin, explanation: https://docs.jboss.org/hibernate/orm/5.3/userguide/html_single/Hibernate_User_Guide.html#identifiers-generators-identity

The problem of slow inserts of a very large collection is simply a poor choice of Id generation strategy:

@Id
@GeneratedValue(strategy=GenerationType.IDENTITY) 

When using the Identity strategy, what needs to be understood is that the database server creates the identity of the row on the physical insert. Hibernate needs to know the assigned Id to have the object in persisted state in the session, and the database-generated Id is only known from the insert's response. Hibernate therefore has NO choice but to perform 20000 individual inserts to be able to retrieve the generated Ids. As far as I know it doesn't work with batching, not with Sybase, not with MSSQL. That is why, regardless of how hard you try, and with all the batching properties properly configured, Hibernate will do individual inserts.

The only solution that I know of, and have applied many times, is to choose a client-side Id generation strategy instead of the popular database-side Identity strategy. I often used:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "my_seq")
@GenericGenerator(name = "my_seq", strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator")

There's a bit more configuration to get it to work, but that's the essence of it. When using client-side Id generation, Hibernate will set the Ids of all 20000 objects before hitting the database. And with the proper batching properties, as seen in the previous answers, Hibernate will do the inserts in batches, as expected.
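A fuller sketch of such a mapping; the generator name "my_seq" and sequence name "my_row_seq" are made up here, so adjust them to your schema. Note that SequenceStyleGenerator falls back to a table-backed structure on databases without native sequences, such as SQL Server before 2012:

```java
// Sketch only: "my_seq" and "my_row_seq" are hypothetical names.
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "my_seq")
@GenericGenerator(
    name = "my_seq",
    strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
    parameters = {
        @Parameter(name = "sequence_name", value = "my_row_seq"),
        // hand out ids in blocks so 20000 inserts need far fewer sequence calls
        @Parameter(name = "increment_size", value = "50")
    })
private Long id;
```

Here @Parameter is org.hibernate.annotations.Parameter.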

It is unfortunate that the Identity generator is so convenient and popular that it appears everywhere in examples without a clear explanation of the consequences of using this strategy. I have read many so-called "advanced" Hibernate books and have not yet seen one explain the consequence of Identity on insert performance for large data sets.

Solution 4

Hibernate "default mode" IS slow.

Its advantages are Object Relational Mapping and some caching (though caching is obviously not very useful for bulk insertion).

Use batch processing instead http://docs.jboss.org/hibernate/core/4.0/devguide/en-US/html/ch04.html

Author: Yellow Duck

Updated on July 18, 2022

Comments

  • Yellow Duck
    Yellow Duck almost 2 years

I am trying to insert some data into SQL Server 2008 R2 using JPA and Hibernate. Everything "works", except that it's very slow. Inserting 20000 rows takes about 45 seconds, while a C# script takes less than 1 second.

Can any veteran in this domain offer some help? I would appreciate it a lot.

    Update: got some great advices from the answers below, but it still doesn't work as expected. Speed is the same.

    Here is the updated persistence.xml:

    <persistence version="2.0"
    xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
    <persistence-unit name="ClusterPersist"
        transaction-type="RESOURCE_LOCAL">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <class>cluster.data.persist.sqlserver.EventResult</class>
        <exclude-unlisted-classes>true</exclude-unlisted-classes>
        <properties>
            <property name="javax.persistence.jdbc.url"
                value="jdbc:sqlserver://MYSERVER:1433;databaseName=MYTABLE" />
            <property name="javax.persistence.jdbc.user" value="USER" />
            <property name="javax.persistence.jdbc.password" value="PASSWORD" />
            <property name="javax.persistence.jdbc.driver"
                value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
            <property name="hibernate.show_sql" value="false" />
            <property name="hibernate.hbm2ddl.auto" value="update" />
    
            <property name="hibernate.connection.provider_class"
                value="org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider" />
    
            <property name="hibernate.c3p0.max_size" value="100" />
            <property name="hibernate.c3p0.min_size" value="0" />
            <property name="hibernate.c3p0.acquire_increment" value="1" />
            <property name="hibernate.c3p0.idle_test_period" value="300" />
            <property name="hibernate.c3p0.max_statements" value="0" />
            <property name="hibernate.c3p0.timeout" value="100" />
            <property name="hibernate.jdbc.batch_size" value="50" />
            <property name="hibernate.cache.use_second_level_cache" value="false" />
        </properties>
    </persistence-unit>
    

    And here is the updated code part:

    public static void writeToDB(String filePath) throws IOException {
    
        EntityManager entityManager = entityManagerFactory.createEntityManager();
        Session session = (Session) entityManager.getDelegate();
        Transaction tx = session.beginTransaction();
        int i = 0;
    
        URL filePathUrl = null;
        try {
            filePathUrl = new URL(filePath);
        } catch (MalformedURLException e) {
            filePathUrl = (new File(filePath)).toURI().toURL();
        }
    
        String line = null;
        BufferedReader stream = null;
    
        try {
            InputStream in = filePathUrl.openStream();
            stream = new BufferedReader(new InputStreamReader(in));
    
    
            // Read each line in the file
            while ((line = stream.readLine()) != null) {
                MyRow myRow = new MyRow(); // create a fresh entity for each row
                String[] splitted = line.split(",");
                int num1 = Integer.valueOf(splitted[1]);
                float num2 = Float.valueOf(splitted[6]).intValue();
    
                myRow.setNum1(num1);
                myRow.setNum2(num2);
    
                session.save(myRow);
    
                if (i % 50 == 0) { 
                    session.flush();
                    session.clear();
                }
    
                i++;
    
            }
            tx.commit();
    
        } finally {
            if (stream != null)
                stream.close();
        }
        session.close();
    
    }
    

    Updated, here is the source for MyRow:

    @Entity
    @Table(name="MYTABLE")
    public class MyRow {    
    
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY) 
    private Long id;
    
    @Basic
    @Column(name = "Num1")
    private int Num1;
    
    @Basic
    @Column(name = "Num2")
    private float Num2;
    
    public Long getId() {
        return id;
    }
    
    public void setId(Long id) {
        this.id = id;
    }
    
    public int getNum1() {
        return Num1;
    }
    
    public void setNum1(int num1) {
        Num1 = num1;
    }
    
    public float getNum2() {
        return Num2;
    }
    
    public void setNum2(float num2) {
        Num2 = num2;
    }
    }
    
  • Yellow Duck
    Yellow Duck over 10 years
    Many thanks. Can you give a brief example on the persistence.xml settings for how to set it? One more extra layer of complexity is that I used JPA, so some settings may not be easily translated to the persistence.xml.
  • SJuan76
    SJuan76 over 10 years
    The link I provided states Before batch processing, enable JDBC batching. To enable JDBC batching, set the property hibernate.jdbc.batch_size to an integer between 10 and 50.
  • Yellow Duck
    Yellow Duck over 10 years
    Just tried it, and it is still the same speed. This is what I added in the persistence.xml "<property name="hibernate.jdbc.batch_size" value="50" />". And I modified the code accordingly as the example from the link. The code looks like: 'Session session = (Session) entityManager.getDelegate(); Transaction tx = session.beginTransaction();'
  • Yellow Duck
    Yellow Duck over 10 years
    Please see my comments to @SJuan76. There must be something missing here.
  • Elliott Frisch
    Elliott Frisch over 10 years
    Are you trying to use anonymous transaction handles? Store them in variables. Can you edit your question to indicate latest code?
  • Elliott Frisch
    Elliott Frisch over 10 years
Aren't you timing the file read as well as the insert(s)? Try to isolate, because it appears you're reading one row at a time. Also, please add your MyRow source.
  • Yellow Duck
    Yellow Duck over 10 years
    I did see the file reading part. It's local, so the speed from reading the file is very fast. And I have included the MyRow source. Many thanks.
  • Elliott Frisch
    Elliott Frisch over 10 years
You're using @Id @GeneratedValue in MyRow. You missed this NOTE in my link: Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
  • Yellow Duck
    Yellow Duck over 10 years
    That's right, and I followed its instruction to add '<property name="hibernate.cache.use_second_level_cache" value="false" />', but it didn't seem to make any difference. Or, I misunderstood it?
  • Elliott Frisch
    Elliott Frisch over 10 years
    I believe you misunderstood. You'll have to use another identity manager to get batching.
  • Yellow Duck
    Yellow Duck over 10 years
    I changed the identity manager from identifier generator to manual. And it works much faster. By the way, any suggestion for a good identifier generator that works with batch insert? Please also update your answer so people don't have to read through the comments to understand what's going on.
  • mjs
    mjs over 3 years
This worked. Flush and then clear gives solid and linear performance. Not sure what the consequence is, however. batch_size 50 has no effect as far as I can see.
  • catch22
    catch22 over 3 years
    if @GeneratedValue(strategy = GenerationType.IDENTITY) is used, this config won't work