OutOfMemory when reading big amounts of data using hibernate

Solution 1

It looks like you are calling getProductIterator() with the starting and ending row numbers, while getProductIterator() expects the starting row and a row count. As your "upper limit" gets higher, you are reading data in ever-bigger chunks. I think you mean to pass batchSize as the second argument to getProductIterator().
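
A minimal sketch of the corrected loop, assuming getProductIterator(offset, limit) keeps the (offset, limit) signature shown in the question:

final int batchSize = 1000;
for (int i = 0; i < 50; i++) {
    // setMaxResults() takes a row count, so the second argument should be the
    // constant chunk size, not an ever-growing upper bound:
    ScrollableResults iterator = getProductIterator(batchSize * i, batchSize);
    // ... process the chunk ...
}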

Solution 2

Not a direct answer but for this kind of data manipulation, I would use the StatelessSession interface.
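
A sketch of what that could look like (the sessionFactory wiring is assumed, and since a stateless session performs no lazy loading, the htmlSources collection would need an explicit fetch):

// A StatelessSession keeps no persistence context, first-level cache, or
// dirty-checking state, so entities cannot accumulate between rows.
StatelessSession session = sessionFactory.openStatelessSession();
try {
    ScrollableResults results = session
            .createCriteria(Product.class)
            .add(Restrictions.eq("status", Product.Status.DONE))
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        Product p = (Product) results.get(0);
        // ... export p ...
    }
} finally {
    session.close();
}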

Solution 3

KeithL is right - you're passing an ever-increasing limit. But breaking it up that way doesn't make sense anyway. The whole point of a scroll cursor is that you process a row at a time so there's no need to break it up into chunks. The fetch size reduces the trips to the database at the cost of using up more memory. The general pattern should be:

Criteria criteria = session.createCriteria(Product.class);  // no offset or limit
criteria.setCacheMode(CacheMode.IGNORE); // prevent query or second level caching
criteria.setFetchSize(1000);  // experiment with this to optimize performance vs. memory
ScrollableResults results = criteria.scroll(ScrollMode.FORWARD_ONLY);
while (results.next()) {
  Product p = (Product) results.get(0);
  ...
  session.evict(p);  // required to keep objects from accumulating in the session
}
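
As the comments below point out, evicting (or clearing) is mandatory regardless of the cache mode. If evicting row by row is inconvenient, a periodic session.clear() achieves the same effect; a sketch reusing the variables above:

int count = 0;
while (results.next()) {
    Product p = (Product) results.get(0);
    // ... process p ...
    if (++count % 1000 == 0) {
        session.clear();  // detach everything read so far in one shot
    }
}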

That said, the stack trace ends in getHtmlSource, so the problem may be completely unrelated to the session/cursor/scroll issue. If those HTML strings are huge and they're referenced the entire time, you may just be running out of contiguous memory.

Btw, I don't see a getScrollableResults method on ScrollableResults.

Solution 4

At the risk of appearing stupid - have you considered doing this another way?

Personally I would avoid doing batch processing that "far away" from the database. I don't know what database you're using, but there's usually a mechanism for efficiently pulling a dataset out of the database and into a file, even if it involves moderately simple manipulation on the way out: stored procedures, dedicated export utilities. Investigate what else your database vendor offers.
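
As one illustration (not from the original answer): even without vendor tools, a plain JDBC export with a forward-only, streaming result set keeps the ORM out of the picture entirely. The connection string, query, and file name below are placeholders:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.*;

public class RawExport {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "pass");
             BufferedWriter out = new BufferedWriter(new FileWriter("export.txt"))) {
            con.setAutoCommit(false);  // some drivers stream only outside auto-commit
            try (Statement st = con.createStatement(
                         ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                st.setFetchSize(1000);  // stream rows in chunks instead of buffering them all
                try (ResultSet rs = st.executeQuery(
                             "SELECT html FROM product_html_source WHERE status = 'DONE'")) {
                    while (rs.next()) {
                        out.write(rs.getString(1));
                        out.newLine();
                    }
                }
            }
        }
    }
}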

Author by Vladimir

Updated on June 18, 2022

Comments

  • Vladimir
    Vladimir almost 2 years

I need to export a big amount of data from a database. Here are the classes that represent my data:

    public class Product {
        ...

        @OneToMany
        @JoinColumn(name = "product_id")
        @Cascade({SAVE_UPDATE, DELETE_ORPHAN})
        List<ProductHtmlSource> htmlSources = new ArrayList<ProductHtmlSource>();

        ...
    }

    ProductHtmlSource contains the big string that I actually need to export.

    Since the size of the exported data is bigger than the JVM memory, I'm reading the data in chunks, like this:

    final int batchSize = 1000;
    for (int i = 0; i < 50; i++) {
        ScrollableResults iterator = getProductIterator(batchSize * i, batchSize * (i + 1));
        while (iterator.getScrollableResults().next()) {
            Product product = (Product) iterator.getScrollableResults().get(0);
            List<String> htmls = product.getHtmlSources();
            <some processing>
        }
    }

    The code of getProductIterator:

    public ScrollableResults getProductIterator(int offset, int limit) {
        Session session = getSession(true);
        session.setCacheMode(CacheMode.IGNORE);
        ScrollableResults iterator = session
                .createCriteria(Product.class)
                .add(Restrictions.eq("status", Product.Status.DONE))
                .setFirstResult(offset)
                .setMaxResults(limit)
                .scroll(ScrollMode.FORWARD_ONLY);
        session.flush();
        session.clear();

        return iterator;
    }

    The problem is that, in spite of clearing the session after reading each data chunk, Product objects accumulate somewhere and I get an OutOfMemory exception. The problem is not in the processing block of code: even without it I get the memory error. The batch size is also not the problem, since 1000 objects easily fit into memory.

    The profiler showed that objects accumulate in the org.hibernate.engine.StatefulPersistenceContext class.

    The stacktrace:

    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
        at java.lang.StringBuffer.append(StringBuffer.java:307)
        at org.hibernate.type.TextType.get(TextType.java:41)
        at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:163)
        at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:154)
        at org.hibernate.type.AbstractType.hydrate(AbstractType.java:81)
        at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2101)
        at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1380)
        at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1308)
        at org.hibernate.loader.Loader.getRow(Loader.java:1206)
        at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:580)
        at org.hibernate.loader.Loader.doQuery(Loader.java:701)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
        at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
        at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:36)
        at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
        at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
        at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
        at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
        at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
        at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:109)
        at org.hibernate.collection.PersistentBag.size(PersistentBag.java:225)
        at com.rivalwatch.plum.model.Product.getHtmlSource(Product.java:76)
        at com.rivalwatch.plum.model.Product.getHtmlSourceText(Product.java:80)
        at com.rivalwatch.plum.readers.AbstractDataReader.getData(AbstractDataReader.java:64)
    
  • Gab
    Gab about 7 years
    "session.evict(p); // an alternative to setting the cache mode above" This statement is just false, the cache mode concern L2 and query cache not the session itself. An evict or clear statement is still mandatory.
  • espidesigns
    espidesigns about 7 years
    Gab is correct. I updated the answer to reflect that.