Java Apache-poi, memory leak with excel files

Solution 1

By default, POI loads the whole workbook into memory, so a large workbook requires a different approach.

When writing, you can use SXSSF. Most calls are the same as with XSSF, except that only a limited window of rows is kept in memory at any time.

In your case, you are reading. For this you can use POI's "event-driven" API. The basic idea is that you do not get the workbook as one huge object. Instead, you receive it piecemeal as it is read, and you can store as much as you wish in your own data structure. Or you can simply process it as you read it and keep very little.

Since this is a lower-level API (driven by the structure of the data being read), there is one approach for XLS and a different approach for XLSX. Look at the POI "How To" page, and find the section titled "XSSF and SAX (Event API)".

That example demonstrates how to detect the value of each cell as it is read in. (You'll need the xercesImpl.jar on your library path.)
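To make the event-driven idea concrete, here is a minimal sketch that uses only the JDK's SAX parser, not POI itself. It assumes a simplified, namespace-free fragment of sheet XML held in a string; the class name SheetEventDemo, the handler, and the sample data are all hypothetical. With a real XLSX file you would obtain the sheet streams through POI as described on the "How To" page, and string cells would additionally need a shared-strings lookup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetEventDemo {

    /** Collects "cellRef=value" strings as the parser reports each cell. */
    static class CellHandler extends DefaultHandler {
        final List<String> cells = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentRef;
        private boolean inValue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(qName)) {
                currentRef = attrs.getValue("r"); // cell reference such as "A1"
            } else if ("v".equals(qName)) {
                inValue = true;                   // start buffering the cell value
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                cells.add(currentRef + "=" + text); // keep only what we need
                inValue = false;
            }
        }
    }

    /** Streams the given sheet XML through SAX and returns the cells seen. */
    static List<String> readCells(String sheetXml) throws Exception {
        CellHandler handler = new CellHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.cells;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample data: one row with two numeric cells.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>42</v></c><c r=\"B1\"><v>3.5</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(readCells(xml)); // prints [A1=42, B1=3.5]
    }
}
```

The handler never holds the whole sheet. It sees one element at a time and decides what to keep, which is why memory use stays roughly flat regardless of sheet size.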

Solution 2

If an exception is thrown in your first try block, you return early, so the workbook is never closed.

Put the close in a finally block.

Workbook workbook = null;
try {
  workbook = new XSSFWorkbook(file); //line 18

  // later would be here the code to analyze the workbook
} catch (Exception e1) {
  e1.printStackTrace();
  return;
} finally {
  if (workbook != null) workbook.close();
}

Or, better, use try-with-resources.

try (XSSFWorkbook workbook = new XSSFWorkbook(file)) {
  // later would be here the code to analyze
} catch (Exception e1) {
  e1.printStackTrace();
}
// No need for explicit close.
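
As a quick check of that claim, the following sketch shows that try-with-resources calls close() even when the body throws. FakeWorkbook is a hypothetical stand-in for a real workbook so that the example runs without POI on the classpath; it is not part of any library.

```java
public class CloseDemo {

    /** Hypothetical stand-in for a workbook; records whether close() ran. */
    static class FakeWorkbook implements AutoCloseable {
        boolean closed = false;

        void analyze() {
            throw new IllegalStateException("analysis failed");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Returns true if the resource was closed even though analyze() threw. */
    static boolean closedAfterFailure() {
        FakeWorkbook seen = new FakeWorkbook();
        try (FakeWorkbook wb = seen) {
            wb.analyze();
        } catch (IllegalStateException e) {
            // close() has already run by the time this catch block executes
        }
        return seen.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterFailure()); // prints true
    }
}
```

The resource is closed before the catch block runs, so no separate finally block is needed.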
Author: MichaD

Updated on June 12, 2022

Comments

  • MichaD
    MichaD about 2 years

    I need to read 15,000 Excel files for my thesis. I'm using Apache POI to open and later analyze them, but after around 5,000 files I get the following exception and stack trace:

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3044)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3065)
    at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3263)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4682)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:92)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:173)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:165)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:417)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:382)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:178)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:249)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:302)
    at de.spreadsheet_realtions.analysis.WorkbookAnalysis.analyze(WorkbookAnalysis.java:18)
    

    Code (at the moment it just opens and closes each file):

    public static void main(String[] args) {
        new WorkbookAnalysis().start(); // start() is an instance method
    }
    
    public void start(){
        File[] files = getAllFiles(Config.folder);
        ZipSecureFile.setMinInflateRatio(0.00);
        for(File f: files){
            analyze(f);
        }
    }
    
    public void analyze(File file){
        Workbook  workbook = null;
        try {
            workbook = new XSSFWorkbook(file); //line 18
        } catch (Exception e1) {e1.printStackTrace(); return;}
    //      later would be here the code to analyze the workbook
        try {
            workbook.close();
        } catch (Exception e) {e.printStackTrace();}
    }
    

    I also tried OPCPackage.open(file) and got the same result.

    What am I doing wrong, or what can I do to solve this problem? Thanks for any help.


    EDIT: The same for the code below.

    try (XSSFWorkbook workbook = new XSSFWorkbook(file)){
    } catch (Exception e1) {e1.printStackTrace(); return;}
    
  • MichaD
    MichaD about 8 years
    Thanks for the hint. I tried it but I get the same exception and stacktrace after the same number of files.
  • Andy Turner
    Andy Turner about 8 years
    Well, in that case it's not an issue with the code you posted :) You are probably holding on to references to stuff inside the code you are using to analyze the workbook - OOM failures don't necessarily manifest in the place where the actual memory leak is occurring.
  • MichaD
    MichaD about 8 years
    That is the part I don't understand, because I only create a new XSSFWorkbook and close it. I'm not doing anything with the workbook at the moment. I added the complete code that I'm executing.