Processing large xlsx file

97,910

Solution 1

Try using the event API. See Event API (HSSF only) and XSSF and SAX (Event API) in the POI documentation for details. A couple of quotes from that page:

HSSF:

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. Its relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.

XSSF:

If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint.
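To make "process it yourself" concrete, here is a minimal sketch that uses only the JDK rather than POI's own event classes. It relies on the fact that an .xlsx file is a zip archive of XML parts and streams one sheet part through a SAX handler, so no workbook model is built in memory. The class name SheetRowCounter, the method countRows and the part name passed to it (for example xl/worksheets/sheet1.xml) are assumptions made for illustration; real files may name their sheet parts differently.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetRowCounter {

    /**
     * Counts the row elements of one sheet part by streaming its XML.
     * sheetPart is the zip entry name of the sheet (hypothetical example:
     * "xl/worksheets/sheet1.xml").
     */
    public static int countRows(InputStream xlsx, String sheetPart) throws Exception {
        final int[] rows = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Each <row> in the sheet XML is seen once and then forgotten.
                if ("row".equals(localName)) {
                    rows[0]++;
                }
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is "row" even if a prefix is used

        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(sheetPart)) {
                    // Parse straight from the zip stream; nothing is buffered as a whole.
                    factory.newSAXParser().parse(zip, handler);
                    return rows[0];
                }
            }
        }
        throw new IllegalArgumentException("No such part in archive: " + sheetPart);
    }
}
```

A real reader would also have to resolve shared strings and cell types, which is the work POI's XSSF event classes take on; the sketch only shows why the memory footprint stays small.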

For output, one possible approach is described in the blog post Streaming xlsx files. (Basically, use XSSF to generate a container XML file, then stream the actual content as plain text into the appropriate xml part of the xlsx zip archive.)
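The substitution step described above can be sketched with the JDK's zip classes alone. This is not the blog post's code: the class SheetPartSubstitution, the SheetWriter callback and the method substitute are hypothetical names, and the sketch assumes a template workbook (for example one generated with XSSF) already exists. Every part is copied unchanged except the named sheet part, whose content is written as plain text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SheetPartSubstitution {

    /** Hypothetical callback that writes the sheet XML as text. */
    public interface SheetWriter {
        void write(Writer out) throws IOException;
    }

    /**
     * Copies a template .xlsx archive to target, replacing the content of
     * sheetPart with whatever sheetWriter emits. Both streams are closed.
     */
    public static void substitute(InputStream template, String sheetPart,
                                  SheetWriter sheetWriter, OutputStream target) throws IOException {
        try (ZipInputStream in = new ZipInputStream(template);
             ZipOutputStream out = new ZipOutputStream(target)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(sheetPart)) {
                    // Stream the replacement sheet XML directly into the archive.
                    Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
                    sheetWriter.write(writer);
                    writer.flush(); // flush only; closing would close the whole zip
                } else {
                    // Copy every other part unchanged.
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
            }
        }
    }
}
```

Because the rows are written as they are produced, memory use does not grow with the number of rows; the price is that the caller has to emit valid sheet XML itself.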

Solution 2

Memory usage improves dramatically when you open the workbook from a File instead of an InputStream. (A streaming API is better still, but the streaming APIs have limitations; see http://poi.apache.org/spreadsheet/index.html)

So instead of

Workbook workbook = WorkbookFactory.create(inputStream);

do

Workbook workbook = WorkbookFactory.create(new File("yourfile.xlsx"));

This is according to http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream:

Files vs InputStreams

"When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file."

Solution 3

I had the same problem with far fewer rows, but with large strings.

Since I don't have to keep my data loaded, I found out that I can use SXSSF instead of XSSF.

They have similar interfaces, which helps if you already have a lot of code written. But with SXSSF you can set the number of rows that are kept in memory.

Here is the link. http://poi.apache.org/spreadsheet/how-to.html#sxssf
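As a rough sketch of that setting: the SXSSFWorkbook constructor takes the row window size, and rows that fall out of the window are flushed to a temporary file and can no longer be accessed. The class name SxssfWindowSketch, the method writeRows, the sheet name and the cell contents below are made up for illustration, and the code assumes Apache POI (poi-ooxml) is on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfWindowSketch {

    /** Writes rowCount rows while keeping at most window rows in memory. */
    static void writeRows(OutputStream out, int rowCount, int window) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook(window);
        try {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < rowCount; r++) {
                // Once more than `window` rows exist, the oldest are flushed to disk.
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue("row " + r);
            }
            wb.write(out);
        } finally {
            wb.dispose(); // delete the temporary files backing the flushed rows
        }
    }
}
```

A call such as writeRows(new FileOutputStream("out.xlsx"), 30000, 100) would keep roughly 100 rows in memory at a time. Because SXSSF is a write-only API, it helps when generating a file, not when modifying rows of an existing one in place.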

Solution 4

If you want to auto-fit, set styles, or write all rows in a large (30k+ rows) xlsx file, use SXSSFWorkbook. Here is some sample code:

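import java.io.File;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFColor;

// Written against the POI 3.x API. Newer POI releases use font.setBold(true)
// instead of setBoldweight(...) and FillPatternType.SOLID_FOREGROUND instead of
// XSSFCellStyle.SOLID_FOREGROUND.
SXSSFWorkbook wb = new SXSSFWorkbook();
SXSSFSheet sheet = (SXSSFSheet) wb.createSheet("writetoexcel");
Font font = wb.createFont();
font.setBoldweight((short) 700);
// Create one shared style for the sheet: light-gray solid fill with the bold font.
XSSFCellStyle style = (XSSFCellStyle) wb.createCellStyle();
style.setFillForegroundColor(new XSSFColor(java.awt.Color.LIGHT_GRAY));
style.setFillPattern(XSSFCellStyle.SOLID_FOREGROUND);
style.setFont(font);
// Iterate over r rows.
for (int r = 0; r < 30000; r++) {
    Row row = sheet.createRow(r);
    // Iterate over c columns.
    for (int c = 0; c < 75; c++) {
        Cell cell = row.createCell(c);
        cell.setCellValue("Hello");
        cell.setCellStyle(style);
    }
}
// Write the workbook to disk, then delete SXSSF's temporary files.
FileOutputStream fileOut = new FileOutputStream("E:" + File.separator + "NewTest.xlsx");
wb.write(fileOut);
fileOut.close();
wb.dispose();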

Solution 5

I used the Event API for an HSSF file (.xls) and discovered a terrible lack of documentation about the order of records.

Author by

miah

Updated on September 04, 2020

Comments

  • miah
    miah almost 4 years

    I need to auto-fit all rows in large (30k+ rows) xlsx file.

    The following code, which uses Apache POI, works on small files but fails with an OutOfMemoryError on large ones:

    Workbook workbook = WorkbookFactory.create(inputStream);
    Sheet sheet = workbook.getSheetAt(0);
    
    for (Row row : sheet) {
        row.setHeight((short) -1);
    }
    
    workbook.write(outputStream);
    

    Update: Unfortunately, increasing heap size is not an option - OutOfMemoryError appears at -Xmx1024m and 30k rows is not an upper limit.

  • ashishjmeshram
    ashishjmeshram over 12 years
    Hi, I am also having the same problem reading large Excel files: I get out-of-memory errors. I have seen poi.apache.org/spreadsheet/how-to.html#xssf_sax_api and it does not specify how to read the Excel files. Please help.
  • markusk
    markusk over 12 years
    @Ashish: Please post your request as a separate question on Stack Overflow with more details. That way, other users can help you as well.
  • David Peleg
    David Peleg over 9 years
    For reading large Excel files you can take a look at this tiny and simple library: github.com/davidpelfree/sjxlsx
  • cripox
    cripox over 8 years
    I know this is old, but did you find anything about the order of the events in HSSF and/or XSSF?
  • kiltek
    kiltek about 8 years
    This gives me an error stating: Caught: java.lang.LinkageError: loader constraint violation: when resolving interface method "org.xml.sax.XMLReader.setEntityResolver(Lorg/xml/sax/EntityResolver;)V" the class loader (instance of org/codehaus/groovy/tools/RootLoader) of the current class, org/dom4j/io/SAXReader, and the class loader (instance of <bootloader>) for the method's defining class, org/xml/sax/XMLReader, have different Class objects for the type org/xml/sax/EntityResolver used in the signature. I am using poi-3.9.
  • Mandrek
    Mandrek over 6 years
    @rjdkolb can you see my post stackoverflow.com/questions/48772021/…
  • saran3h
    saran3h over 3 years
    Nothing improves when using a file upwards of 15 MB. I've set -Xmx2048m and yet it throws out-of-memory errors.