Error While Reading Large Excel Files (xlsx) Via Apache POI

26,311

Solution 1

You don't mention whether you need to modify the spreadsheet or not.

This may be obvious, but if you don't need to modify the spreadsheet, you don't need to parse it and write it back out at all. You can simply read the bytes from the file and write them to the response, as you would with an image or any other binary format.
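A minimal sketch of that byte-for-byte approach, assuming Java 7+ NIO. RawFileStreamer and copyFile are hypothetical names. In the Spring view from the question, the target would be response.getOutputStream(), and the content type and Content-disposition header should be set before anything is written: headers set after the response has been committed are ignored, so the setHeader call at the end of the question's code has no effect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class RawFileStreamer {

    // Copies the file to the target stream through a small internal buffer.
    // The workbook is never parsed, so heap use does not depend on the file size.
    static long copyFile(Path source, OutputStream target) throws IOException {
        long copied = Files.copy(source, target);
        target.flush();
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a throwaway file; in a servlet, the target would be response.getOutputStream()
        Path tmp = Files.createTempFile("demo", ".xlsx");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            System.out.println("Copied " + copyFile(tmp, sink) + " bytes");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Files.copy does not close the target stream, so the servlet container stays in charge of closing the response.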

If you do need to modify the spreadsheet before sending it to the user, then to my knowledge, you may have to take a different approach.

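Every library API I'm aware of for building an editable spreadsheet in Java (such as POI's XSSFWorkbook) reads the whole spreadsheet into memory, so you'd need enough memory for every spreadsheet that could be processed concurrently. Expect that to be well over 50MB per file: an .xlsx file is compressed XML, and the in-memory object model is typically many times larger than the file on disk. As others have pointed out, this means increasing the heap available to the JVM (for example with the -Xmx option).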
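To confirm how much heap the running JVM actually received (for example, that a -Xmx3g setting really reached the JVM serving the request), a quick check with the standard Runtime.maxMemory() call might look like this. HeapCheck is a hypothetical name; note that maxMemory() returns Long.MAX_VALUE when there is no limit.

```java
final class HeapCheck {

    // Maximum heap this JVM will attempt to use, in megabytes
    // (reflects -Xmx, or the JVM's default when -Xmx is not set).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with, e.g., java -Xmx3g HeapCheck to see the effect of the flag
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```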

If you need to process a large number of spreadsheets concurrently and can't allocate enough memory, consider using a format that can be streamed instead of being read into memory all at once. Excel can open CSV files, and I've had good results in the past by setting the content type to application/vnd.ms-excel and the attachment filename to something ending in ".xls", while actually returning CSV content. I haven't tried this in a couple of years, so YMMV.
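A sketch of writing rows as CSV one at a time, assuming the rows come from some streaming source (a database cursor, for example) so that only one row is held in memory. CsvStreamer, escape and writeRow are hypothetical names. Quoting follows the common RFC 4180 convention: fields containing commas, quotes or line breaks are wrapped in double quotes, with embedded quotes doubled. In a servlet, the Writer would wrap response.getOutputStream(), after setting the content type and attachment filename as described above.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

final class CsvStreamer {

    // Quotes a field only when it contains a comma, a double quote or a line break.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Writes one row terminated by CRLF; nothing is buffered beyond the Writer itself.
    static void writeRow(Writer out, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeRow(out, Arrays.asList("id", "name"));
        writeRow(out, Arrays.asList("1", "Smith, \"Jo\""));
        System.out.print(out);
    }
}
```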

Solution 2

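Here is an example that reads a large .xlsx file with a SAX parser, using POI's event API (it handles only the XML-based .xlsx format, not the binary .xls format). The SheetContentsHandler method signatures below match recent POI releases (4.x and 5.x); older 3.x releases used endRow() and cell(String, String) instead.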

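// Imports needed by the two methods below
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public void parseExcel(File file) throws IOException {

        OPCPackage container = null;
        try {
            // Open read-only so the package is never written back to disk
            container = OPCPackage.open(file, PackageAccess.READ);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
            XSSFReader xssfReader = new XSSFReader(container);
            StylesTable styles = xssfReader.getStylesTable();
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            while (iter.hasNext()) {
                // Each sheet is streamed and closed before the next one is opened
                try (InputStream stream = iter.next()) {
                    processSheet(styles, strings, stream);
                }
            }
        } catch (OpenXML4JException | SAXException e) {
            e.printStackTrace();
        } finally {
            if (container != null) {
                container.revert(); // releases a read-only package without saving
            }
        }
}

protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException {

        InputSource sheetSource = new InputSource(sheetInputStream);
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        // Recent POI versions match sheet elements by namespace, so the parser must be namespace-aware
        saxFactory.setNamespaceAware(true);
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            XMLReader sheetParser = saxParser.getXMLReader();
            ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {

                @Override
                public void startRow(int rowNum) {
                    // called before the first cell of each row (rowNum is zero-based)
                }
                @Override
                public void endRow(int rowNum) {
                }
                @Override
                public void cell(String cellReference, String formattedValue, XSSFComment comment) {
                    // cellReference is e.g. "B7"; formattedValue is the cell's displayed text
                }
                @Override
                public void headerFooter(String text, boolean isHeader, String tagName) {
                }

            },
            false // false = output cached formula results instead of the formulas
            );
            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage(), e);
        }
}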
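One commenter asks how to account for blank cells. The event handler generally calls cell() only for cells that actually appear in the sheet XML, so empty cells are simply skipped. POI's own XLSX2CSV example deals with this by comparing the column indexes of consecutive cells; the plain-Java sketch below does the same. RowGapFiller and its methods are hypothetical names, meant to be called from the matching SheetContentsHandler callbacks; in a real handler, new CellReference(cellReference).getCol() gives the same column index. Trailing blank cells at the end of a row are not padded.

```java
import java.util.ArrayList;
import java.util.List;

final class RowGapFiller {

    private final List<String> row = new ArrayList<>();

    // Converts the column letters of an A1-style reference ("C7", "AB12")
    // to a zero-based column index: A -> 0, Z -> 25, AA -> 26.
    static int columnIndex(String cellReference) {
        int col = 0;
        for (int i = 0; i < cellReference.length(); i++) {
            char c = cellReference.charAt(i);
            if (c < 'A' || c > 'Z') {
                break; // the row number starts here
            }
            col = col * 26 + (c - 'A' + 1);
        }
        return col - 1;
    }

    // Call from startRow(int)
    void startRow() {
        row.clear();
    }

    // Call from cell(...): pads with "" for any columns skipped since the previous cell
    void cell(String cellReference, String formattedValue) {
        int col = columnIndex(cellReference);
        while (row.size() < col) {
            row.add("");
        }
        row.add(formattedValue);
    }

    // Call from endRow(int): returns a copy of the completed row
    List<String> endRow() {
        return new ArrayList<>(row);
    }

    public static void main(String[] args) {
        RowGapFiller filler = new RowGapFiller();
        filler.startRow();
        filler.cell("A1", "x");
        filler.cell("D1", "y"); // B1 and C1 were blank, so the parser skipped them
        System.out.println(filler.endRow()); // [x, , , y]
    }
}
```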

Solution 3

The example below is complete code that parses a whole Excel file (60 MB in my case) into a list of objects without running out of memory:

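import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;


class DistinctByProperty {

    private static final PrintStream output = System.out;
    // Rows collected from all sheets; myObject is your own row class
    private static final List<myObject> resultMapping = new ArrayList<>();


    public static void main(String[] args) throws IOException {

        File file = new File("C:\\Users\\aberguig032018\\Downloads\\your_excel.xlsx");

        double megabytes = file.length() / (1024.0 * 1024.0);
        System.out.println("Size " + megabytes + " MB");

        parseExcel(file);
    }

    public static void parseExcel(File file) throws IOException {
        OPCPackage xlsxPackage = null;
        try {
            xlsxPackage = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(xlsxPackage);
            XSSFReader xssfReader = new XSSFReader(xlsxPackage);
            StylesTable styles = xssfReader.getStylesTable();
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            int index = 0;
            while (iter.hasNext()) {
                try (InputStream stream = iter.next()) {
                    String sheetName = iter.getSheetName();
                    output.println();
                    output.println(sheetName + " [index=" + index + "]:");
                    processSheet(styles, strings, new MappingFromXml(resultMapping), stream);
                }
                ++index;
            }

        } catch (OpenXML4JException | SAXException e) {
            e.printStackTrace();
        } finally {
            if (xlsxPackage != null) {
                xlsxPackage.revert(); // read-only package: release it without saving
            }
        }
    }

    private static void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, MappingFromXml mappingFromXml, InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter();
        InputSource sheetSource = new InputSource(sheetInputStream);
        try {
            // Namespace-aware JAXP parser; POI's SAXHelper also works, but its package differs between POI versions
            SAXParserFactory saxFactory = SAXParserFactory.newInstance();
            saxFactory.setNamespaceAware(true);
            XMLReader sheetParser = saxFactory.newSAXParser().getXMLReader();
            // null = no comments table; false = output cached formula results instead of formulas
            ContentHandler handler = new XSSFSheetXMLHandler(
                    styles, null, strings, mappingFromXml, formatter, false);

            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
            output.println("Size of Array " + resultMapping.size());
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage(), e);
        }
    }
}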

You also have to add a class that implements SheetContentsHandler:

import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

import java.util.List;

public class MappingFromXml implements SheetContentsHandler {

    private final List<myObject> result;
    private myObject currentRow = null;
    private int lineNumber = 0;

    public MappingFromXml(List<myObject> list) {
        this.result = list;
    }

    @Override
    public void startRow(int rowNum) {
        lineNumber = rowNum; // zero-based; row 0 is the header
        currentRow = new myObject();
    }

    @Override
    public void endRow(int rowNum) {
        if (lineNumber > 0) { // skip the header row
            result.add(currentRow);
        }
        currentRow = null;
    }

    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        if (lineNumber == 0 || cellReference == null) {
            return;
        }
        int columnIndex = new CellReference(cellReference).getCol();

        switch (columnIndex) {
            case 0: // Tech id
                if (formattedValue != null && !formattedValue.isEmpty()) {
                    currentRow.setId(Integer.parseInt(formattedValue));
                }
                break;
            // TODO map the other columns
            default:
                break;
        }
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // headers and footers are not needed here
    }
}

For more information, visit this link.
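Solution 3 still keeps every mapped row in resultMapping, so memory grows with the number of rows. If the rows are headed somewhere else anyway (a database or another file), one option is to hand them off in fixed-size batches from endRow instead of collecting them all. A minimal plain-Java sketch; BatchingSink and its methods are hypothetical, and the flush action stands in for whatever actually consumes the rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers up to batchSize rows, hands each full batch to a consumer,
// then drops its references so memory stays bounded regardless of the sheet size.
final class BatchingSink<T> {

    private final int batchSize;
    private final Consumer<List<T>> flushAction;
    private final List<T> buffer = new ArrayList<>();

    BatchingSink(int batchSize, Consumer<List<T>> flushAction) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Call from endRow(int) instead of result.add(...)
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Call once after the sheet is parsed to hand over the final partial batch
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushAction.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingSink<String> sink = new BatchingSink<>(2, batch -> batchSizes.add(batch.size()));
        for (String row : new String[] {"r1", "r2", "r3"}) {
            sink.add(row);
        }
        sink.flush();
        System.out.println(batchSizes); // [2, 1]
    }
}
```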

Author: jamesT

Updated on August 14, 2020

Comments

  • jamesT
    jamesT almost 4 years

    I am trying to read large Excel (.xlsx) files, say 40-50 MB, via Apache POI, and I am getting an out-of-memory exception. The current heap size is 3GB.

    I can read smaller Excel files without any issues. I need a way to read large Excel files and then send them back as the response via a Spring Excel view.

    public class FetchExcel extends AbstractView {
    
    
        @Override
        protected void renderMergedOutputModel(
                Map model, HttpServletRequest request, HttpServletResponse response) 
        throws Exception {
    
        String fileName = "SomeExcel.xlsx";
    
        response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    
        OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");
    
        XSSFWorkbook workbook = new XSSFWorkbook(pkg);
    
        ServletOutputStream respOut = response.getOutputStream();
    
        pkg.close();
        workbook.write(respOut);
        respOut.flush();
    
        workbook = null;                    
    
        response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\"");
    
    
        }    
    
    }
    

    I first started with XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in); but according to the Apache POI documentation, opening from an InputStream uses more memory than opening from a file, so I switched to the OPCPackage approach, with the same result. I don't need to parse or process the file, just read it and return it.

  • Nikhil Das Nomula
    Nikhil Das Nomula over 10 years
    This example shows how to write to an excel file, the question is about how do we write to an excel file in poi.
  • Anand
    Anand almost 10 years
    Thanks O.C exactly what I was looking for processing over 250k rows. Perfectly works.
  • 99Sono
    99Sono over 8 years
    Many thanks for the code snippet up there. Apache POI should post in their documentation an example as the one above to advertise those APIs more readily.
  • user1799214
    user1799214 over 8 years
    @O.C Thanks a ton!! Could you please tell how to consider blank cells in excel using the above code?
  • Christoph
    Christoph over 8 years
    Is there a way using an iterator-based / row-based approach? I would like to wrap an iterator around it with hasNext() and next() methods so that the caller has more influence. In this event-based approach I have no control over the progress, because I have to fetch all events until no events are there.
  • sharif2008
    sharif2008 about 8 years
    but this is an xlsx parser not xls parser :(
  • Yan Khonski
    Yan Khonski almost 7 years
    This is where I copy-pasted from!
  • abr
    abr almost 6 years
    Where is the content of the xlsx?
  • abr
    abr almost 6 years
    @Anand still around?