Easiest way to compare two Excel files in Java?

55,332

Solution 1

Here's what I ended up doing (with the heavy lifting being done by DBUnit):

/**
 * Compares the data in the two Excel files represented by the given input
 * streams, closing them on completion
 * 
 * @param expected can't be <code>null</code>
 * @param actual can't be <code>null</code>
 * @throws Exception
 */
private void compareExcelFiles(InputStream expected, InputStream actual)
  throws Exception
{
  try {
    Assertion.assertEquals(new XlsDataSet(expected), new XlsDataSet(actual));
  }
  finally {
    IOUtils.closeQuietly(expected);
    IOUtils.closeQuietly(actual);
  }
}

This compares the data in the two files, with no risk of false negatives from any irrelevant metadata that might be different. Hope this helps someone.

Solution 2

You might consider using my project simple-excel which provides a bunch of Hamcrest Matchers to do the job.

When you do something like the following,

assertThat(actual, WorkbookMatcher.sameWorkbook(expected));

You'd see, for example,

java.lang.AssertionError:
Expected: entire workbook to be equal
     but: cell at "C14" contained <"bananas"> expected <nothing>,
          cell at "C15" contained <"1,850,000 EUR"> expected <"1,850,000.00 EUR">,
          cell at "D16" contained <nothing> expected <"Tue Sep 04 06:30:00">
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)

That way, you can run it from your automatted tests and get meaningful feedback whilst you're developing.

You can read more about it at this article on my site

Solution 3

A simple file comparison can easily be done using some checksumming (like MD5) or just reading both files.

However, as Excel files contain loads of metadata, the files will probably never be identical byte-for-byte, as James Burgess pointed out. So you'll need another kind of comparison for your test.

I'd recommend somehow generating a "canonical" form from the Excel file, i.e. reading the generated Excel file and converting it to a simpler format (CSV or something similar), which will only retain the information you want to check. Then you can use the "canonical form" to compare with your expected result (also in canonical form, of course).

Apache POI might be useful for reading the file.

BTW: Reading a whole file to check its correctnes would generally not be considere a Unit test. That's an integration test...

Solution 4

I needed to do something similar and was already using the Apache POI library in my project to create Excel files. So I opted to use the included ExcelExtractor interface to export both workbooks as a string of text and asserted that the strings were equal. There are implementations for both HSSF for .xls as well as XSSF for .xlsx.

Dump to string:

XSSFWorkbook xssfWorkbookA = ...;
String workbookA = new XSSFExcelExtractor(xssfWorkbookA).getText();

ExcelExtractor has some options for what all should be included in the string dump. I found it to have useful defaults of including sheet names. In addition it includes the text contents of the cells.

Solution 5

The easiest way I find is to use Tika. I use it like this:

private void compareXlsx(File expected, File result) throws IOException, TikaException {
     Tika tika = new Tika();
     String expectedText = tika.parseToString(expected);
     String resultText = tika.parseToString(result);
     assertEquals(expectedText, resultText);
}


<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.13</version>
    <scope>test</scope>
</dependency>
Share:
55,332
juckele
Author by

juckele

Senior developer on the JIRA team with Atlassian.

Updated on July 19, 2022

Comments

  • juckele
    juckele almost 2 years

    I'm writing a JUnit test for some code that produces an Excel file (which is binary). I have another Excel file that contains my expected output. What's the easiest way to compare the actual file to the expected file?

    Sure I could write the code myself, but I was wondering if there's an existing method in a trusted third-party library (e.g. Spring or Apache Commons) that already does this.