Fastest way to read a large XML file in Java

12,920

I think that your code shown in other question is faster than DOM-like parsers which would definitely require more memory and likely some computation in order to reconstruct the document in full. You may want to profile the code though.

I also think that your code can be prettified a bit for streaming processing if you would use javax XMLStreamReader, which I found quite helpful for many tasks. That class is "... is designed to be the lowest level and most efficient way to read XML data", according to Oracle.

Here is the excerpt from my code where I parse StackOverflow users XML file distributed as a public data dump:

// the input file location
private static final String fileLocation = "/media/My Book/Stack/users.xml";

// the target elements
private static final String USERS_ELEMENT = "users";
private static final String ROW_ELEMENT = "row";

// get the XML file handler
//
FileInputStream fileInputStream = new FileInputStream(fileLocation);
XMLStreamReader xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(
    fileInputStream);

// reading the data
//
while (xmlStreamReader.hasNext()) {

  int eventCode = xmlStreamReader.next();

  // this triggers _users records_ logic
  //
  if ((XMLStreamConstants.START_ELEMENT == eventCode)
      && xmlStreamReader.getLocalName().equalsIgnoreCase(USERS_ELEMENT)) {

    // read and parse the user data rows
    //
    while (xmlStreamReader.hasNext()) {

      eventCode = xmlStreamReader.next();

      // this breaks _users record_ reading logic
      //
      if ((XMLStreamConstants.END_ELEMENT == eventCode)
          && xmlStreamReader.getLocalName().equalsIgnoreCase(USERS_ELEMENT)) {
        break;
      }
      else {

        if ((XMLStreamConstants.START_ELEMENT == eventCode)
            && xmlStreamReader.getLocalName().equalsIgnoreCase(ROW_ELEMENT)) {

          // extract the user data
          //
          User user = new User();
          int attributesCount = xmlStreamReader.getAttributeCount();
          for (int i = 0; i < attributesCount; i++) {
            user.setAttribute(xmlStreamReader.getAttributeLocalName(i),
                xmlStreamReader.getAttributeValue(i));
          }
          // all other user record-related logic
          //

        }
      }
    }
  }
}

That users file format is quite simple and similar to your Bank.xml file:

<users>
  <row Id="1567200" Reputation="1" CreationDate="2012-07-31T23:57:57.770" DisplayName="XXX" EmailHash="XXX" LastAccessDate="2012-08-01T00:55:12.953" Views="0" UpVotes="0" DownVotes="0" />
  ...
</users>
Share:
12,920
bigdata123
Author by

bigdata123

Updated on June 04, 2022

Comments

  • bigdata123
    bigdata123 almost 2 years

    I'm working on a java project to optimize existing code. Currently i'm using BufferedReader/FileInputStream to read content of an XML file as String in Java.

    But my question is , is there any faster way to read XML content.Are SAX/DOM faster than BufferedReader/FileInputStream?

    Need help regarding the above issue.

    Thanks in advance.