How do I load an org.w3c.dom.Document from XML in a string?


Solution 1

This works for me in Java 1.5 - I stripped out specific exceptions for readability.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;

public Document loadXMLFromString(String xml) throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    return builder.parse(new ByteArrayInputStream(xml.getBytes()));
}

Solution 2

Whoa there!

There's a potentially serious problem with this code: it ignores the character encoding declared in the XML itself (UTF-8 when no encoding is declared). String.getBytes() encodes the characters to bytes using the platform default encoding, which may be something else entirely. So the parser may think it's getting UTF-8 data when in fact it's getting EBCDIC or something… not pretty!

Instead, use the parse method that takes an InputSource, which can be constructed with a Reader, like this:

import java.io.StringReader;
import org.xml.sax.InputSource;
…
        return builder.parse(new InputSource(new StringReader(xml)));

It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to Y2K.
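
To make the failure mode concrete, here is a minimal sketch that parses the same string three ways. The class name EncodingDemo, the sample XML, and the use of ISO-8859-1 as a stand-in for a non-UTF-8 platform default are all assumptions made for illustration; StandardCharsets also requires Java 7 or later, which is newer than the Java 1.5 mentioned in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EncodingDemo {
    // Hypothetical sample with one non-ASCII character and no XML declaration,
    // so a parser that is handed raw bytes will assume UTF-8.
    static final String XML = "<greeting>caf\u00e9</greeting>";

    // Parses the source and returns the root element's text, or null if
    // parsing fails. Catching Exception is for demo brevity only.
    private static String rootText(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    // Character-based input: no byte encoding is involved at all.
    static String textViaReader(String xml) {
        return rootText(new InputSource(new StringReader(xml)));
    }

    // Byte-based input; the charset argument simulates a platform default.
    static String textViaBytes(String xml, Charset simulatedDefault) {
        byte[] bytes = xml.getBytes(simulatedDefault);
        return rootText(new InputSource(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) {
        System.out.println("Reader:           " + textViaReader(XML));
        System.out.println("UTF-8 bytes:      " + textViaBytes(XML, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + textViaBytes(XML, StandardCharsets.ISO_8859_1));
    }
}
```

The Reader variant and the UTF-8 byte variant should both recover the original text. The ISO-8859-1 variant should not, because the bytes no longer match the encoding the parser assumes.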

Solution 3

I just had a similar problem, except I needed a NodeList and not a Document. Here's what I came up with. It's mostly the same solution as before, extended to fetch the root element as a NodeList, and it follows erickson's suggestion of using an InputSource to avoid character encoding issues.

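import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

private String DOC_ROOT = "root";
String xml = getXmlString();
Document xmlDoc = loadXMLFrom(xml);
Element template = xmlDoc.getDocumentElement();
NodeList nodes = xmlDoc.getElementsByTagName(DOC_ROOT);

public static Document loadXMLFrom(String xml) throws Exception {
    // Parse from characters, so no byte encoding is involved.
    InputSource is = new InputSource(new StringReader(xml));
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(is);
}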
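
As a rough sketch of what can be done with the resulting NodeList, the class below counts elements by tag name and lists the element children of the root. The class name NodeListDemo and both helper methods are hypothetical and not part of the answer above. Note that Document.getElementsByTagName matches every element with that name in the document, including the root itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NodeListDemo {
    static Document loadXMLFrom(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Number of elements anywhere in the document with the given tag name.
    static int countByTagName(String xml, String tagName)
            throws ParserConfigurationException, SAXException, IOException {
        return loadXMLFrom(xml).getElementsByTagName(tagName).getLength();
    }

    // Names of the root's direct element children, skipping text and comments.
    static List<String> childElementNames(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        NodeList children = loadXMLFrom(xml).getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><a/>text<b/></root>";
        System.out.println(childElementNames(xml));
        System.out.println(countByTagName(xml, "root"));
    }
}
```

A NodeList is index-based rather than Iterable, which is why the loop uses getLength() and item(i).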

Solution 4

To manipulate XML in Java, I always tend to use the Transformer API:

import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public static Document loadXMLFrom(String xml) throws TransformerException {
    Source source = new StreamSource(new StringReader(xml));
    DOMResult result = new DOMResult();
    // An identity transform copies the parsed input into a new DOM tree.
    TransformerFactory.newInstance().newTransformer().transform(source, result);
    return (Document) result.getNode();
}
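
One reason the Transformer API can be convenient for manipulation is that the same identity transform also works in the other direction, from a Document back to a String. The sketch below assumes that is the intended use; the class name TransformerRoundTrip and the toXmlString helper are hypothetical additions, not part of the answer above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

final class TransformerRoundTrip {
    // String -> Document, as in the answer above.
    static Document loadXMLFrom(String xml) throws TransformerException {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer()
                .transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    // Document -> String, using the same identity transform in reverse.
    static String toXmlString(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> declaration to keep the output short.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Document doc = loadXMLFrom("<root><item>1</item></root>");
        doc.getDocumentElement().setAttribute("id", "42"); // modify the tree
        System.out.println(toXmlString(doc));
    }
}
```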

Author by

Frank Krueger

Updated on July 08, 2022

Comments

  • Frank Krueger
    Frank Krueger almost 2 years

    I have a complete XML document in a string and would like a Document object. Google turns up all sorts of garbage. What is the simplest solution? (In Java 1.5)

    Solution: Thanks to Matt McMinn, I have settled on this implementation. It has the right level of input flexibility and exception granularity for me. (It's good to know if the error came from malformed XML - SAXException - or just bad IO - IOException.)

    public static org.w3c.dom.Document loadXMLFrom(String xml)
        throws org.xml.sax.SAXException, java.io.IOException {
        // Caution: getBytes() encodes with the platform default charset, which
        // may not match the encoding the XML declares (see the comments below).
        return loadXMLFrom(new java.io.ByteArrayInputStream(xml.getBytes()));
    }

    public static org.w3c.dom.Document loadXMLFrom(java.io.InputStream is)
        throws org.xml.sax.SAXException, java.io.IOException {
        javax.xml.parsers.DocumentBuilderFactory factory =
            javax.xml.parsers.DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        javax.xml.parsers.DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
        }
        catch (javax.xml.parsers.ParserConfigurationException ex) {
            // Fail loudly instead of continuing with a null builder.
            throw new IllegalStateException("Could not create a DocumentBuilder", ex);
        }
        org.w3c.dom.Document doc = builder.parse(is);
        is.close();
        return doc;
    }
    
    • Kenneth Xu
      Kenneth Xu almost 11 years
      It would be nice if you could correct the solution. Using String.getBytes and an InputStream causes i18n problems. One of my friends took the code from here as is, which is wrong. Luckily, FindBugs detected the issue. The correct solution, provided by erickson, is to use InputSource.
  • McDowell
    McDowell over 14 years
    As noted in sylvarking's answer, this code uses getBytes() with no consideration for encoding.
  • pat8719
    pat8719 over 12 years
    So simple but so elusive a solution on Google. Thank you +1
  • rogerdpack
    rogerdpack over 11 years
    do you mean erickson's answer? or maybe he renamed his profile?
  • InfantPro'Aravind'
    InfantPro'Aravind' over 11 years
    shouldn't there be casting return (Document) builder.parse(new ByteArrayInputStream(xml.getBytes()));??
  • Vitaly Sazanovich
    Vitaly Sazanovich over 10 years
    I realize now that I shouldn't just copy-and-paste the accepted answer but rather read through.
  • kosta5
    kosta5 about 7 years
    Awesome! Saved our lives on JDK8 with the following setup: file.encoding=ISO-8859_1, javax.servlet.request.encoding=UTF-8. PS: the answer labeled as correct didn't work for us.
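
For readers who want the correction requested in the comments, the following is one possible sketch that combines the exception granularity of the implementation in the question with erickson's InputSource suggestion. The class name XmlLoader is hypothetical, and throwing IllegalStateException when no parser can be configured is an assumption about how that case should be handled, not something the thread specifies.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlLoader {
    // Parses XML held in a String. The StringReader supplies characters
    // directly, so the platform default encoding never comes into play.
    static Document loadXMLFrom(String xml) throws SAXException, IOException {
        return loadXMLFrom(new InputSource(new StringReader(xml)));
    }

    // SAXException signals malformed XML; IOException signals a read failure.
    static Document loadXMLFrom(InputSource source) throws SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            return factory.newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException ex) {
            // Assumption: treat an unusable parser configuration as a
            // programming or environment error rather than bad input.
            throw new IllegalStateException("Could not create a DocumentBuilder", ex);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = loadXMLFrom("<root><child/></root>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Callers that still need to parse from an InputStream can wrap it with new InputSource(stream), in which case the parser detects the encoding from the XML declaration or byte order mark.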