How can I clone an entire Document using the Java DOM?

java xml dom

10,114

Solution 1

As some of the comments point out, there are problems with serializing and re-parsing a document. In addition to memory usage, performance considerations, and normalization, there's also loss of the prolog (DTD or schema), potential loss of comments (which aren't required to be captured), and loss of what may be significant whitespace. Serialization should be avoided.

If the real goal is to make a copy of an existing DOM Document object, then it should be handled programmatically, in memory. Thankfully, there is a relatively easy way to do this, using features available in Java 5 or external XSLT libraries like Xalan: a pass-through (identity) transformation.

The Java 5 solution is shown below:

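// With no stylesheet, newTransformer() returns an identity (pass-through) transformer.
TransformerFactory tfactory = TransformerFactory.newInstance();
Transformer tx = tfactory.newTransformer();
DOMSource source = new DOMSource(doc);
// An empty DOMResult lets the transformer create the target Document itself.
DOMResult result = new DOMResult();
tx.transform(source, result);
return (Document) result.getNode();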

That's basically it. You'll need to handle exceptions and may wish to configure the transformer, but I leave that as an exercise for the reader.
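For readers who want something they can compile, here is a minimal self-contained sketch of the same pass-through transformation. The class name `DocumentCloner`, the method name `copy`, and the tiny test document built in `main` are illustrative assumptions, not part of the original answer, and exception handling is reduced to a `throws` clause.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; the name is an assumption made for this sketch.
class DocumentCloner {

    /** Copies a Document in memory by running it through an identity transform. */
    static Document copy(Document doc) throws TransformerException {
        TransformerFactory tfactory = TransformerFactory.newInstance();
        Transformer tx = tfactory.newTransformer(); // no stylesheet: identity transform
        DOMResult result = new DOMResult();         // no target node: the transformer creates one
        tx.transform(new DOMSource(doc), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        // Build a small document to copy.
        Document original = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = original.createElement("root");
        root.setAttribute("id", "1");
        root.appendChild(original.createTextNode("hello"));
        original.appendChild(root);

        Document copy = copy(original);

        // Changing the copy should leave the original untouched.
        copy.getDocumentElement().setAttribute("id", "2");
        System.out.println(original.getDocumentElement().getAttribute("id"));
        System.out.println(copy.getDocumentElement().getAttribute("id"));
    }
}
```

The cast to `Document` relies on the transformer creating a new Document when the `DOMResult` is constructed without a node, which is how the JDK's bundled implementation behaves; other implementations are worth checking.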

Solution 2

Still, how about the quick'n'dirty way: serialize the whole Document into an XML string and then parse it back with a DOM parser?

I don't see a reason why the serialized version would lack anything. Would you mind providing an example?

Memory consumption would be significant, but, on the other hand, if you're duplicating the whole DOM, it cannot be small anyway...


Author by

Adam Crume

Updated on June 13, 2022

Comments

Adam Crume almost 2 years

I'm looking for a reliable, implementation-independent way to clone an entire Document. The Javadocs specifically say that calling cloneNode on a Document is implementation-specific. I've tried passing the Document through a no-op Transformer, but the resulting Node has no owner Document.

I could create a new Document and import the nodes from the old one, but I'm afraid there might be bits of Document metadata that get lost. Same thing with writing the Document to a string and parsing it back in.
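The import approach mentioned above might look roughly like the following sketch. The class name `ImportCopy` and the method name `copyByImport` are hypothetical. It uses only `Document.importNode`, which is DOM Level 2, and it deliberately skips the DocumentType node because the DOM specification does not allow DocumentType nodes to be imported, which is one concrete piece of document-level information this approach loses.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class; the name is an assumption made for this sketch.
class ImportCopy {

    /** Copies every top-level node except the doctype into a freshly created Document. */
    static Document copyByImport(Document source) throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document target = factory.newDocumentBuilder().newDocument();

        for (Node child = source.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.DOCUMENT_TYPE_NODE) {
                continue; // DocumentType nodes cannot be imported, so the DTD is dropped here
            }
            // Deep-import the node so it is owned by the target document, then attach it.
            target.appendChild(target.importNode(child, true));
        }
        return target;
    }
}
```

Top-level comments, processing instructions, and the root element are carried over, but the doctype is not.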

Any ideas?

By the way, I'm stuck at Java 1.4.2, for reasons beyond my control.
Adam Crume over 15 years

This is a contrived example, but serializing the document will normalize it. (Adjacent text nodes will be combined.) Memory consumption in this particular case is not an issue.
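The normalization effect can be reproduced with a short sketch. The class name `RoundTripDemo` and its method names are hypothetical. The document is serialized with a no-op Transformer and then parsed back, and the number of child nodes under the root is compared before and after.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is an assumption made for this sketch.
class RoundTripDemo {

    /** Serializes the document to a string and parses that string back into a new Document. */
    static Document roundTrip(Document doc) throws Exception {
        StringWriter xml = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(xml));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml.toString())));
    }

    /** Builds a root element holding two adjacent text nodes, "a" and "b". */
    static Document twoTextNodes() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createTextNode("b"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document original = twoTextNodes();
        Document reparsed = roundTrip(original);
        System.out.println(original.getDocumentElement().getChildNodes().getLength());
        System.out.println(reparsed.getDocumentElement().getChildNodes().getLength());
    }
}
```

The original root has two text children, while the re-parsed root has a single text node containing "ab", so the copy is not structurally identical to the source.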
Adam Crume over 15 years

Actually, a much better example would be a document with an inline DTD. If you use a no-op transformer to serialize the DOM, the DTD gets left out. Of course, if there's a better way to serialize the DOM, I'm all ears.
ka3ak about 12 years

I have tried to copy the document in this way. But isEqualNode() returns false if I compare both documents.