Java, XML DocumentBuilder - setting the encoding when parsing
Solution 1
Here's an updated answer, since OutputFormat is deprecated:
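import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
TransformerFactory tf = TransformerFactory.newInstance();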
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");
The last two lines return the XML document as a String, with line breaks removed.
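Because a StringWriter holds characters rather than bytes, the ENCODING property above mainly determines what the XML declaration says; the real encoding happens wherever that String is later turned into bytes. A minimal sketch, assuming the bytes themselves should be ISO-8859-1, passes an OutputStream to StreamResult instead so the transformer does the encoding itself. The class name EncodedTransform, the helper toBytes and the sample element are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodedTransform {

    // Hypothetical helper: serializes the DOM to bytes in the requested encoding.
    static byte[] toBytes(Document document, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // An OutputStream (unlike a Writer) lets the transformer produce the encoded bytes
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("name");
        root.setTextContent("caf\u00e9");
        document.appendChild(root);

        byte[] bytes = toBytes(document, "ISO-8859-1");
        // In ISO-8859-1 'é' is the single byte 0xE9; in UTF-8 it would be 0xC3 0xA9
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Writing the byte array to a file (or passing a FileOutputStream to StreamResult directly) keeps the declaration and the actual bytes in agreement.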
Solution 2
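import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xml.serialize.OutputFormat;   // Apache Xerces, deprecated
import org.apache.xml.serialize.XMLSerializer;  // Apache Xerces, deprecated
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Read XML
String xml = "xml";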
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
// Append formatting
OutputFormat format = new OutputFormat(document);
// Keep the encoding from the original XML declaration, if there was one
if (document.getXmlEncoding() != null) {
format.setEncoding(document.getXmlEncoding());
}
format.setLineWidth(100);
format.setIndenting(true);
format.setIndent(5);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
String result = out.toString();
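OutputFormat and XMLSerializer live in Apache Xerces' org.apache.xml.serialize package, which is deprecated. A minimal sketch of the same idea using only the JDK's standard DOM Level 3 Load and Save API (LSSerializer) is below; it reuses document.getXmlEncoding() as Solution 2 does and falls back to UTF-8 when the source had no declaration. Pretty-printing through the "format-pretty-print" parameter is optional in the spec, so the code checks canSetParameter first. The class name LsSerialize and the sample input are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class LsSerialize {

    // Writes the document in its declared encoding, or UTF-8 if none was declared.
    static byte[] serialize(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // Rough counterpart of OutputFormat.setIndenting(true); support is optional
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        String encoding = document.getXmlEncoding() != null
                ? document.getXmlEncoding() : "UTF-8";

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = ls.createLSOutput();
        output.setEncoding(encoding);   // used for both the declaration and the bytes
        output.setByteStream(bytes);
        serializer.write(document, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] in = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(in));
        System.out.println(new String(serialize(document), StandardCharsets.ISO_8859_1));
    }
}
```

Setting the encoding on LSOutput and writing to a byte stream, rather than calling writeToString, means the declaration and the bytes use the same charset.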
Solution 3
I solved it after a lot of trial and error.
I was using
OutputFormat format = new OutputFormat(document);
but changed it to
OutputFormat format = new OutputFormat(document, encoding, true);
and this solved my problem.
Here encoding is the encoding I set, and true specifies whether indenting is enabled.
Note to self: read more carefully. I had looked at the Javadoc hours ago; if only I had read it more carefully.
Admin
Updated on April 06, 2020
Comments
Admin over 3 years
I'm trying to save a tree (extends JTree) which holds an XML document to a DOM object, having changed its structure. I have created a new document object, traversed the tree to retrieve the contents successfully (including the original encoding of the XML document), and now have a ByteArrayInputStream which has the tree contents (XML document) with the correct encoding.

The problem is that when I parse the ByteArrayInputStream, the encoding is automatically changed to UTF-8 (in the XML document). Is there a way to prevent this and use the correct encoding as provided in the ByteArrayInputStream?

It's also worth adding that I have already used the transformer.setOutputProperty(OutputKeys.ENCODING, encoding) method to retrieve the right encoding. Any help would be appreciated.
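Part of the confusion here is that a parsed DOM holds decoded Unicode text, not bytes: DocumentBuilder.parse detects the charset from the XML declaration in the ByteArrayInputStream and records the declared name, which document.getXmlEncoding() returns. The encoding is chosen again only when the document is serialized, and since xmlEncoding is read-only in DOM Level 3 (there is no setter), a newly created Document cannot carry it; the string has to be kept and passed to the serializer. A minimal sketch that checks what the parser keeps, with a hypothetical class name and input:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseBytes {

    // Parses raw XML bytes; the parser picks the charset from the XML declaration.
    static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: 'é' stored as the single ISO-8859-1 byte 0xE9
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document document = parse(xml);

        // The DOM stores decoded text; the declared encoding is kept only as metadata
        String declared = document.getXmlEncoding();
        String text = document.getDocumentElement().getTextContent();
        System.out.println(declared + " / " + text);
    }
}
```

If the text comes back correctly decoded and getXmlEncoding() reports the original name, the parsing step is fine, and the value only needs to be handed to whichever serializer writes the document out.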