Error about invalid XML characters on Java
Solution 1
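I fixed it with this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "[\\000]*" matches runs of the NUL character (U+0000), written as a regex octal escape.
Pattern pattern = Pattern.compile("[\\000]*");
Matcher matcher = pattern.matcher(dirtyXMLString);
// replaceAll returns the text unchanged when there is nothing to remove,
// so cleanXMLString can never be left null.
String cleanXMLString = matcher.replaceAll("");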
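If the input may contain other characters that XML 1.0 forbids, not just NUL, the same idea can be generalized. The following is only a sketch: the class name XmlCleaner and its method names are made up for illustration, and the ranges are those of the XML 1.0 Char production (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF).

```java
// Hypothetical helper (not from the original answer): drops every code point
// that the XML 1.0 "Char" production does not allow, including NUL.
final class XmlCleaner {
    private XmlCleaner() {}

    /** Returns true if the code point is a legal XML 1.0 character. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input without characters that are illegal in XML 1.0. */
    static String stripInvalidXmlChars(String dirty) {
        StringBuilder clean = new StringBuilder(dirty.length());
        // Iterate over code points so that surrogate pairs are kept together
        // and unpaired surrogates are dropped.
        dirty.codePoints()
             .filter(XmlCleaner::isValidXmlChar)
             .forEach(clean::appendCodePoint);
        return clean.toString();
    }
}
```

It would be called as String cleanXMLString = XmlCleaner.stripInvalidXmlChars(dirtyXMLString); before handing the text to the parser. Note that this silently discards data, so it only hides the problem if the characters should not have been there in the first place.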
Solution 2
Unicode character 0x0 represents NULL, meaning that the data you're pulling contains a NULL somewhere (which is not allowed in XML, hence your error).
Make sure that you find out what causes the NULL in the first place.
Also, how are you interacting with the WebService? If you're using Axis, make sure that the WSDL has some encoding specified for data in and out.
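To track down where the NULL comes from, it can help to log its position and the surrounding text before parsing. A minimal sketch, assuming the response is already available as a String; the class NullCharLocator and its methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper: reports where NUL characters occur in the text.
final class NullCharLocator {
    private NullCharLocator() {}

    /** Returns the index of every NUL (U+0000) character in the text. */
    static List<Integer> findNullOffsets(String xml) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = xml.indexOf('\0'); i >= 0; i = xml.indexOf('\0', i + 1)) {
            offsets.add(i);
        }
        return offsets;
    }

    /** Returns the text around an offset, with each NUL shown as a visible \0. */
    static String context(String xml, int offset, int radius) {
        int start = Math.max(0, offset - radius);
        int end = Math.min(xml.length(), offset + radius + 1);
        return xml.substring(start, end).replace("\0", "\\0");
    }
}
```

A NUL after every single character usually points to an encoding mismatch rather than bad data, whereas an isolated NUL inside one element suggests that the source data itself contains it.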
Solution 3
This is an encoding issue. Either you are reading the input stream as UTF-8 when it isn't, or the other way around.
You should specify the encoding explicitly when you read the content, e.g. via
new InputStreamReader(getInputStream(), "UTF-8")
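To illustrate how a wrong encoding can produce NULs, here is a small sketch. It assumes, purely for the sake of the example, that the service sends UTF-16 bytes and the client decodes them with a single-byte charset; whether that is what happens in your setup is something you would need to check. The class EncodingDemo and the method readAll are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: reads a whole stream using an explicitly chosen charset
// instead of relying on the platform default.
final class EncodingDemo {
    private EncodingDemo() {}

    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the example short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

For example, the bytes of "<a/>" encoded as UTF-16LE are 3C 00 61 00 2F 00 3E 00. Reading them with StandardCharsets.UTF_16LE gives back "<a/>", while reading them with StandardCharsets.ISO_8859_1 gives a string with a NUL after every character, which an XML parser will reject.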
Another problem could be Tomcat. Try adding URIEncoding="UTF-8" to your Tomcat connector settings in the server.xml file, because:
It turned out that the JSP specification says that if the page encoding of the JSP pages is not explicitly declared, then ISO-8859-1 should be used (!).
Taken from here.
Comments
-
JuanDeLosMuertos over 4 years
When parsing an XML file in Java, I get the error:
An invalid XML character (Unicode: 0x0) was found in the element content of the document.
The XML comes from a web service.
The problem is that I get the error only when the web service is running on localhost (Windows + Tomcat), but not when it is online (Linux + Tomcat).
How can I replace the invalid char?? Thanks.
-
Tomalak over 14 years
+1 for common sense approach. Blindly fixing such an error without caring where it came from is not a good idea.
-
sp00m over 10 years
+1, but this can be simplified to dirtyXMLString.replaceAll("[\\000]*", "") though.
-
titogeo over 10 years
A character like this one, fileformat.info/info/unicode/char/e4f8/index.htm, fails while saving to MySQL. Is there a generic way to find or ignore these in Java? Adding "UTF-8" is not helping.
-
Xavi López over 9 years
-1 Those links seem to be dead now. This is why link-only answers are discouraged.
-
Whitecat almost 9 years
It can also be sped up by changing the * to a +: dirtyXMLString.replaceAll("[\\000]+", "")