Error about invalid XML characters on Java

Solution 1

I fixed it with this code:

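import java.util.regex.Pattern;

// Matches one or more consecutive NUL characters (U+0000), which XML does not allow.
Pattern nulPattern = Pattern.compile("[\\000]+");
// replaceAll returns the input unchanged when nothing matches,
// so no find() check is needed and the result is never null.
String cleanXMLString = nulPattern.matcher(dirtyXMLString).replaceAll("");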
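The snippet above removes only NUL. If other forbidden characters turn up as well, a broader filter might look like the sketch below. It assumes the document is XML 1.0 and keeps only code points allowed by that specification's Char production; the class and method names (XmlSanitizer, stripInvalidXml10) are made up for illustration.

```java
final class XmlSanitizer {
    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the text with every disallowed code point removed. */
    static String stripInvalidXml10(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // codePoints() pairs up surrogates, so supplementary characters survive
        // while unpaired surrogates are dropped by the range check.
        text.codePoints()
            .filter(XmlSanitizer::isValidXml10)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInvalidXml10("<a>x\u0000y</a>")); // prints <a>xy</a>
    }
}
```

This still only hides the symptom; it does not explain where the bad characters came from.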

Solution 2

Unicode character 0x0 is the NUL character, which means the data you're pulling contains a NUL somewhere. NUL is not allowed in XML, hence your error.

Make sure that you find out what causes the NULL in the first place.
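One way to start looking for the cause is to log where the NULs sit in the response before parsing it. This is only a rough sketch: NulLocator and nulOffsets are hypothetical names, and it assumes the response has already been read into a String.

```java
import java.util.ArrayList;
import java.util.List;

final class NulLocator {
    /** Returns the index of every U+0000 character in the text. */
    static List<Integer> nulOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = text.indexOf('\u0000'); i >= 0; i = text.indexOf('\u0000', i + 1)) {
            offsets.add(i);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "<a>x\u0000y</a>\u0000";
        System.out.println(nulOffsets(sample)); // prints [4, 10]
    }
}
```

A NUL after every other character would point to an encoding mismatch, while a run of NULs at the end would more likely point to a buffer that was written out past the data it actually held.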

Also, how are you interacting with the WebService? If you're using Axis, make sure that the WSDL has some encoding specified for data in and out.

Solution 3

This is an encoding issue. Either you are reading the input stream as UTF-8 when it isn't, or the other way around.

You should specify the encoding explicitly when you read the content. E.g. via

new InputStreamReader(getInputStream(), "UTF-8")
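To illustrate how a wrong charset can produce NULs, here is a small sketch. It assumes, purely for the sake of the example, that the service sends UTF-16LE; CharsetDemo and readAll are hypothetical names. Decoding those bytes as UTF-8 leaves a U+0000 after every ASCII character, while decoding with the matching charset gives back the original text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    /** Decodes the whole stream with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed payload: a tiny XML document encoded as UTF-16LE.
        byte[] utf16 = "<a/>".getBytes(StandardCharsets.UTF_16LE);
        String wrong = readAll(new ByteArrayInputStream(utf16), StandardCharsets.UTF_8);
        String right = readAll(new ByteArrayInputStream(utf16), StandardCharsets.UTF_16LE);
        System.out.println(wrong.indexOf('\u0000') >= 0); // prints true
        System.out.println(right.equals("<a/>"));         // prints true
    }
}
```

Passing a Charset constant instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.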

Another cause could be Tomcat. Try adding URIEncoding="UTF-8" to the connector settings in Tomcat's server.xml file, because:

It turned out that the JSP specification says that if the page encoding of the JSP pages is not explicitly declared, then ISO-8859-1 should be used (!).

Taken from here.

Author: JuanDeLosMuertos

Software engineer

Updated on March 03, 2020

Comments

  • JuanDeLosMuertos
    JuanDeLosMuertos over 4 years

    When parsing an XML file in Java, I get this error:

    An invalid XML character (Unicode: 0x0) was found in the element content of the document.

    The xml comes from a webservice.

    The problem is that I get the error only when the webservice is running on localhost (Windows + Tomcat), but not when the webservice is online (Linux + Tomcat).

    How can I replace the invalid character? Thanks.

  • Tomalak
    Tomalak over 14 years
    +1 for common sense approach. Blindly fixing such an error without caring where it came from is not a good idea.
  • sp00m
    sp00m over 10 years
    +1, but can be simplified by dirtyXMLString.replaceAll("[\\000]*", "") though.
  • titogeo
    titogeo over 10 years
    Characters like this one (fileformat.info/info/unicode/char/e4f8/index.htm) fail when saving to MySQL. Is there a generic way to find or ignore these in Java? Adding "UTF-8" is not helping.
  • Xavi López
    Xavi López over 9 years
    -1 Those links seem to be dead now. This is why link-only answers are discouraged.
  • Whitecat
    Whitecat almost 9 years
    It can also be sped up by changing the * to a +: dirtyXMLString.replaceAll("[\\000]+", "")