Why can't my DOM parser read UTF-8?

22,684

Solution 1

Try this; it worked for me:

        // Decode the bytes as UTF-8 explicitly, so the parser reads characters rather than raw bytes.
        InputStream inputStream = new FileInputStream(completeFileName);
        Reader reader = new InputStreamReader(inputStream, "UTF-8");
        InputSource is = new InputSource(reader);
        is.setEncoding("UTF-8"); // optional: has no effect once a character stream is supplied

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(is);

Solution 2

Try using a Reader and passing the encoding as a parameter:

InputStream inputStream = new FileInputStream(fileName);
documentBuilder.parse(new InputSource(new InputStreamReader(inputStream, "UTF-8")));
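
Both solutions rely on the same idea: decode the bytes yourself with an explicit charset and give the parser a Reader. Below is a minimal, self-contained sketch of that approach. The class name `Utf8DomExample`, the helper `parseUtf8`, and the in-memory sample document are assumptions made for illustration; in real code the stream would come from a `FileInputStream` as in the answers above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf8DomExample {

    // Hypothetical helper: decodes the stream as UTF-8, then lets the
    // parser read characters instead of guessing the byte encoding.
    public static Document parseUtf8(InputStream in) throws Exception {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(reader));
            doc.getDocumentElement().normalize();
            return doc;
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the XML file; \u00e9 is a non-ASCII character.
        String xml = "<note><text>caf\u00e9</text></note>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        Document doc = parseUtf8(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Using `StandardCharsets.UTF_8` instead of the string `"UTF-8"` avoids the checked `UnsupportedEncodingException`, and the try-with-resources block closes the stream once parsing is done.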
Author: ivanz

Updated on April 23, 2020

Comments

  • ivanz
    ivanz about 4 years

    I have a problem: my DOM parser can't load a file when the XML file contains UTF-8 characters. I am aware that I have to instruct it to read UTF-8, but I don't know how to do that in my code. Here it is:

    File xmlFile = new File(fileName);
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(xmlFile);
    doc.getDocumentElement().normalize();
    

    I am aware that there is a setEncoding() method, but I don't know where to put it in my code...

  • ivanz
    ivanz about 11 years
    Is InputSource part of the DOM API?
  • ivanz
    ivanz about 11 years
    It's not working, or I don't know how to use it. I get: non-static method parse cannot be referenced from a static context (sax).
  • Holger
    Holger almost 9 years
    The second argument of the method DocumentBuilder.parse(InputStream, String) is a URI, not a character encoding. It’s rather strange when providing UTF-8 there solves any problems…
  • Rajesh Mbm
    Rajesh Mbm over 7 years
    Glad to hear you found it helpful... :)
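
Two points from the comments can be sketched together: `parse` is an instance method, so it has to be called on a `DocumentBuilder` object rather than on the class, and the `String` in `DocumentBuilder.parse(InputStream, String)` is a system ID (a base URI), not an encoding. The class name `ParseOverloadsExample`, the helper `rootName`, and the URI `file:///tmp/example.xml` are hypothetical, chosen only for illustration. When the parser is given a raw byte stream like this, it has to work out the encoding from the document itself, for example from the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseOverloadsExample {

    // Hypothetical helper: parses raw bytes and returns the root element name.
    public static String rootName(byte[] xmlBytes) throws Exception {
        // Create an instance first. Writing DocumentBuilder.parse(...) on the
        // class itself causes the "non-static method parse cannot be
        // referenced from a static context" compile error.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // The second argument is a system ID used as the base for resolving
        // relative URIs. It is NOT a character encoding.
        Document doc = builder.parse(
                new ByteArrayInputStream(xmlBytes),
                "file:///tmp/example.xml"); // hypothetical base URI

        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // The encoding is declared in the XML prolog, so the parser can
        // decode the byte stream without a Reader.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>caf\u00e9</root>";
        System.out.println(rootName(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```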