python - xml.etree.ElementTree.ParseError: not well-formed (invalid token)

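fromstring() parses XML text that you pass in as a string; it does not open files. Your call therefore tries to parse the path itself as XML, which is why it fails on line 1 and why the error stays the same when you point it at a different, or even non-existent, file. Use parse() to read XML from a file.

Note that parse() returns an ElementTree instance, while tostring() expects an Element instance, so pass it tree.getroot().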

This code works:

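import xml.etree.ElementTree as ETree

parser = ETree.XMLParser(encoding="utf-8")
# parse() takes a file name (or file object) and returns an ElementTree
tree = ETree.parse("test_xml.xml", parser=parser)
# tostring() needs an Element, so pass the root; it returns bytes by default
print(ETree.tostring(tree.getroot()))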
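
For comparison, here is a minimal sketch of the two entry points side by side. It assumes a shortened version of the note example and writes it to a temporary file so it can run anywhere; the variable names (xml_text, xml_path and so on) are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ETree

# Hypothetical sample content, shortened from the note example in the question.
xml_text = "<note><to>Tove</to><from>Jani</from></note>"

# Write it to a temporary file so the example does not depend on a local path.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(xml_text)
    xml_path = tmp.name

try:
    # parse() reads from a file and returns an ElementTree; getroot() gives the Element.
    root_from_file = ETree.parse(xml_path).getroot()
finally:
    os.remove(xml_path)

# fromstring() takes the XML text itself and returns the root Element directly.
root_from_text = ETree.fromstring(xml_text)

# encoding="unicode" makes tostring() return str instead of bytes.
text_out = ETree.tostring(root_from_file, encoding="unicode")
print(text_out)
```

Both routes end with the same kind of object (an Element), so the rest of the code does not need to care which one was used.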
Author by

vlad.rad

Updated on June 05, 2022

Comments

  • vlad.rad
    vlad.rad almost 2 years

    I have the following code:

    import xml.etree.ElementTree as ETree
    
    parser = ETree.XMLParser(encoding="utf-8")
    tree = ETree.fromstring("C:/Users/XXX/Downloads/test_xml.xml", parser=parser)
    print(ETree.tostring(tree))
    

    I get the following error message:

    Traceback (most recent call last):
      File "C:/Users/XXX/.PyCharmCE2018.1/config/scratches/scratch.py", line 6, in <module>
        tree = ETree.fromstring("C:/Users/XXX/Downloads/test_xml.xml", parser=parser)
      File "C:\Users\XXX\AppData\Local\Programs\Python\Python36-32\lib\xml\etree\ElementTree.py", line 1314, in XML
        parser.feed(text)
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 2
    

    I went through probably every question about this error message on Stack Overflow, and nothing helped:

    • I tried editing the file with another editor (as advised here);
    • I added this line: tree.set('SignalStrength',"100") (from here);
    • I tried adding a DOCTYPE;
    • I checked the file with the W3 Validator;

    etc.

    Then I tried to import another XML file with a completely different structure, and the error message remained the same, even the position: line 1, column 2.

    And then I changed the file name to a non-existent one, and the error message still remained the same! So the problem is not the file, it is something else, and I can't work out what.

    P.S. This is one of the XML files I used:

    <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
    </note>
    

    EDIT: Maybe I can't load a file the way I did with the fromstring() function?
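
The observation in the question, that the error does not change even for a non-existent file name, can be reproduced with a short sketch. It assumes nothing beyond the standard library; the second path is a made-up name, and neither file has to exist because fromstring() never touches the filesystem.

```python
import xml.etree.ElementTree as ETree

# Hypothetical paths; neither file needs to exist for this demonstration.
candidates = ["C:/Users/XXX/Downloads/test_xml.xml", "no_such_file.xml"]

errors = []
for candidate in candidates:
    try:
        # The string is parsed as XML text, not opened as a file.
        ETree.fromstring(candidate)
    except ETree.ParseError as exc:
        errors.append(exc)
        # exc.position is a (line, column) tuple pointing into the string itself.
        print(f"{candidate!r}: {exc} (position={exc.position})")
```

A ParseError that points at the very start of the input and appears for every file is a strong hint that the path string, not the file content, is being parsed.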