How to parse an XML file with encoding declaration in Python?

Solution 1

One approach that worked for me is to open the XML file as a text file object and pass its complete contents to ElementTree.fromstring(). Because the data is then already a decoded str, the parser does not need to apply the encoding named in the declaration.

Example:

>>> import xml.etree.ElementTree as ET
>>> ef = ET.parse('a.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
ValueError: multi-byte encodings are not supported
>>> with open('a.xml', 'r', encoding='utf-8') as f:
...     ef = ET.fromstring(f.read())
...
>>> ef
<Element 'productMeta' at 0x028DF180>
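
The same idea can be wrapped in a small helper. This is only a sketch: the function name parse_decoded is hypothetical, and the default of UTF-8 is an assumption taken from the question, which says the files are really UTF-8 despite the GBK declaration.

```python
import xml.etree.ElementTree as ET


def parse_decoded(path, encoding="utf-8"):
    """Decode the file ourselves, then hand the resulting str to ElementTree.

    `encoding` is the real encoding of the file on disk (assumed UTF-8 here),
    not the one named in the XML declaration.
    """
    # Pass the encoding explicitly so the result does not depend on the
    # platform default that open() would otherwise use.
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    # fromstring() returns the root Element, not an ElementTree.
    return ET.fromstring(text)
```

Calling parse_decoded('a.xml') gives the root element directly. If an ElementTree object is needed (for example to call write()), the result can be wrapped with ET.ElementTree(root).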

You can also create an XMLParser with the required encoding. The encoding you pass overrides the one declared in the file, so the document is decoded with the encoding you specify. Example:

import xml.etree.ElementTree as ET
# encoding= overrides the encoding declared in the XML file
xmlp = ET.XMLParser(encoding="utf-8")
f = ET.parse('a.xml', parser=xmlp)

Solution 2

ET.parse('a.xml', parser=ET.XMLParser(encoding='iso-8859-5'))

This solved my problem when dealing with an Excel XML file in Python.
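
When several files need the same treatment, the parser override can be put in a helper. The sketch below is illustrative: parse_with_encoding is a hypothetical name, and the encoding argument must be whatever the file is actually stored in (UTF-8 in the question, iso-8859-5 in the case above).

```python
import xml.etree.ElementTree as ET


def parse_with_encoding(path, encoding="utf-8"):
    """Parse `path`, decoding with `encoding` instead of the declared one."""
    # Create a new parser on every call: a parser object is not meant to be
    # reused once a parse has finished.
    parser = ET.XMLParser(encoding=encoding)
    return ET.parse(path, parser=parser)
```

Usage would look like tree = parse_with_encoding('xmltest.xml'), followed by tree.getroot() to reach the productMeta element. Unlike the fromstring() approach, this returns an ElementTree and lets the parser read the file itself.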

Author: osjerick

Updated on June 22, 2022

Comments

  • osjerick, about 2 years ago

    I have this XML file, called xmltest.xml:

    <?xml version="1.0" encoding="GBK"?>
    <productMeta>
        <bands>1,2,3,4</bands>
        <imageName>TestName.tif</imageName>  
        <browseName>TestName.jpg</browseName>
    </productMeta>
    

    And I have this Python dummy code:

    import xml.etree.ElementTree as ET
    xmldoc = ET.parse('xmltest.xml')
    

    But it raises a ValueError:

    ValueError: multi-byte encodings are not supported

    I understand the error: it is raised because of the encoding declaration on the first line of the XML file. The files are actually UTF-8 encoded but always carry that declaration (I am not the creator of the XML files being analyzed). How can I ignore the encoding declaration when parsing an XML file like the one above?