How to convert Xml files to Text Files

43,444

Solution 1

Going from XML to text smells like a job for XSLT - it's a XML-based transformation language that can take an XML input and convert it to anything text-based on the output side.

You can read up on XSLT on lots of websites - one of the better tutorials in the W3Schools one.

Since you didn't post any sample XML, I have no clue what your XML looks like, and also no idea what your output should be. But assuming it would look something like:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <title>Some Title</title>
  <description>Some description</description>
  <keywords>
    <keyword>Keyword1</keyword>
    <keyword>Keyword2</keyword>
    <keyword>Keyword3</keyword>
    <keyword>Keyword4</keyword>
   </keywords> 
</root>

you could easily write a XSLT transformation to turn that into

YourTextFile.txt

Some Title
Some Description
Keyword1,Keyword2,Keyword3,Keyword4

or whatever other format you are looking for.

Solution 2

My suggestion would be to use Python. You can use the interpreter to run the pattern while you are setting it up, command line goes along way in setting this sort of thing up properly. Assuming the xml is valid this should allow you the most flexibility with the least hassle.

so assuming the following xml format:

<root>
  <title>Document Title</title>
  <content>Some document content.</content>
  <keywords>test, document, keyword</keywords>
</root>

and assuming the output of each document should be:

Document Title

Some document content.

test, document, keyword

The python code might look something like:

import sys
import os
from xml.etree.ElementTree import ElementTree

def Readthexml(f):
    """Read the file from the argument list and dump the title contents and keywords"""
    xcontent = ElementTree()
    xcontent.parse(f)
    doc = [xcontent.find("title").text, xcontent.find("content").text, xcontent.find("keywords").text]
    out = open(f + ".txt", "w")
    out.write("\n\n".join(doc))
    return True

def main(argv=None):
    if argv is None:
        argv = sys.argv
        args = argv[1:]
        for arg in args:
            if os.path.exists(arg):
                Readthexml(arg)

if __name__ == "__main__":
    main()

from which you could generate a batch file to update files regularly (assuming it is a windows environment though python works in whatever).

Solution 3

There are a couple of possibilities. If it is simple XML you can read it like any other text file, filter out the angle brackets and add in your own strategically-placed punctuation. Or, you can open up an XML reader and a text writer, and output it any way you want.

If you read the file names from the folder into a collection, you can loop through them and process all of the files automatically.

Share:
43,444
Jason
Author by

Jason

Updated on June 26, 2020

Comments

  • Jason
    Jason about 4 years

    I have around 8000 xml files that needs to be converted into text files. The text file must contain title, description and keywords of the xml file without the tags and removing other elements and attributes as well. In other words, i need to create 8000 text files containing the title,description and keywords of the xml file. I need codings for this to be done systematically. Any help would be greatly appreciated. Thanks in advance.

  • Jason
    Jason about 14 years
    Hey Robert, Thanks for your reply. Do you mind if you can help me get some websites for reference because i'm really new to this topic and my supervisor is rushing me for it to be done. Thanks very much for your reply.
  • Gabriel
    Gabriel about 14 years
    I had an itch to answer that xsl would be a good solution, except that every time I have had to implement xsl in the past it took plenty of work to make programming language x use xsl the way god intended. Always got it working mind you and with stellar results so +1.
  • Jason
    Jason about 14 years
    Please take a look at my sample over at my new post thanks stackoverflow.com/questions/2941510/…
  • Nathan Tuggy
    Nathan Tuggy over 7 years
    Doing this manually for 8000 files individually doesn't sound very fun.