Parsing a text file with Python?

21,757

Solution 1

You say you're very new to Python, so I'll start at the very low-level. You can iterate over the lines in a file very simply in Python

fyle = open("contents.txt")
for lyne in fyle :
    # Do string processing here
fyle.close()

Now how to parse it. If each formatting directive (e.g. p, h1), is on a separate line, you can check that easily. I'd build up a dictionary of handlers and get the handler like so:

handlers= {"p": # p tag handler
           "h1": # h1 tag handler
          }

# ... in the loop
    if lyne.rstrip() in handlers :  # strip to remove trailing whitespace
        # close current handler?
        # start new handler?
    else :
        # pass string to current handler

You could do what Daniel Pryden suggested and create an in-memory data structure first, and then serialize that the XHTML. In that case, the handlers would know how to build the objects corresponding to each tag. But I think the simpler solution, especially if you don't have lots of time, you have is just to go straight to XHTML, keeping a stack of the current enclosed tags. In that case your "handler" may just be some simple logic to write the tags to the output file/string.

I can't say more without knowing the specifics of your problem. And besides, I don't want to do all your homework for you. This should give you a good start.

Solution 2

Rather than going directly from the text file you describe to an XHTML file, I would transform it into an intermediate in-memory representation first.

So I would build classes to represent the p and h1 tags, and then go through the text file and build those objects and put them into a list (or even a more complex object, but from the looks of your file a list should be sufficient). Then I would pass the list to another function that would loop through the p and h1 objects and output them as XHTML.

As an added bonus, I would make each tag object (say, Paragraph and Heading1 classes) implement an as_xhtml() method, and delegate the actual formatting to that method. Then the XHTML output loop could be something like:

for tag in input_tags:
    xhtml_file.write(tag.as_xhtml())
Share:
21,757
Bojan
Author by

Bojan

Updated on March 10, 2020

Comments

  • Bojan
    Bojan about 4 years

    I have to do an assignment where i have a .txt file that contains something like this
    p
    There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain...

    h1
    this is another example of what this text file looks like

    i am suppose to write a python code that parses this text file and creates and xhtml file
    I need to find a starting point for this project because i am very new to python and not familiar with alot of this stuff.
    This python code is suppose to take each of these "tags" from this text file and put them into an xhtml file I hope that what i ask makes sense to you.
    Any help is greatly appreciated,
    Thanks in advance!
    -bojan

    • mjv
      mjv over 14 years
      The format of the input file is somewhat ambiguous. It may be the way it is displayed in SO, but... Are these "p", "h1" marks expected to be on a separate line, just before the section to which they are applied, or are they html-like tags, for which the brackets and closing tag have been lost or omitted here?
  • Daniel Pryden
    Daniel Pryden over 14 years
    +1 nice answer. I would make a point of calling fyle.close(), though (or, even better, using with open("contents.txt") as fyle:). It's a good habit to get into -- you can often get away with letting the garbage collector take care of open files, but you really shouldn't.
  • Anuj Singh
    Anuj Singh over 14 years
    You're right, and I've added fyle.close() as you've suggested. with ... as is better, but this is simpler to understand for a beginner.