How can I replace '&' with '&amp;' in Python?

Solution 1

str.replace creates and returns a new string. It can't alter strings in-place - they're immutable. Try replacing:

file=xml_file.readlines()

with

file = [line.replace('&', '&amp;') for line in xml_file]

This uses a list comprehension to build a list equivalent to .readlines() but with the replacement already made.


As pointed out in the comments, if there were already &amp;s in the string, they'd be turned into &amp;amp;, which is likely not what you want. To avoid that, you could use a negative lookahead in a regular expression to replace only the ampersands not already followed by amp;:

import re

file = [re.sub("&(?!amp;)", "&amp;", line) ...]
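
A minimal, self-contained sketch of the lookahead approach next to the plain replacement, to show where they differ. The helper name `escape_bare_ampersands` and the sample lines are made up for illustration. Note that this pattern only recognises `amp;`, so other entities such as `&lt;` would still be rewritten.

```python
import re

# Matches '&' only when it is NOT immediately followed by 'amp;'.
BARE_AMP = re.compile(r"&(?!amp;)")

def escape_bare_ampersands(line):
    """Hypothetical helper: escape '&' unless it already starts '&amp;'."""
    return BARE_AMP.sub("&amp;", line)

# Made-up sample input: one bare ampersand, one already-escaped one.
lines = ["<a>Tom & Jerry</a>\n", "<b>Fish &amp; Chips</b>\n"]

fixed = [escape_bare_ampersands(line) for line in lines]
naive = [line.replace("&", "&amp;") for line in lines]  # double-escapes line 2
```

Here `fixed` leaves the second line alone, while `naive` turns its `&amp;` into `&amp;amp;`.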

Solution 2

str.replace() returns a new string object with the change made. It does not change the data in-place. You are ignoring the return value.

You want to apply it to each line instead:

file = [line.replace('&', '&amp;') for line in file]
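
A short sketch of what "ignoring the return value" means in practice. The variable names and the sample string are illustrative only.

```python
text = "Tom & Jerry"

# The call builds a new string, but nothing keeps a reference to it,
# so `text` is exactly what it was before.
text.replace("&", "&amp;")
unchanged = text

# Binding the result to a name is what actually keeps the change.
escaped = text.replace("&", "&amp;")
```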

You could use the fileinput module to do the transformation and have it handle replacing the original file (a backup is made while the file is rewritten; it is kept afterwards only if you pass a backup extension):

import fileinput
import sys

for line in fileinput.input('filename', inplace=True):
    sys.stdout.write(line.replace('&', '&amp;'))
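
To try the in-place rewrite without touching a real file, here is a sketch that first creates a throwaway sample file. The file name `sample.xml`, its contents, and the `.bak` extension are assumptions chosen for the demo; passing `backup=".bak"` is what makes the backup copy stay around after the rewrite.

```python
import fileinput
import os
import tempfile

# Hypothetical sample file, created only so the sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(path, "w") as f:
    f.write("<a>Tom & Jerry</a>\n")

# With inplace=True, anything printed inside the loop is written to the file.
# backup=".bak" keeps the original content as sample.xml.bak.
with fileinput.input(path, inplace=True, backup=".bak") as stream:
    for line in stream:
        print(line.replace("&", "&amp;"), end="")

with open(path) as f:
    rewritten = f.read()
backup_exists = os.path.exists(path + ".bak")
```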

Solution 3

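Oh... you need to decode the HTML notation for special symbols. Python has a module to deal with that: HTMLParser.

Here is an example:

import HTMLParser  # Python 2 module name

htmlparser = HTMLParser.HTMLParser()  # parser instance that provides unescape()
out_file = ....
file = xml_file.readlines()
parsed_lines = []
for line in file:
    # turn entities such as &amp; back into plain characters
    parsed_lines.append(htmlparser.unescape(line))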
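
On Python 3 (3.4 and later) the same decoding is available as `html.unescape`. A small sketch with made-up sample lines; note that it converts entities back into plain characters, so `&amp;` becomes `&`.

```python
import html

# Made-up sample input containing HTML entities.
lines = ["Fish &amp; Chips\n", "1 &lt; 2\n"]

# html.unescape replaces named and numeric character references.
parsed_lines = [html.unescape(line) for line in lines]
```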

Solution 4

Slightly off topic, but it might be good to use some escaping?

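I often use urllib's quote and unquote, which apply and remove URL percent-encoding (this is not HTML entity escaping):

 import urllib  # Python 2; on Python 3 these functions live in urllib.parse

 result = urllib.quote("filename&fileextension")
 # 'filename%26fileextension'
 urllib.unquote(result)
 # 'filename&fileextension'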

Might help for consistency?
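
For reference, a sketch of the same round trip on Python 3, where the functions are in `urllib.parse`. The variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Percent-encode the ampersand, then decode it again.
result = quote("filename&fileextension")
restored = unquote(result)
```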

Author by

Matt

Updated on June 14, 2022

Comments

  • Matt almost 2 years

    I'm having issues with .replace(). My XML parser does not like '&', but will accept '&amp;'. I'd like to use .replace('&','&amp;') but this does not seem to be working. I keep getting the error:

    lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 51, column 41
    

    So far I have tried just a straightforward file=file.replace('&','&amp;'), but this doesn't work. I've also tried:

    xml_file = infile
    file=xml_file.readlines()
    for line in file:
            for char in line:
                    char.replace('&','&amp;')
    infile=open('a','w')
    file='\n'.join(file)
    infile.write(file)
    infile.close()
    infile=open('a','r')
    xml_file=infile
    

    What would be the best way to fix my issue?