How can I replace '&' with '&amp;' in Python?
Solution 1
str.replace creates and returns a new string. It can't alter strings in place; they're immutable. Try replacing:
file=xml_file.readlines()
with
file = [line.replace('&', '&amp;') for line in xml_file]
This uses a list comprehension to build a list equivalent to .readlines(), but with the replacement already made.
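A self-contained sketch of the same idea, assuming the XML arrives as a file-like object. Here io.StringIO stands in for xml_file and the sample XML is made up; the result is checked with the standard library's xml.etree.ElementTree rather than lxml:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: an XML document containing a bare '&'.
xml_file = io.StringIO("<root>\n<item>Fish & Chips</item>\n</root>\n")

# Build the list .readlines() would return, with the replacement applied.
file = [line.replace('&', '&amp;') for line in xml_file]

# Join the lines back together and confirm the XML is now well-formed.
root = ET.fromstring(''.join(file))
print(root.find('item').text)  # Fish & Chips
```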
As pointed out in the comments, if there were already &amp;s in the string, they'd be turned into &amp;amp;, which is likely not what you want. To avoid that, you could use a negative lookahead in a regular expression to replace only the ampersands not already followed by amp;:
import re
file = [re.sub("&(?!amp;)", "&amp;", line) ...]
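A runnable sketch of the lookahead approach. The pattern "&(?!amp;)" only skips &amp;; the version below is an illustrative extension (not part of the original answer) that also leaves other existing entity and character references such as &lt;, &#38; and &#x26; untouched. The helper name escape_bare_ampersands and the sample lines are hypothetical:

```python
import re

# Match '&' only when it does NOT already start an entity or character
# reference. Protecting every named entity (not just amp;) is an
# assumption beyond the original "&(?!amp;)" pattern.
BARE_AMP = re.compile(r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)')

def escape_bare_ampersands(line):
    """Replace each bare '&' with '&amp;', leaving existing references intact."""
    return BARE_AMP.sub('&amp;', line)

lines = ["Fish & Chips\n", "already &amp; escaped\n", "&lt;tag&gt; &#38; &#x26;\n"]
fixed = [escape_bare_ampersands(line) for line in lines]
print(fixed)
```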
Solution 2
str.replace() returns a new string object with the change made. It does not change data in place. You are ignoring the return value.
You want to apply it to each line instead:
file = [line.replace('&', '&amp;') for line in file]
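A quick illustration of the point about return values, using made-up sample lines: calling replace without using its result leaves the list untouched, while building a new list from the returned strings keeps the change.

```python
lines = ["Fish & Chips\n", "Salt & Vinegar\n"]

# Calling replace and discarding the result changes nothing:
# strings are immutable, so the list still holds the old text.
for line in lines:
    line.replace('&', '&amp;')
print(lines)  # ['Fish & Chips\n', 'Salt & Vinegar\n']

# Using the return value is what actually keeps the change.
lines = [line.replace('&', '&amp;') for line in lines]
print(lines)  # ['Fish &amp; Chips\n', 'Salt &amp; Vinegar\n']
```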
You could also use the fileinput module to do the transformation and have it handle replacing the original file (the original is moved to a backup file while it is rewritten; pass backup='.bak' if you want to keep that copy):
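import fileinput
import sys

# Rewrite 'filename' in place: whatever is written to stdout inside
# the loop becomes the new contents of the file.
for line in fileinput.input('filename', inplace=True):
    sys.stdout.write(line.replace('&', '&amp;'))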
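A self-contained sketch of the fileinput approach that can be run as-is: a temporary file stands in for 'filename', and the sample XML is made up. Passing backup='.bak' keeps the original next to the rewritten file so the two can be compared:

```python
import fileinput
import os
import sys
import tempfile

# Create a throwaway XML file to stand in for the real 'filename'.
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'w') as f:
    f.write("<root>\n<item>Fish & Chips</item>\n</root>\n")

# With inplace=True, stdout is redirected into the file being read.
# backup='.bak' keeps the original as path + '.bak' after the loop.
with fileinput.input(path, inplace=True, backup='.bak') as lines:
    for line in lines:
        sys.stdout.write(line.replace('&', '&amp;'))

with open(path) as f:
    rewritten = f.read()
with open(path + '.bak') as f:
    original = f.read()
print(rewritten)

# Clean up the temporary files.
os.remove(path)
os.remove(path + '.bak')
```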
Solution 3
Oh...
You need to decode HTML notation for special symbols. Python has a module to deal with it, HTMLParser; here are some docs.
Here is an example:
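import HTMLParser  # Python 2 module name; Python 3 provides html.unescape

htmlparser = HTMLParser.HTMLParser()
out_file = ....
file = xml_file.readlines()
parsed_lines = []
for line in file:
    # unescape() turns entity references such as &amp; back into characters
    parsed_lines.append(htmlparser.unescape(line))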
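The HTMLParser module used above exists only in Python 2 (it became html.parser in Python 3), and HTMLParser.unescape was removed in Python 3.9. In Python 3 the html module covers both directions. Note that unescape turns &amp; back into &, which is the opposite of what the question asks for; html.escape is the direction that produces &amp;. A short sketch with made-up input:

```python
import html

line = "Fish & Chips <b>\n"

# escape() replaces &, < and > with entity references; quote=False
# leaves quote characters alone, which is enough for element text.
escaped = html.escape(line, quote=False)
print(escaped)  # Fish &amp; Chips &lt;b&gt;

# unescape() goes the other way, turning references back into characters.
print(html.unescape(escaped) == line)  # True
```

Like str.replace, html.escape turns an existing &amp; into &amp;amp;, so it suits text that is not already partly escaped.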
Solution 4
Slightly off topic, but it might be good to use some escaping?
I often use urllib's quote, which will put URL (percent) escaping in and take it out again:
>>> import urllib
>>> result = urllib.quote("filename&fileextension")
>>> result
'filename%26fileextension'
>>> urllib.unquote(result)
'filename&fileextension'
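The session above uses Python 2's urllib. In Python 3 the same functions live in urllib.parse. Keep in mind this is URL percent-encoding, not XML/HTML escaping: the output contains %26 rather than &amp;, and an XML parser will not decode it back to &.

```python
from urllib.parse import quote, unquote

result = quote("filename&fileextension")
print(result)           # filename%26fileextension
print(unquote(result))  # filename&fileextension
```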
Might help for consistency?
Matt
Updated on June 14, 2022
I'm having issues with .replace(). My XML parser does not like '&', but will accept '&amp;'. I'd like to use .replace('&','&amp;'), but this does not seem to be working. I keep getting the error:
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 51, column 41
So far I have tried a straightforward file=file.replace('&','&amp;'), but this doesn't work. I've also tried:
xml_file = infile
file=xml_file.readlines()
for line in file:
    for char in line:
        char.replace('&','&amp;')
infile=open('a','w')
file='\n'.join(file)
infile.write(file)
infile.close()
infile=open('a','r')
xml_file=infile
What would be the best way to fix my issue?
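To reproduce the kind of error described in the question without lxml, the standard library's xml.etree.ElementTree can serve as a stand-in (its error message differs from lxml's xmlParseEntityRef wording). A bare & makes the document ill-formed, while &amp; parses and comes back as a plain &. The helper name parses and the sample strings are hypothetical:

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

bad = "<item>Fish & Chips</item>"
good = bad.replace('&', '&amp;')  # keep the returned string

print(parses(bad))   # False
print(parses(good))  # True
print(ET.fromstring(good).text)  # Fish & Chips
```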