How do I get rid of characters like `&#39;` that appear instead of apostrophes?
The following BeautifulSoup documentation on entity conversion should be what you're looking for:
http://www.crummy.com/software/BeautifulSoup/documentation.html#Entity%20Conversion
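The linked page describes BeautifulSoup 3's entity conversion, which is applied while the page is parsed. On Python 3 (the question's `urllib2` exists only in Python 2), the standard library's `html.unescape` can decode entities in strings you have already extracted. This is a swapped-in alternative, not the BeautifulSoup feature the link documents. Here is a minimal sketch using a made-up sample string:

```python
import html

# Hypothetical fragment as it might come out of str(soup): the apostrophe
# is still encoded as the numeric character reference &#39;
raw = "It&#39;s a &quot;test&quot; &amp; more"

# html.unescape decodes both named (&quot;, &amp;) and numeric (&#39;) references
clean = html.unescape(raw)
print(clean)
```

With the question's list, the same call would be `fineResult = [html.unescape(fine) for fine in fineResult]` (requires Python 3.4+).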
Asked by nindalf:
Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python

I am attempting to scrape a website using Python. I'm using the urllib2, BeautifulSoup and re modules:
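```python
import re
import urllib2
from BeautifulSoup import BeautifulSoup

# `url` is the address of the page being scraped (defined elsewhere)
response = urllib2.urlopen(url)
soup = BeautifulSoup(response)
responseString = str(soup)

# Grab each <div class="sodatext"> ... </div> block as raw markup
coarseExpression = re.compile('<div class="sodatext">[\n]*.*[\n]*</div>')
coarseResult = coarseExpression.findall(responseString)

# Strip any remaining tags from each block, keeping only the text
fineExpression = re.compile('<[^>]*>')
fineResult = []
for coarse in coarseResult:
    fine = fineExpression.sub('', coarse)
    #print(fine)
    fineResult.append(fine)
```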
Unfortunately, apostrophes come out as the HTML entity `&#39;` instead of `'`. Is there a way to avoid this, or an easy way to replace them?
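The question's approach runs regular expressions over `str(soup)`, and the entities from the page source are left encoded in that string. Another route is to let a real HTML parser decode entities while it walks the markup. The sketch below uses only the Python 3 standard library's `html.parser.HTMLParser`, swapped in for BeautifulSoup and the regexes. It assumes the target elements are `<div class="sodatext">` as in the question's regex. `SodaTextParser` and `extract_sodatext` are hypothetical names.

```python
from html.parser import HTMLParser


class SodaTextParser(HTMLParser):
    """Hypothetical helper: collects the text inside every <div class="sodatext">."""

    def __init__(self):
        # convert_charrefs=True (the default) means handle_data receives
        # text with references such as &#39; already decoded.
        super().__init__(convert_charrefs=True)
        self.depth = 0      # <div> nesting depth inside a sodatext div (0 = outside)
        self.current = []   # text pieces for the div currently being read
        self.results = []   # one cleaned string per sodatext div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside a sodatext div
        elif "sodatext" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        # Text in child tags (e.g. <b>) is kept; the tags themselves are dropped.
        if self.depth:
            self.current.append(data)


def extract_sodatext(markup):
    """Return the decoded text of each sodatext div in `markup` (a str)."""
    parser = SodaTextParser()
    parser.feed(markup)
    parser.close()
    return parser.results


sample = '<div class="sodatext">\nIt&#39;s <b>great</b>\n</div><div class="other">skip</div>'
print(extract_sodatext(sample))
```

To use it on a live page, you would pass in the decoded response body, e.g. `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the page is UTF-8. A parser also copes with nested tags and multi-line content that the `.*` regex can miss.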