How to fix "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3656: ordinal not in range(128)" error in Python

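In Python 2, bytes is a synonym for str, so calling bytes() on a unicode value implicitly encodes it as ASCII, which cannot represent characters such as u'\xa0' (a non-breaking space). That implicit step is what raises the error, before your .encode("utf-8") call ever runs. Drop bytes() and encode the values directly: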

file.write(header.encode("utf-8", errors="ignore"))
file.write(timetabledatasaved.encode("utf-8", errors="ignore"))
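
To see why the original call fails and the direct encode works, here is a minimal sketch that reproduces the problem without the scraper. The sample string and the file name timetable_sample.csv are made up for illustration. The explicit ASCII encode stands in for what bytes() does implicitly to a unicode value in Python 2. The last part shows one possible alternative, assuming you are happy to open the file in text mode: io.open accepts an encoding argument and encodes on write.

```python
# -*- coding: utf-8 -*-
import io

# Made-up sample standing in for the scraped text; u"\xa0" is a non-breaking space.
text = u"Room\xa0B12"

# Stand-in for the implicit step behind bytes(text) in Python 2: an ASCII encode.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding directly to UTF-8 succeeds; U+00A0 becomes the two bytes C2 A0.
encoded = text.encode("utf-8")

# Alternative: open the file in text mode and let io.open handle the encoding.
# "timetable_sample.csv" is a hypothetical file name.
with io.open("timetable_sample.csv", "w", encoding="utf-8") as handle:
    handle.write(text)
```

Because UTF-8 can represent every Unicode character, errors="ignore" should have nothing to discard in the lines above, so it is harmless but not required.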

Author: tobzville

Updated on June 04, 2022

Comments

  • tobzville, almost 2 years ago

    I am writing code to crawl a student timetable from the school website using BeautifulSoup. I keep getting the error UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3656: ordinal not in range(128), and I cannot resolve it.

    import urllib2
    from bs4 import BeautifulSoup
    import os
    
    def make_soup(url):
        thepage = urllib2.urlopen(url)
        soupdata = BeautifulSoup(thepage, "html.parser")
        return soupdata
    
    timetabledatasaved = ""
    soup = make_soup("http://timetable.ait.ie/reporting/textspreadsheet;student+set;id;AL%5FKSWFT%5FR%5F5%0D%0A?t"
                 "=student+set+textspreadsheet&days=1-5&weeks=21-32&periods="
                 "3-20&template=student+set+textspreadsheet")
    
    for record in soup.find_all('tr'):
        timetabledata = ""
        print record
        print '--------------------'
        for data in record('td'):
            timetabledata = timetabledata + "," + data.text
        if len(timetabledata) != 0:
            timetabledatasaved = timetabledatasaved + "\n" + timetabledata[1:]
    
    #print timetabledatasaved
    
    header = "Activity, Module, Type, Start, End, Duration, Weeks, Room, Staff, Student Groups"
    file = open(os.path.expanduser("timetable.csv"), "wb")
    file.write(bytes(header).encode("utf-8", errors="ignore"))
    file.write(bytes(timetabledatasaved).encode("utf-8", errors="ignore"))
    

    I used UTF-8, but I still get this error after crawling the timetable. I also noticed that my code seems to pick up the JavaScript in the page, whereas I only want it to print the relevant timetable data and save it as a .csv file.

    Traceback (most recent call last):
      File "/Users/tobenna/PycharmProjects/final_project/venv/timetable_scrape.py", line 30, in <module>
        file.write(bytes(timetabledatasaved).encode("utf-8", errors="ignore")) 
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3656: ordinal not in range(128)
    
    Process finished with exit code 1