TypeError: decoding Unicode is not supported


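x is already a unicode object, because the cols[0].string field returns unicode (just as documented). Calling unicode(x, "utf-8") asks Python to decode text that is already decoded, which raises this TypeError. Use x, y and z directly instead of xx, yy and zz.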
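
The point above can be sketched as follows. This is a minimal illustration written for Python 3, where the `unicode` type is simply `str`; the helper name `to_text` is hypothetical and not part of Beautiful Soup. The assumption is that decoding should happen only for raw bytes, while text that is already decoded passes through unchanged.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only when given raw bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, nothing to decode


raw = "辣".encode("utf-8")   # bytes, e.g. as read from a socket or file
text = to_text(raw)          # decoded exactly once
same = to_text(text)         # already text, returned unchanged

try:
    # Python 3 counterpart of unicode(x, "utf-8") on already-decoded text
    str(text, "utf-8")
except TypeError as exc:
    error_message = str(exc)  # the interpreter refuses to decode text again
```

Guarding on the type keeps the call site the same whether the value came from a byte stream or from a parser that has already produced text.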

Author by

user805981

Updated on June 17, 2022

Comments

  • user805981
    user805981 almost 2 years

    New to Python. I'm trying to get the parsed text decoded properly and stored in a SQLite database, but it just won't work :(

    # coding: utf8
    from pysqlite2 import dbapi2 as sqlite3
    import urllib2
    from bs4 import BeautifulSoup
    from string import *
    
    
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    
    # # create a table
    def createTable():
        cursor.execute("""CREATE TABLE characters
                          (rank INTEGER PRIMARY KEY, word TEXT, definition TEXT) 
                       """)
    
    
    def insertChar(rank,word,definition):
        cursor.execute("""INSERT INTO characters (rank,word,definition)
                            VALUES (?,?,?)""",(rank,word,definition))
    
    
    def main():
        createTable()
    
        # u = unicode("辣", "utf-8")
    
        # insertChar(1,u,"123123123")
    
        soup = BeautifulSoup(urllib2.urlopen('http://www.zein.se/patrick/3000char.html').read())
        # print (html_doc.prettify())   
    
        tables = soup.blockquote.table
    
        # print tables
    
        rows = tables.find_all('tr')
        result=[]
        for tr in rows:
            cols = tr.find_all('td')
            character = []
            x = cols[0].string 
            y = cols[1].string 
            z = cols[2].string 
            xx = unicode(x, "utf-8")
            yy = unicode(y , "utf-8")
            zz = unicode(z , "utf-8")
            insertChar(xx,yy,zz)
    
        conn.commit() 
    
    main()
    

    I keep getting the following error: TypeError: decoding Unicode is not supported

    WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
    Traceback (most recent call last):
      File "sqlitetestbed.py", line 64, in <module>
        main()
      File "sqlitetestbed.py", line 48, in main
        xx = unicode(x, "utf-8")
    
    
    Traceback (most recent call last):
    File "sqlitetestbed.py", line 52, in <module>
    main()
    File "sqlitetestbed.py", line 48, in main
    insertChar(x,y,z)
    File "sqlitetestbed.py", line 20, in insertChar
    VALUES (?,?,?)""",(rank,word,definition))
    pysqlite2.dbapi2.IntegrityError: datatype mismatch
    

    I'm probably doing something that's really stupid... :( Please tell me what I'm doing wrong. Thanks!

  • user805981
    user805981 about 11 years
    Okay, so I used x instead of xx now, and I'm getting: Traceback (most recent call last): File "sqlitetestbed.py", line 52, in <module> main() File "sqlitetestbed.py", line 48, in main insertChar(x,y,z) File "sqlitetestbed.py", line 20, in insertChar VALUES (?,?,?)""",(rank,word,definition)) Why is that?
  • wRAR
    wRAR about 11 years
    @user805981 is that the full message?
  • wRAR
    wRAR about 11 years
    @user805981 I don't see the actual exception text.
  • user805981
    user805981 about 11 years
    Sorry about that, here it is: Traceback (most recent call last): File "sqlitetestbed.py", line 52, in <module> main() File "sqlitetestbed.py", line 48, in main insertChar(x,y,z) File "sqlitetestbed.py", line 20, in insertChar VALUES (?,?,?)""",(rank,word,definition)) pysqlite2.dbapi2.IntegrityError: datatype mismatch
  • wRAR
    wRAR about 11 years
    The rank column is INTEGER PRIMARY KEY and you are inserting a string.
  • user805981
    user805981 about 11 years
    It is supposed to be a number, so instead of using x = cols[0].string I used x = cols[0].integer. It passes through the parser and seems like it should have inserted into the db, but when I opened the db file, it was empty. :/
  • wRAR
    wRAR about 11 years
    @user805981 Where did you get that .integer field? I don't see it in the docs.
  • user805981
    user805981 about 11 years
    I think I'm supposed to do an int(cols[0].string). Gotcha. So when I use bs4, the first tr row is all words, and I won't be able to convert those into numbers. How do I skip that row?
  • wRAR
    wRAR about 11 years
    @user805981 how is it related to this question?
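
The two points raised in the comments (the rank column needs an integer, and the header row cannot be converted) can be sketched together. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module on Python 3 instead of `pysqlite2`, and the `rows` list is made-up sample data standing in for the strings that `cols[n].string` would return.

```python
import sqlite3

# Hypothetical sample rows standing in for the parsed <td> strings;
# the first row imitates a header made of words.
rows = [
    ("Rank", "Character", "Meaning"),
    ("1", "的", "(sample definition)"),
    ("2", "一", "(sample definition)"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters "
    "(rank INTEGER PRIMARY KEY, word TEXT, definition TEXT)"
)

inserted = 0
for rank, word, definition in rows:
    try:
        rank_number = int(rank)   # header text such as "Rank" raises ValueError
    except ValueError:
        continue                  # skip rows whose first cell is not a number
    conn.execute(
        "INSERT INTO characters (rank, word, definition) VALUES (?, ?, ?)",
        (rank_number, word, definition),
    )
    inserted += 1
conn.commit()

stored = conn.execute(
    "SELECT rank, word, definition FROM characters ORDER BY rank"
).fetchall()

# Inserting the header text directly reproduces the error from the question.
try:
    conn.execute(
        "INSERT INTO characters (rank, word, definition) VALUES (?, ?, ?)",
        ("Rank", "x", "y"),
    )
except sqlite3.IntegrityError as exc:
    mismatch = str(exc)
```

Catching `ValueError` around `int()` skips any row whose first cell is not numeric, not only the first one, so it also tolerates stray non-data rows further down the table.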