TypeError: sequence item 0: expected a bytes-like object, str found

The problem is in this line of code: b' '.join(text)

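The variable text is a list of strings, but you are trying to join them with a bytes object (namely b' '), which doesn't work: bytes.join() only accepts bytes-like items. You can solve this by removing the b prefix. Because output is opened in text mode ('w'), you also need to drop the .encode('utf-8') call: a text-mode file expects str, and adding the str '\n' to the encoded bytes would raise another TypeError. Write the joined string directly, and pass encoding='utf-8' to open() if you want to be sure the file is written as UTF-8:

output.write(' '.join(text) + '\n')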
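As a minimal, self-contained sketch of the point above (not the asker's full script), the snippet below reproduces the error and then writes str tokens to a text-mode file. It makes some assumptions: the sample token lists stand in for what wiki.get_texts() yields, and the helper names join_error_message and write_articles and the file name wiki.sample.text are made up for illustration.

```python
import os
import tempfile


def join_error_message(tokens):
    """Return the TypeError message raised by b' '.join(tokens), or None."""
    try:
        b" ".join(tokens)  # bytes separator with str items fails
    except TypeError as exc:
        return str(exc)
    return None


def write_articles(path, articles):
    """Write one space-joined article per line to a UTF-8 text file."""
    count = 0
    # Text mode plus an explicit encoding: write() takes str and the
    # file object handles the conversion to UTF-8 bytes itself.
    with open(path, "w", encoding="utf-8") as output:
        for tokens in articles:
            output.write(" ".join(tokens) + "\n")
            count += 1
    return count


# Hypothetical sample data standing in for wiki.get_texts():
# each article is a list of str tokens, as the traceback implies.
articles = [
    ["Jakarta", "adalah", "ibu", "kota", "Indonesia"],
    ["Bahasa", "Indonesia"],
]

error_message = join_error_message(articles[0])
print(error_message)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "wiki.sample.text")
    saved = write_articles(path, articles)
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()

print(saved, lines)
```

If you would rather keep the file in binary mode ('wb'), every piece has to be bytes instead: encode the joined string and append b'\n' rather than '\n'.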
Author by

louisten manalu

Updated on June 09, 2022

Comments

  • louisten manalu
    louisten manalu almost 2 years

    I am trying to extract Indonesian titles from a wiki titles dump stored in a text file, using word2vec-gensim in Python 3. The wiki dump also contains titles in other languages and some symbols. Below is my code:

    if len(sys.argv) != 3:
        namaFileInput = "idwiki-latest-pages-articles.xml.bz2"
        namaFileOutput = "wiki.id.case.text"
        sys.exit(1)
    inp, outp = sys.argv[1:3]
    space = " "
    i = 0
    
    output = open(namaFileOutput, 'w')
    
    # lower=False: keep uppercase and lowercase letters distinct
    wiki = WikiCorpus(namaFileInput, lemmatize=False, dictionary={}, lower=False)
    for text in wiki.get_texts():
        if six.PY3:
            output.write(b' '.join(text).encode('utf-8') + '\n')
        else:
            output.write(space.join(text) + "\n")
        i = i + 1
        if i % 10000 == 0:
            logger.info("Saved " + str(i) + " articles")
    
    output.close()
    logger.info("Finished Saved " + str(i) + " articles")
    

    But I am getting an error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-17-d4c686a9093a> in <module>
         29 for text in wiki.get_texts():
         30     if six.PY3:
    ---> 31         output.write(b' '.join(text).encode('utf-8') + '\n')
         32     else:
         33         output.write(space.join(text) + "\n")
    
    TypeError: sequence item 0: expected a bytes-like object, str found
    

    I have searched online but could not find a solution. Any help will be appreciated.