How to avoid decoding to str: need a bytes-like object error in pandas?

python python-3.x pandas gensim topic-modeling

28,025

Your data contains NaN (not a number) values.

You can either drop them first:

documents = documents.dropna(subset=['content'])

Or you can fill the NaNs with an empty string, convert the column to string type, and then map your string-based function over it:

processed_docs = documents['content'].fillna('').astype(str).map(preprocess)

Either step is needed because the functions that preprocess calls accept only strings.
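As a rough illustration of the two options, here is a minimal sketch on toy data. The three-row DataFrame and the simplified `preprocess` are hypothetical stand-ins (the real function uses gensim); the only point is that it rejects anything that is not a `str`, as the gensim-based one does.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real CSV; the second row is a missing value.
documents = pd.DataFrame(
    {"content": ["Pretty extensive background", np.nan, "Have you guys checked"]}
)

def preprocess(text):
    # Hypothetical stand-in for the gensim-based function: accepts str only.
    if not isinstance(text, str):
        raise TypeError("expected str, got " + type(text).__name__)
    return text.lower().split()

# Option 1: drop rows whose 'content' is missing, then map.
dropped = documents.dropna(subset=["content"])
processed_dropped = dropped["content"].map(preprocess)

# Option 2: keep every row, replacing missing values with an empty string.
processed_filled = documents["content"].fillna("").astype(str).map(preprocess)
```

Dropping shortens the result, while filling keeps the original row count and yields an empty token list for the missing row, which matters if you later join the tokens back to other columns by position.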

Edit:

How do I know that your data contains NaNs? NumPy's nan is a float:

>>> import numpy as np
>>> type(np.nan)
<class 'float'>

Hence, you get the error

TypeError: decoding to str: need a bytes-like object, float found
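To confirm this on your own data, a small sketch along these lines may help. The two-row DataFrame is a made-up example, and the `str(value, "utf8")` call is an assumption about how the text ends up being decoded internally; it is used here only because it produces the same kind of TypeError when handed a float.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in 'content'.
documents = pd.DataFrame({"content": ["some text", np.nan]})

# Count missing values before mapping a string-only function over the column.
n_missing = documents["content"].isna().sum()

# str(obj, encoding) decodes bytes-like input; a float such as NaN is rejected.
try:
    str(np.nan, "utf8")
except TypeError as exc:
    message = str(exc)
```

If `n_missing` is greater than zero, mapping a string-only function over the column will hit the float values and fail in the same way.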

Author by

wayne64001

Updated on January 07, 2022

Comments

wayne64001 over 2 years

Here is my code :

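import pandas as pd

# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
data = pd.read_csv('asscsv2.csv', encoding="ISO-8859-1", on_bad_lines='skip')
data_text = data[['content']].copy()  # copy so the new column is not set on a slice of data
data_text['index'] = data_text.index
documents = data_text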

It looks like

print(documents[:2])
                                              content  index
 0  Pretty extensive background in Egyptology and ...      0
 1  Have you guys checked the back end of the Sphi...      1

And I define a preprocess function by using gensim

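import gensim
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
def lemmatize_stemming(text):
    # Lemmatize the token as a verb, then stem it
    return stemmer.stem(WordNetLemmatizer().lemmatize(text, pos='v'))
def preprocess(text):
    result = []
    for token in gensim.utils.simple_preprocess(text):
        # Keep tokens that are not stopwords and are longer than 3 characters
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
            result.append(lemmatize_stemming(token))
    return result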

And when I use this function:

processed_docs = documents['content'].map(preprocess)

It raises

TypeError: decoding to str: need a bytes-like object, float found

How can I encode my CSV file as a bytes-like object, or otherwise avoid this error?
