TypeError: float() argument must be a string or a number, not 'function' – Python/Sklearn
As this answer explains, fillna isn't designed to work with a callback. If you pass one, it will be taken as the literal fill value, meaning your NaNs will be replaced with lambdas:
df
col1 col2 col3 col4
row1 65.0 24 47.0 NaN
row2 33.0 48 NaN 89.0
row3 NaN 34 67.0 NaN
row4 24.0 12 52.0 17.0
df.fillna(lambda x: x.median())
col1 col2 \
row1 65 24
row2 33 48
row3 <function <lambda> at 0x10bc47730> 34
row4 24 12
col3 col4
row1 47 <function <lambda> at 0x10bc47730>
row2 <function <lambda> at 0x10bc47730> 89
row3 67 <function <lambda> at 0x10bc47730>
row4 52 17
If you are trying to fill by median, the solution is to compute the per-column medians (df.median() returns a Series indexed by column name) and pass that to fillna.
df
col1 col2 col3 col4
row1 65.0 24 47.0 NaN
row2 33.0 48 NaN 89.0
row3 NaN 34 67.0 NaN
row4 24.0 12 52.0 17.0
df = df.fillna(df.median())
df
col1 col2 col3 col4
row1 65.0 24 47.0 53.0
row2 33.0 48 52.0 89.0
row3 33.0 34 67.0 53.0
row4 24.0 12 52.0 17.0
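A minimal sketch of the median fill, with the example frame above rebuilt by hand. The names medians and filled are only illustrative. Note that fillna returns a new frame rather than changing df, so the result has to be assigned (or inplace=True passed) for the fill to stick.

```python
import numpy as np
import pandas as pd

# The example frame from above, rebuilt by hand
df = pd.DataFrame(
    {
        "col1": [65.0, 33.0, np.nan, 24.0],
        "col2": [24, 48, 34, 12],
        "col3": [47.0, np.nan, 67.0, 52.0],
        "col4": [np.nan, 89.0, np.nan, 17.0],
    },
    index=["row1", "row2", "row3", "row4"],
)

# df.median() returns a Series indexed by column name
medians = df.median()

# fillna aligns the Series with the columns and returns a new frame;
# df itself is left untouched
filled = df.fillna(medians)

print(filled)
```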
HMLDude
Updated on November 29, 2020
Comments
-
HMLDude over 3 years
I have the following code snippet from a program called Flights.py
...
# Load the Dataset
df = dataset
df.isnull().any()
df = df.fillna(lambda x: x.median())

# Define X and Y
X = df.iloc[:, 2:124].values
y = df.iloc[:, 136].values
X_tolist = X.tolist()

# Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
The second to last line is throwing the following error:
Traceback (most recent call last):
  File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
    X_train = sc.fit_transform(X_train)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
    return self.partial_fit(X, y)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
TypeError: float() argument must be a string or a number, not 'function'
My dataframe df is of size (22587, 138).
I was taking a look at the following question for inspiration:
TypeError: float() argument must be a string or a number, not 'method' in Geocoder
I tried the following adjustment:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train.as_matrix)
X_test = sc.transform(X_test.as_matrix)
Which resulted in the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'
I'm currently at a loss for how to scan through the dataframe and find/convert the offending entries.
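For the scanning step asked about here, one possible approach is to build a boolean mask of cells that hold callables and then swap those cells back to NaN. This is only a diagnostic sketch on a small made-up frame: placeholder and drop_callables are hypothetical names, and the stray function is inserted directly rather than through fillna. The simpler fix is not to pass a lambda to fillna in the first place.

```python
import numpy as np
import pandas as pd


def placeholder(x):
    """Hypothetical stand-in for the lambda that ended up in the data."""
    return x


# Small made-up frame with one cell holding a function object
df = pd.DataFrame({
    "a": [1.0, placeholder, 3.0],
    "b": [4.0, 5.0, 6.0],
})

# Boolean mask: True wherever a cell holds a callable
mask = df.apply(lambda col: col.map(callable))

# Names of the columns that contain at least one callable
has_callable = mask.any()
bad_columns = has_callable[has_callable].index.tolist()
print(bad_columns)


def drop_callables(value):
    """Return NaN for function objects, leave every other value alone."""
    return np.nan if callable(value) else value


# Replace the offending cells with NaN and restore a numeric dtype
cleaned = df.apply(lambda col: pd.to_numeric(col.map(drop_callables)))
print(cleaned.dtypes)
```

Once the function objects are gone, the NaNs can be filled again with a proper value such as the column medians.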
-
ayhan over 6 years
If you pass a Series, pandas can align them, so you don't actually need transform or broadcasting: df.fillna(df.median()).
-
cs95 over 6 years
@ayhan I didn't know that! Thank you.
-
HMLDude over 6 years
I used df.fillna(df.median()) and now I am getting the same error as earlier in the day, before I put in the lambda:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
-
cs95 over 6 years
@HMLDude it is possibly an issue with your data... you should look into using df.clip: pandas.pydata.org/pandas-docs/stable/generated/…
-
HMLDude over 6 years
So I just eyeballed the data and there are still a ton of rows with NaN values in df after calling df.fillna(df.median()).
-
cs95 over 6 years
@HMLDude Try:
med = pd.DataFrame(df.transform('median').values[:, None].T * np.ones_like(df), columns=df.columns, index=df.index); df = df.fillna(med)
-
HMLDude over 6 years
I saw that as your initial answer. It results in the following error: ValueError: transforms cannot produce aggregated results.
-
cs95 over 6 years
@HMLDude If you would be so kind as to provide a snippet of the data that is producing the error, I could help debug. At this rate, I have no idea what the problem is nor can I begin to think of a solution for it.
-
HMLDude over 6 years
Let us continue this discussion in chat.
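The thread never establishes why NaNs remained after df.fillna(df.median()), so the following is only a sketch of checks one might run, on a made-up frame rather than the asker's data. It assumes three possible causes: the result of fillna was not assigned back, a column is entirely NaN (its median is itself NaN, so nothing gets filled), or the data holds infinities, which fillna leaves alone but which the ValueError quoted in the comments also rejects.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, not the asker's data
df = pd.DataFrame({
    "ok": [1.0, np.nan, 3.0],
    "all_missing": [np.nan, np.nan, np.nan],
    "has_inf": [1.0, np.inf, 2.0],
})

# Assign the result; fillna does not modify df in place by default
filled = df.fillna(df.median())

# Columns that still contain NaN after the fill
nan_left = filled.isna().any()
still_missing = nan_left[nan_left].index.tolist()
print(still_missing)

# Numeric columns that contain +/- infinity
numeric = filled.select_dtypes(include="number")
inf_present = np.isinf(numeric).any()
has_infinity = inf_present[inf_present].index.tolist()
print(has_infinity)

# One possible cleanup: treat infinities as missing, drop columns with
# no data at all, then fill the rest with the column medians
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(axis=1, how="all")
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```

Whether dropping an all-NaN column or treating infinities as missing is appropriate depends on the dataset; both choices here are assumptions made for the sake of the example.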