TypeError: float() argument must be a string or a number, not 'function' – Python/Sklearn

26,724

As this answer explains, fillna isn't designed to work with a callback. If you pass one, it will be taken as the literal fill value, meaning your NaNs will be replaced with lambdas:

df

      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0

df.fillna(lambda x: x.median())

                                    col1  col2  \
row1                                  65    24   
row2                                  33    48   
row3  <function <lambda> at 0x10bc47730>    34   
row4                                  24    12   

                                    col3                                col4  
row1                                  47  <function <lambda> at 0x10bc47730>  
row2  <function <lambda> at 0x10bc47730>                                  89  
row3                                  67  <function <lambda> at 0x10bc47730>  
row4                                  52                                  17 
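
To connect this output to the error in the title, here is a minimal sketch. It assumes a small hand-built frame (`df_bad`) in which a named function (`placeholder`, a hypothetical stand-in for the lambda) sits where a NaN used to be, rather than relying on how any particular pandas version handles `fillna` with a callable. It applies the same float conversion that appears in the traceback, then locates the offending cells and turns them back into NaN.

```python
import numpy as np
import pandas as pd


def placeholder(x):
    # Hypothetical stand-in for the lambda that was stored as a literal value.
    return x.median()


# Hand-built frame mimicking the state after fillna(callable):
# col1 is object dtype and holds a function where the NaN used to be.
df_bad = pd.DataFrame(
    {
        "col1": [65.0, 33.0, placeholder, 24.0],
        "col2": [24, 48, 34, 12],
    },
    index=["row1", "row2", "row3", "row4"],
)

# The same kind of conversion check_array performs in the traceback.
try:
    np.array(df_bad.values, dtype=np.float64)
    error = None
except TypeError as exc:
    error = exc

# Boolean frame: True wherever a cell holds a callable.
is_func = df_bad.apply(lambda col: col.map(callable))
bad_counts = is_func.sum()  # number of offending cells per column

# Put NaN back in those cells and restore a numeric dtype.
df_fixed = df_bad.mask(is_func).astype(float)
```

After this, `bad_counts` shows which columns were affected, and `df_fixed` is back to floats with ordinary NaNs that can be filled properly.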

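If you are trying to fill by median, the solution is to compute the per-column medians with df.median(), which returns a Series indexed by column name, and pass that to fillna. pandas aligns the Series with the columns, so each column's NaNs are filled with that column's median.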

df
      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0

df = df.fillna(df.median())
df
      col1  col2  col3  col4
row1  65.0    24  47.0  53.0
row2  33.0    48  52.0  89.0
row3  33.0    34  67.0  53.0
row4  24.0    12  52.0  17.0
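
The median fill above can be sketched as a self-contained script. The frame is rebuilt from the table shown, and the extra column `col5` is a hypothetical addition used only to illustrate one possible reason NaNs can survive the fill: a column with no observed values has a NaN median, so there is nothing to fill it with. Whether that applies to any particular dataset is an assumption to verify.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [65.0, 33.0, np.nan, 24.0],
        "col2": [24, 48, 34, 12],
        "col3": [47.0, np.nan, 67.0, 52.0],
        "col4": [np.nan, 89.0, np.nan, 17.0],
    },
    index=["row1", "row2", "row3", "row4"],
)

medians = df.median()        # Series indexed by column name
filled = df.fillna(medians)  # returns a new frame; df itself is unchanged

# Hypothetical column that is entirely NaN: its median is NaN as well,
# so fillna leaves it untouched.
df_extra = df.assign(col5=np.nan)
filled_extra = df_extra.fillna(df_extra.median())

# Columns that still contain NaN after the fill.
still_missing = filled_extra.columns[filled_extra.isna().any()].tolist()
```

Checking `still_missing` (or `filled.isna().sum()`) right after the fill is a quick way to confirm that nothing was left behind before handing the data to a scaler.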
Author by

HMLDude

Updated on November 29, 2020

Comments

  • HMLDude
    HMLDude over 3 years

    I have the following code snippet from a program called Flights.py

    ...
    #Load the Dataset
    df = dataset
    df.isnull().any()
    df = df.fillna(lambda x: x.median())
    
    # Define X and Y
    X = df.iloc[:, 2:124].values
    y = df.iloc[:, 136].values
    X_tolist = X.tolist()
    
    # Splitting the dataset into the Training set and Test set
    from sklearn.cross_validation import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
    
    # Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    

    The second to last line is throwing the following error:

    Traceback (most recent call last):
    
      File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
        X_train = sc.fit_transform(X_train)
    
      File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
        return self.fit(X, **fit_params).transform(X)
    
      File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
        return self.partial_fit(X, y)
    
      File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
        estimator=self, dtype=FLOAT_DTYPES)
    
      File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
        array = np.array(array, dtype=dtype, order=order, copy=copy)
    
    TypeError: float() argument must be a string or a number, not 'function'
    

    My dataframe df is of size (22587, 138)

    I was taking a look at the following question for inspiration:

    TypeError: float() argument must be a string or a number, not 'method' in Geocoder

    I tried the following adjustment:

    # Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train.as_matrix)
    X_test = sc.transform(X_test.as_matrix)
    

    Which resulted in the following error:

    AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'
    

    I'm currently at a loss for how to scan through the dataframe and find/convert the offending entries.

  • ayhan
    ayhan over 6 years
    If you pass a Series pandas can align them so you don't actually need transform or broadcasting. df.fillna(df.median()).
  • cs95
    cs95 over 6 years
    @ayhan I didn't know that! Thank you.
  • HMLDude
    HMLDude over 6 years
    I used df.fillna(df.median()), and now I am getting the same error as earlier in the day, before I put in the lambda: ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
  • cs95
    cs95 over 6 years
    @HMLDude it is possibly an issue with your data... you should look into using df.clip: pandas.pydata.org/pandas-docs/stable/generated/…
  • HMLDude
    HMLDude over 6 years
    So I just eyeballed the data and there are still a ton of rows with NaN values in df after calling df.fillna(df.median()).
  • cs95
    cs95 over 6 years
    @HMLDude Try: med = pd.DataFrame(df.transform('median').values[:, None].T * np.ones_like(df), columns=df.columns, index=df.index); df = df.fillna(med)
  • HMLDude
    HMLDude over 6 years
    I saw that as your initial answer. It results in the following error: ValueError: transforms cannot produce aggregated results.
  • cs95
    cs95 over 6 years
    @HMLDude If you would be so kind as to provide a snippet of the data that is producing the error, I could help debug. At this rate, I have no idea what the problem is nor can I begin to think of a solution for it.
  • HMLDude
    HMLDude over 6 years