'ValueError: could not convert string to float' in python sklearn


You have to convert the date/time columns from strings to pandas timestamps, and then to numbers. This can be done while reading the CSV (everything else in your code stays as you wrote it):

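import pandas as pd

model = (
    # parse_dates turns the two timestamp columns into datetime64 values
    pd.read_csv("source.csv", parse_dates=['rssi_ts', 'batl_ts'])
    .assign(
        # datetime64 -> integer nanoseconds since the epoch -> float seconds
        rssi_ts=lambda x: x['rssi_ts'].astype('int64') / 10 ** 9,
        batl_ts=lambda x: x['batl_ts'].astype('int64') / 10 ** 9,
        # "HH:MM:SS" strings -> timedelta64 -> float seconds
        ts_diff=lambda x: pd.to_timedelta(x['ts_diff']).astype('int64') / 10 ** 9,
    )
)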

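The datetime64 values created by the parse_dates argument can be cast to integers (nanoseconds since the Unix epoch) and divided by 10**9 to get seconds as floats, which scikit-learn accepts.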
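As a quick check of that conversion, here is a minimal sketch on two rssi_ts values taken from the question's sample rows, built in memory rather than read from source.csv. It takes the difference from the Unix epoch and calls Series.dt.total_seconds() instead of casting to an integer; that is a swapped-in variant, not the answer's exact code, chosen because a missing timestamp (NaT) then becomes NaN rather than breaking the integer cast. It assumes timezone-naive timestamps.

```python
import pandas as pd

# Two timestamp strings from the question's rssi_ts column
raw = pd.Series(["2019-09-05 20:00:07", "2019-09-06 01:00:08"])

# Parse the strings into datetime64 values (what parse_dates does in read_csv)
ts = pd.to_datetime(raw)

# Seconds since the Unix epoch as floats; a missing timestamp (NaT) would give NaN
seconds = (ts - pd.Timestamp("1970-01-01")).dt.total_seconds()
print(seconds.tolist())
```

The two values should differ by 18001 seconds, i.e. the 5 h 0 min 1 s between 20:00:07 and 01:00:08 the next day.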

Edit: added a missing bracket.

Edit 2: extended the conversion to the other timestamp column (batl_ts) and to the time difference (ts_diff).
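The ts_diff column in the question holds durations written as HH:MM:SS strings. A minimal sketch of turning such strings into seconds, using three values from the sample rows; here .dt.total_seconds() replaces the integer cast, which is a choice of this sketch rather than the answer's code:

```python
import pandas as pd

# ts_diff values from the question: "HH:MM:SS" strings, read as object dtype
raw = pd.Series(["01:48:18", "00:48:18", "03:19:17"])

# Parse them into timedelta64 values, then express each one in seconds as a float
ts_diff_seconds = pd.to_timedelta(raw).dt.total_seconds()
print(ts_diff_seconds.tolist())
```

For example, 01:48:18 becomes 1*3600 + 48*60 + 18 = 6498 seconds.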


Author: XCeptable

Updated on June 04, 2022

Comments

  • XCeptable, almost 2 years ago

    I have a pandas DataFrame with date columns, imported from a CSV file. When I try to fit the regression model, I get the error ValueError: could not convert string to float: '2019-08-30 07:51:21'.

    How can I get rid of it?

    Here is the DataFrame:

    source.csv

        event_id   tsm_id  rssi_ts              rssi   batl  batl_ts              ts_diff
    0  417736018  4317714  2019-09-05 20:00:07   140  100.0  2019-09-05 18:11:49  01:48:18
    1  417735986  4317714  2019-09-05 20:00:07   132  100.0  2019-09-05 18:11:49  01:48:18
    2  418039386  4317714  2019-09-06 01:00:08   142  100.0  2019-09-06 00:11:50  00:48:18
    3  418039385  4317714  2019-09-06 01:00:08   122  100.0  2019-09-06 00:11:50  00:48:18
    4  420388010  4317714  2019-09-07 15:31:07   143  100.0  2019-09-07 12:11:50  03:19:17
    

    Here is my code:

    import numpy as np
    import pandas as pd

    model = pd.read_csv("source.csv")
    model.describe()
    
               event_id        tsm_id         rssi         batl
    count  5.000000e+03  5.000000e+03  5000.000000  3784.000000
    mean   3.982413e+08  4.313492e+06   168.417200    94.364429
    std    2.200899e+07  2.143570e+03    35.319516    13.609917
    min    3.443084e+08  4.310312e+06     0.000000    16.000000
    25%    3.852882e+08  4.310315e+06   144.000000    97.000000
    50%    4.007999e+08  4.314806e+06   170.000000   100.000000
    75%    4.171803e+08  4.314815e+06   195.000000   100.000000
    max    4.258451e+08  4.317714e+06   242.000000   100.000000
    
    labels_b = np.array(model['batl'])
    features_r = model.drop('batl', axis=1)
    features_r = np.array(features_r)

    from sklearn.model_selection import train_test_split
    train_features, test_features, train_labels, test_labels = train_test_split(
        features_r, labels_b, test_size=0.25, random_state=42)

    from sklearn.ensemble import RandomForestRegressor
    rf = RandomForestRegressor(n_estimators=1000, random_state=42)
    rf.fit(train_features, train_labels)
    

    Here is the error message:

    ValueError                                Traceback (most recent call last)
    <ipython-input-28-bc774a9d8239> in <module>
          4 rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
          5 # Train the model on training data
    ----> 6 rf.fit(train_features, train_labels);
    
    ~/ml/env/lib/python3.7/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
        247 
        248         # Validate or convert input data
    --> 249         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
        250         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
        251         if sample_weight is not None:
    
    ~/ml/env/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
        494             try:
        495                 warnings.simplefilter('error', ComplexWarning)
    --> 496                 array = np.asarray(array, dtype=dtype, order=order)
        497             except ComplexWarning:
        498                 raise ValueError("Complex data not supported\n"
    
    ~/ml/env/lib/python3.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
        536 
        537     """
    --> 538     return array(a, dtype, copy=False, order=order)
        539 
        540 
    
    ValueError: could not convert string to float: '2019-08-30 07:51:21'
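To see where the error comes from: DataFrame.describe() only summarises numeric columns by default, which is why rssi_ts, batl_ts and ts_diff are missing from the describe() output above, and the traceback shows scikit-learn's check_array calling np.asarray(..., dtype=...) on the feature matrix. The sketch below reproduces that on a tiny two-row frame made up from the sample data (the frame is illustrative, not the real source.csv): np.array() of a frame that mixes numbers and strings yields an object array, and forcing it to float fails on the first timestamp string.

```python
import numpy as np
import pandas as pd

# Tiny frame shaped like the question's data: read_csv leaves the timestamp
# column as plain strings, not datetimes
df = pd.DataFrame({
    "rssi": [140, 132],
    "rssi_ts": ["2019-09-05 20:00:07", "2019-09-05 20:00:07"],
})

# The non-numeric columns are the ones describe() leaves out
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# np.array on a mixed frame gives an object array; converting it to float,
# as check_array does, fails on the first string it meets
features = np.array(df)
error_message = None
try:
    np.asarray(features, dtype=np.float64)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```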
    
  • XCeptable, over 4 years ago
    It's raising the same error. You apply the conversion to one feature only here, but I have 3 features with timestamps.
  • XCeptable, over 4 years ago
    Thank you for the answer. It now gives a different error: TypeError: float() argument must be a string or a number, not 'Timestamp'
  • XCeptable, over 4 years ago
    These are the dtypes:

        event_id             int64
        tsm_tuid             int64
        rssi_ts            float64
        rssi                 int64
        batl               float64
        batl_ts     datetime64[ns]
        ts_diff             object
        dtype: object
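The dtypes above suggest rssi_ts was converted while batl_ts is still datetime64[ns] and ts_diff is still a string (object) column. When a frame mixes such columns, np.array() likely produces an object array holding Timestamp objects, which would explain the TypeError. A minimal sketch that converts every time column at once; time_columns_to_seconds is a hypothetical helper, and the sketch assumes timezone-naive timestamps and that the HH:MM:SS string columns are named explicitly:

```python
import pandas as pd


def time_columns_to_seconds(df, timedelta_cols=()):
    """Hypothetical helper: return a copy of df where every datetime64 column
    becomes seconds since the Unix epoch and every timedelta column (including
    the string columns listed in timedelta_cols) becomes a duration in seconds."""
    out = df.copy()
    # Parse "HH:MM:SS" string columns into timedelta64 first
    for col in timedelta_cols:
        out[col] = pd.to_timedelta(out[col])
    # Timezone-naive datetimes -> float seconds since 1970-01-01 (NaT -> NaN)
    for col in out.select_dtypes(include="datetime").columns:
        out[col] = (out[col] - pd.Timestamp("1970-01-01")).dt.total_seconds()
    # Durations -> float seconds
    for col in out.select_dtypes(include="timedelta").columns:
        out[col] = out[col].dt.total_seconds()
    return out


# Example frame mimicking the dtypes in the comment (values from the sample rows)
df = pd.DataFrame({
    "rssi": [140, 142],
    "batl_ts": pd.to_datetime(["2019-09-05 18:11:49", "2019-09-06 00:11:50"]),
    "ts_diff": ["01:48:18", "00:48:18"],
})
numeric = time_columns_to_seconds(df, timedelta_cols=["ts_diff"])
print(numeric.dtypes)
```

After this, every column is numeric, so np.array(numeric) should convert to float without either error.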