'ValueError: could not convert string to float' in python sklearn
You have to convert the date-time columns from strings to pandas timestamps, and then to numbers. Replace your read_csv call with the following (the rest of your code stays as you wrote it):
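import pandas as pd

model = (
    # parse_dates alone is enough; date_parser is deprecated since pandas 2.0
    pd.read_csv("source.csv", parse_dates=['rssi_ts', 'batl_ts'])
    .assign(
        # nanoseconds since the Unix epoch -> seconds, as float
        rssi_ts=lambda x: x.loc[:, 'rssi_ts'].astype('int64') / 10 ** 9,
        batl_ts=lambda x: x.loc[:, 'batl_ts'].astype('int64') / 10 ** 9,
        # 'HH:MM:SS' strings -> timedelta -> seconds, as float
        ts_diff=lambda x: pd.to_timedelta(x.loc[:, 'ts_diff']).astype('int64') / 10 ** 9
    )
)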
The Timestamp objects created by the parse_dates argument can then be converted to float.
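As a variation on the snippet above, here is a minimal sketch that reaches the same result through `.dt.total_seconds()` instead of an integer cast, so it does not depend on the timestamps being stored in nanoseconds. This is a swapped-in technique, not the code from the answer. The inline CSV text is an assumption made only so the example is self-contained: it holds two rows copied from the question's sample, written with commas.

```python
import io

import pandas as pd

# Two rows from the question's sample, rewritten as comma-separated text
# (an assumption: the real source.csv may use a different delimiter).
csv_text = """event_id,tsm_id,rssi_ts,rssi,batl,batl_ts,ts_diff
417736018,4317714,2019-09-05 20:00:07,140,100.0,2019-09-05 18:11:49,01:48:18
418039386,4317714,2019-09-06 01:00:08,142,100.0,2019-09-06 00:11:50,00:48:18
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["rssi_ts", "batl_ts"])

# Timestamps -> seconds since the Unix epoch, as float.
epoch = pd.Timestamp("1970-01-01")
df["rssi_ts"] = (df["rssi_ts"] - epoch).dt.total_seconds()
df["batl_ts"] = (df["batl_ts"] - epoch).dt.total_seconds()

# 'HH:MM:SS' strings -> timedelta -> seconds, as float.
df["ts_diff"] = pd.to_timedelta(df["ts_diff"]).dt.total_seconds()

# Every column is now numeric.
print(df.dtypes)
```

After this step every column is numeric, which is what the estimator's input validation needs before it can cast the features to float.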
Edit: added a missing bracket.
Edit 2: added the conversion for the other timestamp and for the time delta.
Author by
XCeptable
Updated on June 04, 2022
Comments
-
XCeptable almost 2 years
I have a Pandas DataFrame with date columns. The data is imported from a csv file. When I try to fit the regression model, I get the error
ValueError: could not convert string to float: '2019-08-30 07:51:21'
How can I get rid of it?
Here is the dataframe.
source.csv
   event_id   tsm_id   rssi_ts              rssi  batl   batl_ts              ts_diff
0  417736018  4317714  2019-09-05 20:00:07  140   100.0  2019-09-05 18:11:49  01:48:18
1  417735986  4317714  2019-09-05 20:00:07  132   100.0  2019-09-05 18:11:49  01:48:18
2  418039386  4317714  2019-09-06 01:00:08  142   100.0  2019-09-06 00:11:50  00:48:18
3  418039385  4317714  2019-09-06 01:00:08  122   100.0  2019-09-06 00:11:50  00:48:18
4  420388010  4317714  2019-09-07 15:31:07  143   100.0  2019-09-07 12:11:50  03:19:17
Here is my code:
model = pd.read_csv("source.csv")
model.describe()

           event_id        tsm_id         rssi         batl
count  5.000000e+03  5.000000e+03  5000.000000  3784.000000
mean   3.982413e+08  4.313492e+06   168.417200    94.364429
std    2.200899e+07  2.143570e+03    35.319516    13.609917
min    3.443084e+08  4.310312e+06     0.000000    16.000000
25%    3.852882e+08  4.310315e+06   144.000000    97.000000
50%    4.007999e+08  4.314806e+06   170.000000   100.000000
75%    4.171803e+08  4.314815e+06   195.000000   100.000000
max    4.258451e+08  4.317714e+06   242.000000   100.000000

labels_b = np.array(model['batl'])
features_r = model.drop('batl', axis = 1)
features_r = np.array(features_r)

from sklearn.model_selection import train_test_split
train_features, test_features, train_labels, test_labels = train_test_split(features_r, labels_b, test_size = 0.25, random_state = 42)

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
rf.fit(train_features, train_labels);
Here is the error message:
ValueError                                Traceback (most recent call last)
<ipython-input-28-bc774a9d8239> in <module>
      4 rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
      5 # Train the model on training data
----> 6 rf.fit(train_features, train_labels);

~/ml/env/lib/python3.7/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
    247
    248         # Validate or convert input data
--> 249         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    250         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    251         if sample_weight is not None:

~/ml/env/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    494             try:
    495                 warnings.simplefilter('error', ComplexWarning)
--> 496                 array = np.asarray(array, dtype=dtype, order=order)
    497             except ComplexWarning:
    498                 raise ValueError("Complex data not supported\n"

~/ml/env/lib/python3.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    536
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539
    540

ValueError: could not convert string to float: '2019-08-30 07:51:21'
-
XCeptable over 4 years
It's raising the same error. You apply the conversion to only one feature here, but I have 3 features with timestamps.
-
XCeptable over 4 years
Thank you for the answer. However, it now gives the error "TypeError: float() argument must be a string or a number, not 'Timestamp'".
-
XCeptable over 4 years
These are the types:
event_id             int64
tsm_tuid             int64
rssi_ts            float64
rssi                 int64
batl               float64
batl_ts     datetime64[ns]
ts_diff             object
dtype: object
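The listing above still shows one datetime64[ns] column and one object column, and those are the ones that make the fit fail. A small sketch of how such columns could be found before fitting follows. The helper name `non_numeric_columns` and the one-row frame are hypothetical and exist only for this illustration; `is_numeric_dtype` is the pandas type-checking function.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(frame):
    """Return the names of the columns whose dtype is not numeric."""
    return [name for name in frame.columns if not is_numeric_dtype(frame[name])]


# Hypothetical one-row frame mirroring the dtypes in the listing above.
frame = pd.DataFrame({
    "rssi_ts": [1567713607.0],                            # already float
    "rssi": [140],                                        # int
    "batl_ts": pd.to_datetime(["2019-09-05 18:11:49"]),   # datetime64
    "ts_diff": ["01:48:18"],                              # still a string
})

remaining = non_numeric_columns(frame)
print(remaining)
```

Any column this reports would still need converting (or dropping) before the frame is passed to the estimator.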