ValueError: could not convert string to float:
Solution 1
Try skipping the header row. An empty header cell in the first column is causing the issue, because float() cannot parse an empty or blank string:
>>> float(' ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:
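To confirm which cells are responsible before changing the loader, a small diagnostic can list every value that float() rejects. This is only a sketch: the helper name find_unparseable_cells and the sample data are hypothetical, since the real contents of fvectors.csv are not shown.

```python
import csv
import io


def find_unparseable_cells(rows):
    """Return (row_index, column_index, value) for every cell float() rejects."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            try:
                float(value)
            except ValueError:
                bad.append((r, c, value))
    return bad


# Hypothetical sample standing in for the CSV file: an empty header cell,
# two named columns, and one numeric data row.
sample = io.StringIO(",width,height\n0,1.5,2.0\n")
bad_cells = find_unparseable_cells(csv.reader(sample))
print(bad_cells)
```

If every reported cell sits in row 0, the header is the only problem and skipping it is enough. Cells reported in later rows point to text columns in the data itself.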
(1) If you want to skip the header, you can do it like this:
import csv

def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    next(lines, None)  # <<- skip the headers
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
(2) Or you can just ignore the exception:
try:
    float(element)
except ValueError:
    pass
If you decide to go with option (2), make sure you skip only the first row, or only rows that you know for certain contain text. Otherwise real data errors will be silently dropped.
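One way to follow that advice is to record the rows that fail conversion instead of discarding them with a bare pass, so you can check that only the rows you expected were skipped. The sketch below makes that assumption. The function name load_numeric_rows and the sample data are hypothetical.

```python
import csv
import io


def load_numeric_rows(lines):
    """Convert each CSV row to floats; collect rows that fail instead of hiding them."""
    dataset, skipped = [], []
    for line_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines
        try:
            dataset.append([float(x) for x in row])
        except ValueError:
            # Keep the row number so the skipped rows can be reviewed later.
            skipped.append((line_number, row))
    return dataset, skipped


# Hypothetical sample: a header, two numeric rows, and a blank line.
sample = io.StringIO("a,b\n1,2\n\n3,4.5\n")
dataset, skipped = load_numeric_rows(sample)
print(dataset)
print(skipped)
```

Inspecting skipped after loading shows whether anything other than the header was dropped.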
Solution 2
Looking at the image of your data, Python cannot convert the last column of your data, which holds the values square and circle. Also, you have a header in your data that you need to skip.
Try using this code:
import csv

def loadDatasetNB(filename):
    with open(filename, 'r') as fp:
        reader = csv.reader(fp)
        # skip the header line
        header = next(reader)
        # save the features and the labels as different lists
        data_features = []
        data_labels = []
        for row in reader:
            # convert everything except the label to a float
            data_features.append([float(x) for x in row[:-1]])
            # save the labels separately
            data_labels.append(row[-1])
    return data_features, data_labels
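To see what this split produces, here is a self-contained sketch that writes a tiny CSV to a temporary file and loads it with the same approach. The column names and values are invented for illustration, because the real fvectors.csv is not shown, and the loader is restated under the hypothetical name load_features_and_labels so the block runs on its own.

```python
import csv
import os
import tempfile


def load_features_and_labels(filename):
    # Same approach as above: skip the header, split floats from the label.
    with open(filename, newline="") as fp:
        reader = csv.reader(fp)
        next(reader, None)
        features, labels = [], []
        for row in reader:
            features.append([float(x) for x in row[:-1]])
            labels.append(row[-1])
    return features, labels


# Hypothetical file contents: two numeric columns followed by a text label.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("area,perimeter,shape\n4.0,8.0,square\n3.14,6.28,circle\n")
    path = tmp.name

features, labels = load_features_and_labels(path)
os.remove(path)
print(features)
print(labels)
```

Note that the function now returns two lists rather than one, so code that expects a single dataset, such as the call in NB_Analysis, would need to be adapted.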
Thom Elliott
New to coding, learning python for shape recognition using opencv and machine learning.
Updated on March 27, 2020
Comments
Thom Elliott about 4 years
I am following this tutorial to write a Naive Bayes classifier: http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
I keep getting this error:
dataset[i] = [float(x) for x in dataset[i]]
ValueError: could not convert string to float:
Here is the part of my code where the error occurs:
def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
And here is how the file is called:
def NB_Analysis():
    filename = 'fvectors.csv'
    splitRatio = 0.67
    dataset = loadDatasetNB(filename)
    trainingSet, testSet = splitDatasetNB(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows').format(len(dataset), len(trainingSet), len(testSet))
    # prepare model
    summaries = summarizeByClassNB(trainingSet)
    # test model
    predictions = getPredictionsNB(summaries, testSet)
    accuracy = getAccuracyNB(testSet, predictionsNB)
    print('Accuracy: {0}%').format(accuracy)

NB_Analysis()
My file fvectors.csv looks like this
What is going wrong here and how do I fix it?