Logistic regression - eval(family$initialize) : y values must be 0 <= y <= 1
Solution 1
You added as.factor(dataCancer$Classification) to the script, but calling it on its own does not convert the Classification column of the dataset into a factor, even when dataCancer is attached. It only prints the resulting factor to the console; because the result is never assigned, the data frame stays unchanged.
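The difference between calling as.factor and assigning its result can be checked on a small toy data frame. This is only a sketch: df and its values are made up for illustration and stand in for dataCancer.

```r
# Toy data frame standing in for dataCancer (hypothetical values)
df <- data.frame(Classification = c(1, 2, 2, 1))

as.factor(df$Classification)             # returns a factor, df is not modified
class_before <- class(df$Classification) # still "numeric"

# Assigning the result back is what actually changes the column
df$Classification <- as.factor(df$Classification)
class_after <- class(df$Classification)  # now "factor"
```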
Since you want to fit the model on the training dataset, either convert the column in training_data first
training_data$Classification <- as.factor(training_data$Classification)
classification_model <- glm(Classification ~ ., data = training_data, family = binomial)
or call as.factor directly in the glm formula
classification_model <- glm(as.factor(Classification) ~ ., data = training_data, family = binomial)
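As a rough illustration of how glm treats a factor response, here is a sketch on simulated data. The data frame toy, the predictor x and the simulated labels are assumptions made for the example, not the breast cancer data. With family = binomial, the first factor level is treated as failure and the remaining levels as success, so with levels 1 and 2 the model estimates the probability of class 2.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Labels coded 1/2 as in the question; class 2 becomes more likely as x grows
y <- ifelse(runif(n) < plogis(1.5 * x), 2, 1)
toy <- data.frame(x = x, Classification = y)

toy$Classification <- as.factor(toy$Classification)
levels(toy$Classification)  # first level ("1") is failure, "2" is success

fit <- glm(Classification ~ x, data = toy, family = binomial)
p <- predict(fit, type = "response")  # estimated P(Classification == "2")
```

If the class of interest should be the other one, relevel() can be used to change which level comes first.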
Solution 2
classification_model = glm(Classification ~ ., data = training_data, family = binomial)
Error in eval(family$initialize) : y values must be 0 <= y <= 1
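This is because Classification contains numeric values, not a factor. Make sure you assigned the result of the conversion back to the column:
dataCancer$Classification <- as.factor(dataCancer$Classification)
Note that training_data is a copy of the rows of dataCancer, so the conversion has to happen before training_data is created, or be applied to training_data itself. As long as the response is a factor, it does not matter whether it is coded 1,0 or 1,2. If that still doesn't help, try recoding the values 1,2 as 0,1 and running the same code.
The second error (object 'classification_model' not found) appears simply because the glm call failed, so the model object was never created.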
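The behaviour described in this answer can be reproduced on simulated data. In the sketch below, toy, x and y12 are hypothetical names and the values are randomly generated. A numeric 1/2 response triggers the error, while a factor response and a numeric 0/1 response are both accepted and give the same coefficients.

```r
set.seed(42)
toy <- data.frame(x = rnorm(100))
toy$y12 <- ifelse(runif(100) < plogis(toy$x), 2, 1)  # numeric 1/2 coding

# Numeric 1/2 response: glm() stops with the error from the question
err <- tryCatch(
  glm(y12 ~ x, data = toy, family = binomial),
  error = function(e) conditionMessage(e)
)

# A factor response and a 0/1 numeric response are both accepted
fit_factor <- glm(as.factor(y12) ~ x, data = toy, family = binomial)
fit_01     <- glm(I(y12 - 1) ~ x, data = toy, family = binomial)

# The two fits have the same coefficients
same_fit <- all.equal(unname(coef(fit_factor)), unname(coef(fit_01)))
```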
Solution 3
You need to recode the dependent variable as 0/1, for example with recode from the car package:
library(car)
dataCancer$Classification <- recode(dataCancer$Classification, "1=0; 2=1")
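If you would rather not load car (its recode can also be masked by dplyr::recode when both packages are attached), the same 1 -> 0, 2 -> 1 mapping can be sketched in base R. The vector cls below is a made-up stand-in for dataCancer$Classification.

```r
cls <- c(1, 2, 2, 1, 2)          # hypothetical Classification values

# Map 1 -> 0 and 2 -> 1, the same mapping as the recode() call above
cls01 <- ifelse(cls == 2, 1, 0)

# Cross-tabulate to confirm every 1 became 0 and every 2 became 1
check <- table(original = cls, recoded = cls01)
```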
Ilan
Updated on June 15, 2022
Comments
-
Ilan almost 2 years
I am trying to perform logistic regression in R on the dataset provided here: http://archive.ics.uci.edu/ml/machine-learning-databases/00451/ It is about breast cancer. The dataset has a column Classification which contains only 1 (if the patient doesn't have cancer) or 2 (if the patient has cancer).
library(ISLR)
dataCancer <- read.csv("~/Desktop/Isep/Machine Leaning/TD/Project_Cancer/dataR2.csv")
attach(dataCancer)
names(dataCancer)
summary(dataCancer)
cor(dataCancer[,-11])
pairs(dataCancer[,-11])
#Step : Split data into training and testing data
training = (BMI>25)
testing = !training
training_data = dataCancer[training,]
testing_data = dataCancer[testing,]
Classification_testing = Classification[testing]
#Step : Fit a logistic regression model using training data
as.factor(dataCancer$Classification)
classification_model = glm(Classification ~ ., data = training_data,family = binomial )
summary(classification_model)
When running my script I get:
> classification_model = glm(Classification ~ ., data = training_data,family = binomial )
Error in eval(family$initialize) : y values must be 0 <= y <= 1
> summary(classification_model)
Error in summary(classification_model) : object 'classification_model' not found
I added as.factor(dataCancer$Classification) as seen in other posts, but it has not solved my problem. Can you suggest a way to get Classification values between 0 and 1, if the content of this column is what causes the error? Thanks for your help.
-
user20650 over 5 years
While a lot of introductions use attach, you really should never use it to attach a data frame, as it can cause many problems like the one you found.
-
Ilan over 5 years
I have tried your method and it also works. Thanks for your help.