Logistic regression - eval(family$initialize) : y values must be 0 <= y <= 1

r logistic-regression

12,554

Solution 1

You added as.factor(dataCancer$Classification) to the script, but that call on its own does not turn the Classification column into a factor, even when dataCancer is attached. It only prints a factor to the console; the result is never assigned back to the data frame.

Since you want to fit the model on the training dataset, you either specify

training_data$Classification <- as.factor(training_data$Classification)
classification_model <- glm(Classification ~ ., data = training_data,
                            family = binomial)

or call as.factor directly in the glm formula

classification_model <- glm(as.factor(Classification) ~ ., data = training_data,
                            family = binomial)
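To see why the bare as.factor call has no effect, here is a minimal sketch on a made-up data frame. The names toy, x, f and toy_model and the simulated values are assumptions for illustration only; this is not the breast cancer data.

```r
# Toy data standing in for training_data (simulated values, not the real dataset)
set.seed(1)
toy <- data.frame(
  x = rnorm(40),
  Classification = rep(c(1, 2), times = 20)
)

# as.factor returns a new factor; the column in the data frame is left untouched
f <- as.factor(toy$Classification)
is.factor(f)                    # TRUE
is.factor(toy$Classification)   # FALSE

# Assigning the result back is what actually converts the column
toy$Classification <- as.factor(toy$Classification)
is.factor(toy$Classification)   # TRUE

# With a factor response, glm accepts the 1/2 coding
toy_model <- glm(Classification ~ x, data = toy, family = binomial)
```

The same reasoning applies to training_data: the conversion has to be assigned to the data frame that is actually passed to glm.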

Solution 2

> classification_model = glm(Classification ~ ., data = training_data, family = binomial)
Error in eval(family$initialize) : y values must be 0 <= y <= 1

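This is because Classification holds the numeric values 1 and 2, not a factor, and for a numeric response the binomial family requires values between 0 and 1. Make sure you assigned the conversion back to the column: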

dataCancer$Classification <- as.factor(dataCancer$Classification)

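The coding (0/1 or 1/2) should not matter as long as the column is a factor. Note that the conversion has to happen before training_data is created, or be applied to training_data itself, because the subset is a copy of dataCancer. If that still doesn't help, try recoding the values 1 and 2 as 0 and 1 and running the same code again.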

The second error (object 'classification_model' not found) simply follows from the first: glm failed, so the model object was never created.
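A small sketch of both behaviours, again on simulated data (toy, x, msg and fit are hypothetical names, not from the question). It captures the error raised by a numeric 1/2 response, then refits after converting to a factor. For a factor response, glm treats the first level as failure and the remaining level as success, so here the fitted values estimate the probability of Classification == 2.

```r
# Simulated stand-in for the training data (not the real dataset)
set.seed(2)
toy <- data.frame(
  x = rnorm(30),
  Classification = rep(c(1, 2), times = 15)
)

# A numeric response outside [0, 1] makes the binomial family stop with an error
msg <- tryCatch(
  {
    glm(Classification ~ x, data = toy, family = binomial)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# After conversion the levels are "1" and "2"; "1" is the reference (failure) level
toy$Classification <- as.factor(toy$Classification)
levels(toy$Classification)

# The fit now succeeds; fitted() gives estimated probabilities of the second level
fit <- glm(Classification ~ x, data = toy, family = binomial)
head(fitted(fit))
```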

Solution 3

You need to recode the dependent variable as 0/1 (here 1 becomes 0 and 2 becomes 1). The code below does this with recode from the car package.

library(car)

dataCancer$Classification <- recode(dataCancer$Classification, "1=0; 2=1")
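If you would rather not load car, the same 1 to 0 and 2 to 1 mapping can be sketched in base R. This is an alternative to the answer's approach, not part of it, and the vector below is made-up example data.

```r
# Made-up example values using the same 1/2 coding as the Classification column
Classification <- c(1, 2, 2, 1, 2)

# Map 2 to 1 and everything else (here only 1) to 0
recoded <- ifelse(Classification == 2, 1, 0)
recoded   # 0 1 1 0 1

# Because the only values are 1 and 2, subtracting 1 gives the same result
shifted <- Classification - 1
```

Applied to the question's data, this would read dataCancer$Classification <- ifelse(dataCancer$Classification == 2, 1, 0), assuming the column contains only 1 and 2 as described.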


Author by

Ilan

Updated on June 15, 2022

Comments

Ilan almost 2 years

I am trying to perform logistic regression in R on a dataset provided here: http://archive.ics.uci.edu/ml/machine-learning-databases/00451/ It is about breast cancer. The dataset has a column Classification which contains only 1 (if the patient doesn't have cancer) or 2 (if the patient has cancer).

library(ISLR)
dataCancer <- read.csv("~/Desktop/Isep/Machine Leaning/TD/Project_Cancer/dataR2.csv")
attach(dataCancer)
names(dataCancer)
summary(dataCancer)

cor(dataCancer[,-11])
pairs(dataCancer[,-11])

#Step : Split data into training and testing data
training = (BMI>25)
testing = !training
training_data = dataCancer[training,]
testing_data = dataCancer[testing,]

Classification_testing = Classification[testing]

#Step : Fit a logistic regression model using training data
as.factor(dataCancer$Classification)
classification_model = glm(Classification ~ ., data = 
training_data,family = binomial )
summary(classification_model)

When running my script I get :

> classification_model = glm(Classification ~ ., data = training_data,family = binomial )
Error in eval(family$initialize) : y values must be 0 <= y <= 1
> summary(classification_model)
Error in summary(classification_model) : object 'classification_model' not found

I added as.factor(dataCancer$Classification) as seen in other posts, but it has not solved my problem. Can you suggest a way to get Classification values between 0 and 1, given that the column only contains 1 and 2? Thanks for your help.

user20650 over 5 years

While a lot of introductions use attach, you really should never use it to attach a data frame, as it can cause many problems like the one you found.
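As a rough sketch of the split without attach, every column can be referenced through the data frame explicitly. The data frame toy and its simulated BMI values are assumptions standing in for dataCancer.

```r
# Simulated stand-in for dataCancer (not the real dataset)
set.seed(3)
toy <- data.frame(
  BMI = runif(20, min = 18, max = 35),
  Classification = rep(c(1, 2), times = 10)
)

# Convert the response once, before splitting, so both subsets inherit the factor
toy$Classification <- as.factor(toy$Classification)

# Refer to columns via toy$... instead of relying on attach()
training <- toy$BMI > 25
training_data <- toy[training, ]
testing_data <- toy[!training, ]
Classification_testing <- toy$Classification[!training]
```

Because nothing is attached, there is only one copy of each column to keep track of, and any change made to toy before the split carries over to training_data and testing_data.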