Logistic regression - eval(family$initialize) : y values must be 0 <= y <= 1

Solution 1

You added as.factor(dataCancer$Classification) to the script, but that call on its own does not convert the Classification column of the dataset into a factor, even when dataCancer is attached. It only returns a factor, which is printed to the console and then discarded.
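The difference between calling as.factor and assigning its result can be checked on a small example. This is only a sketch: the data frame `toy` and its values are made up for illustration and stand in for dataCancer.

```r
# Toy data frame standing in for dataCancer (values are invented for illustration)
toy <- data.frame(
  Classification = c(1, 2, 2, 1),
  BMI            = c(22.5, 30.1, 27.8, 24.0)
)

# Without assignment, as.factor() only returns a factor; toy itself is unchanged
as.factor(toy$Classification)
class_before <- class(toy$Classification)

# Assigning the result back is what actually changes the column
toy$Classification <- as.factor(toy$Classification)
class_after <- class(toy$Classification)

print(class_before)
print(class_after)
```

Here `class_before` is still "numeric", while `class_after` is "factor".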

Since you want to fit the model on the training dataset, either convert the column in that data frame first

training_data$Classification <- as.factor(training_data$Classification)
classification_model <- glm(Classification ~ ., data = training_data,
                            family = binomial)

or call as.factor directly in the glm formula

classification_model <- glm(as.factor(Classification) ~ ., data = training_data,
                            family = binomial)

Solution 2

classification_model = glm(Classification ~ ., data = training_data, family = binomial)
Error in eval(family$initialize) : y values must be 0 <= y <= 1

This happens because the Classification column contains numeric values (1 and 2), not a factor. Make sure you assigned the converted column back, like this:

dataCancer$Classification <- as.factor(dataCancer$Classification)

When the response is a factor, it does not matter whether it is coded 1/0 or 1/2: glm treats the first level as failure and the other level as success. If the conversion above still does not help, recode 1/2 to 0/1 and run the same code again.

The second error is simply a consequence of the first: glm failed, so the object classification_model was never created.
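As a rough check of this, the sketch below reproduces the error with a numeric 1/2 response and then fits the same model after converting the response to a factor. The data frame `toy`, the values in it, and the use of a single Glucose predictor are assumptions made for the example, not the real dataset.

```r
# Invented data: a 1/2 response and one numeric predictor (not perfectly separable)
toy <- data.frame(
  Classification = c(1, 1, 2, 1, 2, 2, 1, 2, 2, 1),
  Glucose        = c(70, 75, 80, 85, 90, 95, 100, 105, 110, 92)
)

# Numeric 1/2 response: the binomial family rejects values outside [0, 1]
msg <- tryCatch(
  {
    glm(Classification ~ Glucose, data = toy, family = binomial)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Factor response: the first level ("1") is treated as failure, "2" as success
toy$Classification <- as.factor(toy$Classification)
fit <- glm(Classification ~ Glucose, data = toy, family = binomial)
print(levels(toy$Classification))
print(coef(fit))
```

With the factor response the model fits, and the coefficients describe the log-odds of the second level (2, patient has cancer).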

Solution 3

You need to recode the dependent variable as 0/1, for example with recode from the car package:

library(car)

dataCancer$Classification <- recode(dataCancer$Classification, "1=0; 2=1")
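If you would rather not load an extra package, the same 1/2 to 0/1 recoding can be sketched in base R. The vector `classification` below is a made-up stand-in for dataCancer$Classification.

```r
# Invented stand-in for dataCancer$Classification
classification <- c(1, 2, 2, 1, 2)

# Map 2 -> 1 (has cancer) and everything else -> 0
recoded <- ifelse(classification == 2, 1, 0)

# Since the only values are 1 and 2, subtracting 1 gives the same result
recoded_alt <- classification - 1

print(recoded)
print(identical(recoded, recoded_alt))
```

The ifelse form is safer if the column might contain other codes, while the subtraction relies on the values being exactly 1 and 2.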
Author: Ilan

Updated on June 15, 2022

Comments

  • Ilan
    Ilan almost 2 years

    I am trying to perform logistic regression in R on the breast cancer dataset provided here: http://archive.ics.uci.edu/ml/machine-learning-databases/00451/ The dataset has a column Classification that contains only 1 (the patient doesn't have cancer) or 2 (the patient has cancer).

    library(ISLR)
    dataCancer <- read.csv("~/Desktop/Isep/Machine Leaning/TD/Project_Cancer/dataR2.csv")
    attach(dataCancer)
    names(dataCancer)
    summary(dataCancer)
    
    cor(dataCancer[,-11])
    pairs(dataCancer[,-11])
    
    #Step : Split data into training and testing data
    training = (BMI>25)
    testing = !training
    training_data = dataCancer[training,]
    testing_data = dataCancer[testing,]
    
    Classification_testing = Classification[testing]
    
    #Step : Fit a logistic regression model using training data
    as.factor(dataCancer$Classification)
    classification_model = glm(Classification ~ ., data = 
    training_data,family = binomial )
    summary(classification_model)
    

    When running my script I get:

    > classification_model = glm(Classification ~ ., data = training_data,family = binomial )
    Error in eval(family$initialize) : y values must be 0 <= y <= 1
    > summary(classification_model)
    Error in summary(classification_model) : object 'classification_model' not found
    

    I added as.factor(dataCancer$Classification), as suggested in other posts, but it has not solved my problem. Can you suggest a way to get Classification values between 0 and 1, given that the column only contains 1 and 2? Thanks for your help.

    • user20650
      user20650 over 5 years
      While a lot of introductions use attach, you really should never use it to attach a data frame, as it can cause many problems like the one you found.
  • Ilan
    Ilan over 5 years
    I have tried your method and it also works. Thanks for your help.
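Following the comment about attach, the training/testing split from the question can be written without it by referring to columns through the data frame. This is a minimal sketch on an invented data frame `toy`; the BMI > 25 split is taken from the question.

```r
# Invented stand-in for dataCancer
toy <- data.frame(
  BMI            = c(22.5, 30.1, 27.8, 24.0, 26.3),
  Classification = c(1, 2, 2, 1, 2)
)

# Refer to columns explicitly instead of relying on attach()
training      <- toy$BMI > 25
training_data <- toy[training, ]
testing_data  <- toy[!training, ]

Classification_testing <- toy$Classification[!training]

# with() gives attach-like convenience for a single expression only
mean_bmi <- with(toy, mean(BMI))

print(nrow(training_data))
print(Classification_testing)
```

Because every column is read from the data frame itself, later changes such as converting Classification to a factor are picked up wherever the column is used.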