Prediction and Confidence intervals for Logistic Regression

r statistics intervals prediction glm

10,137

I am not sure if you are asking for the straight up prediction interval, but if you are you can calculate it simply.

You can extract a traditional confidence interval for the model as such:

confint(model)

And then once you run a prediction, you can calculate a prediction interval based on the prediction like so:

upper = predAll$fit + 1.96 * predAll$se.fit
lower = predAll$fit - 1.96 * predAll$se.fit

You are simply taking the prediction (at any given point if you use a single set of predictor variables) and adding and subtracting 1.96 * absolute value of the standard error. (1.96 se includes 97.5% of the normal distribution and represents the 95% interval as it does for the standard deviation in the normal distribution)

This is the same formula that you would use for a traditional confidence interval except that using the standard error (as opposed to the standard deviation) makes the interval wider to account for the uncertainty in prediction itself.

Update:

Method for plotting prediction invervals courtesy of Rstudio!

As requested...though not done by me!

10,137

Author by

Bob Hopez

Updated on June 04, 2022

Comments

Bob Hopez almost 2 years
Below is a set of fictitious probability data, which I converted into binomial with a threshold of 0.5. I ran a glm() model on the discrete data to test if the intervals returned from glm() were 'mean prediction intervals' ("Confidence Interval") or 'point prediction intervals'("Prediction Interval"). It appears from the plot below that the returned intervals are the latter--'Point Prediction Intervals'; note, with 95% confidence, 2/20 points fall outside of the line in this sample.

If this is indeed the case, how do I generate the the 'mean prediction interval' (i.e, "Confidence Intervals") in R for a binomial data set bound by 0 and 1 using glm()? Please show your code and plot similar to mine with the fit line, given probabilities, 'confidence intervals' and 'prediction intervals'.
```
# Fictitious data
xVal <- c(15,15,17,18,32,33,41,42,47,50,
         53,55,62,63,64,65,66,68,70,79,
         94,94,94,95,98)
randRatio <- c(.01,.03,.05,.04,.01,.2,.1,.08,.88,.2,
               .2,.99,.49,.88,.2,.88,.66,.87,.66,.90,
               .98,.88,.95,.95,.95)
# Converted to binomial
randBinom <- ifelse(randRatio < .5, 0, 1)

# Data frame for model
binomData <- data.frame(
  randBinom = randBinom,
  xVal = xVal
)

# Model
mode1 <- glm(randBinom~ xVal, data = binomData, family = binomial(link = "logit"))

# Predict all points in xVal range
frame <- data.frame(xVal=(0:100))
predAll <- predict(mode1, newdata = frame,type = "link", se.fit=TRUE)

# Params for intervals and plot
confidence <- .95
score <- qnorm((confidence / 2) + .5)
frame <- data.frame(xVal=(0:100))

#Plot
with(binomData, plot(xVal, randBinom, type="n", ylim=c(0, 1), 
                 ylab = "Probability", xlab="xVal"))
lines(frame$xVal, plogis(predAll$fit), col = "red", lty = 1)
lines(frame$xVal, plogis(predAll$fit + score * predAll$se.fit), col = "red", lty = 3)
lines(frame$xVal, plogis(predAll$fit - score * predAll$se.fit), col = "red", lty = 3)
points(xVal, randRatio, col = "red") # Original probabilities
points(xVal, randBinom, col = "black", lwd = 3) # Binomial Points used in glm
```
Here's the plot, presumably with 'point prediction intervals' (i.e., "Prediction Intervals") in dashed red, and the mean fit in solid red. Black dots represent the discrete binomial data from original probabilities in randRatio:
- IRTFM over 7 years
  
  I think your premise is incorrect. I think you are not seeing what you are calling "point prediction intervals" and what most people simply call "prediction intervals". What you are calling 'mean prediction intervals' is (probably) what most people would call "confidence intervals" and those apply to plausible locations of the estimated parameter.
- Bob Hopez over 7 years
  
  @42- I edited some of the wording to align better with your comment.
- Bob Hopez over 7 years
  
  @ZheyuanLi Please see modified question. I'm interested to see your solution, and even more so, if there is a way using glm(). Using predict() on lm() with "confidence" or "prediction" doesn't seem to be an option with glm(). See: stackoverflow.com/questions/12544090/…
- IRTFM over 7 years
  
  Using type = link gives you confidence intervals (on the logit scale). You are presenting them on the probability scale but they are still not prediction intervals.
- Bob Hopez over 7 years
  
  @42- The above uses type = link. I then hear you saying the above plot represents the confidence interval. Is there a type argument for prediction intervals in glm? It appears more like a prediction interval to me, given the data--and that it covers ~95% of the data (red dots).
- IRTFM over 7 years
  
  Think about it a bit. A "prediction" for an "Y" value in a binomial situation would need to be either 1 or 0. None of the predict.glm values are either of those numbers.
- IRTFM over 7 years
  
  predict.glm has no provision for interval ="prediction". If you wnat to see more from R-Core guRus consider doing a bit of searching: markmail.org/search/…
- Bob Hopez over 7 years
  
  @ZheyuanLi See bethany's comments below and link to generating the prediction interval. I'm not sure if her suggested method would work for glm binomial problem. Perhaps you're willing to attack the problem using her suggeseted approach.
Bob Hopez over 7 years

Thanks for your approach. I would kindly challenge you to create a plot with both the 'confidence interval' and 'prediction interval' along with the complete code.
sconfluentus over 7 years

Why reinvent the wheel...here is a solid concise and brainy way of doing this with ggplot2:
sconfluentus over 7 years

And those can be used with GLM as well.
Bob Hopez over 7 years

Thanks; link was broken, but found here. I'm not convinced that SE and STDEV calcs used in linear regression can be applied the same way in logistic regression. Challenge still stands. :)
Bob Hopez over 7 years

Will try it... Or if someone, yourself included, wants to post the answer herein; I'll give them an upvote.
sconfluentus over 7 years

I will leave that test to you. I have used this method prior and it works.
Bob Hopez over 7 years

Did you use the binary residuals or the ratio ('probability') residuals? If the sigmoid changes at the beginning or end of the x-range then it will generate lots of skew with either residuals.
Bob Hopez over 7 years

Let us continue this discussion in chat.