Predict.glm not predicting missing values in response
Solution 1
When glm fits the model, it uses only the cases where there are no missing values. You can still get predictions for the cases where your y values are missing by constructing a data frame and passing that to predict.glm.
predict(m, newdata=data.frame(y, x))
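A minimal sketch of this, reusing the setup from the question (the seed and the names n_fitted, p_default and p_all are assumptions added for illustration). It compares the rows kept in the fitted model with what predict returns with and without newdata. Only the predictor column is passed in newdata here, since predict drops the response from the model terms.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)

m <- glm(y ~ x, family = binomial(link = "logit"))

# Rows that glm actually used: the 50 cases with missing y were dropped
n_fitted <- nrow(model.frame(m))

# Without newdata, predict works from the rows used in the fit
p_default <- predict(m)

# With newdata, predict returns one value per row supplied
p_all <- predict(m, newdata = data.frame(x = x), type = "response")

length(p_default)  # 50
length(p_all)      # 100
```

Here type = "response" returns fitted probabilities; leaving it out gives predictions on the link (logit) scale.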
Solution 2
The issue is with your call to glm, which has an na.action argument that is set to na.omit by default. Therefore these values are omitted (and when predict.glm is called, they are still omitted).
From ?glm
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
From ?na.exclude (which is the general NA action help page)
na.exclude differs from na.omit only in the class of the "na.action" attribute of the result, which is "exclude". This gives different behaviour in functions making use of naresid and napredict: when na.exclude is used the residuals and predictions are padded to the correct length by inserting NAs for cases omitted by na.exclude.
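A small sketch of the difference described in the help text, using lm on data shaped like the question's (the seed and the names m_omit, m_exclude and p_ex are assumptions for illustration). Note that the padded entries are NA rather than predicted values, so na.exclude restores the length but does not by itself predict for rows with missing y.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)

# na.omit is stated explicitly here in case options("na.action") was changed
m_omit    <- lm(y ~ x, na.action = na.omit)
m_exclude <- lm(y ~ x, na.action = na.exclude)

length(predict(m_omit))  # 50: only the rows used in the fit

p_ex <- predict(m_exclude)
length(p_ex)             # 100: padded back to the length of the data
sum(is.na(p_ex))         # 50: the padded positions hold NA
```

To get actual predictions for the rows where y is missing, newdata still has to be supplied to predict.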
generic_user
Updated on June 14, 2022
Comments
-
generic_user almost 2 years
For some reason, when I specify glms (and lm's too, it turns out), R is not predicting missing values of the data. Here is an example:
y = round(runif(50))
y = c(y,rep(NA,50))
x = rnorm(100)
m = glm(y~x, family=binomial(link="logit"))
p = predict(m,na.action=na.pass)
length(p)

y = round(runif(50))
y = c(y,rep(NA,50))
x = rnorm(100)
m = lm(y~x)
p = predict(m)
length(p)
The length of p should be 100, but it's 50. The weird thing is that I have other predicts in the same script that do predict from missing data.
EDIT: It turns out that those other predicts were quite wrong -- I was doing imputed.value = rnorm(N,mean.from.predict,var.of.prediction.interval). This recycled the mean and sd vectors from the lm predict or glm predict functions when length(predict)<N, which was quite different from what I was seeking.
So my question is: what about my example code is stopping glm and lm from predicting missing values?
Thanks!
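The recycling mentioned in the edit can be reproduced with a toy sketch. The values of N, mu and s below are made up for illustration and are not taken from the original script; sd is set to 0 so that each draw equals its mean and the reuse pattern is visible.

```r
N  <- 6
mu <- c(0, 100, 200)  # hypothetical means, shorter than N
s  <- c(0, 0, 0)      # sd of 0 makes every draw equal to its mean

# rnorm recycles mean and sd to length N without any warning
draws <- rnorm(N, mean = mu, sd = s)
draws  # 0 100 200 0 100 200
```

Comparing length(p) with N before calling rnorm would catch this kind of silent mismatch.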
-
generic_user about 11 years
I am indeed constructing imputations. What I want is $X'\hat\beta$ for the $X$ values where $Y$ is missing. Edit: sorry, is there no latex in this forum? I mean fitted values with prediction intervals for new data. I suppose I could do so manually, but I expected predict to, well, predict. Whether I use them for imputations or whatever should be up to me.
-
generic_user about 11 years
Thanks -- this works, but is odd. I guess the "original" data when newdata is left to default is the model matrix, rather than the variables fed to glm.
-
generic_user about 11 years
Just to add extra appreciation for this answer: you helped me to find a potentially MASSIVE error in my code. Really grateful.
-
IRTFM about 11 years
Downvoting for correct advice about software behavior that doesn't meet one's fantasies is childish.
-
generic_user about 11 years
I downvoted your comment because it wasn't constructive. You make unfounded assumptions in a mildly hostile tone. What's more, your answer is not in fact an answer, but rather a comment/a request for further information.
-
IRTFM over 8 years
Another downvote to this? I suppose leaving this answer ... and it is an answer (since R does NOT impute for missing values in data given to glm even with other values for na.action) ... will continue to annoy people who have difficulty accepting reality. If you want to impute data then you need to use a package that provides that facility.
generic_user over 8 years
Two years ago when I was learning R, you provided a technically correct, if rude and useless, answer. A useful answer would have been to explain that predict, when applied to a fitted model, defaults to the model frame stored in the (g)lm object, which in turn omits observations with missing values. It is a bit dense in fact to assume that anyone would seek imputation from a predict function. If you had actually looked at what I was asking at the time, you would have seen that I wanted predictions of y where x is observed. Imputation usually refers to efforts to account for missing covariates.
IRTFM over 8 years
@generic_user: How is it rude to say that R does not return a value from predict.glm for cases that have missing values? You are the only one who has suggested that I am "dense". I just suggested that you appeared to have gotten an incorrect idea. You said you had observed something different and I suggested that you needed to provide a demonstration in code. In a technical forum that's not being "rude", it's being accurate.