Error in model.frame.default: variable lengths differ
Solution 1
Joran suggested removing the NAs before running the model. I removed the NAs, ran the model, and obtained the residuals. When I then updated model2 to include the lagged residuals, the error message no longer appeared.
Remove NAs
df2<-df1[complete.cases(df1),]
Run the main model
model2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5), data=df2, family=poisson)
Obtain residuals
resid2 <- residuals(model2,type="deviance")
Update model2 by including the lag 1 residuals
model2_1 <- update(model2,.~.+ Lag(resid2,1), na.action=na.omit)
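An alternative to dropping rows beforehand, not taken from the original answer, is to fit with na.action = na.exclude. Base R then pads the residuals with NA at the positions of the dropped rows, so the residual vector keeps the length of the data frame. The sketch below is only an illustration under assumptions: it uses simulated data, glm as a stand-in for gam, and a base R shift instead of quantmod's Lag. The names toy, fit_omit, fit_excl, res_lag1 and fit_lag are hypothetical.

```r
# Simulated stand-in data (hypothetical), with a few missing predictor values
set.seed(1)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- rpois(n, lambda = exp(0.2 + 0.3 * toy$x))
toy$x[c(5, 20, 60)] <- NA

# Default handling (na.omit) drops incomplete rows, so the residual
# vector is shorter than the data frame: the cause of "variable lengths differ"
fit_omit <- glm(y ~ x, data = toy, family = poisson)
res_omit <- residuals(fit_omit, type = "deviance")
c(rows = nrow(toy), residuals = length(res_omit))

# na.exclude also drops those rows for fitting, but residuals() pads them
# back in as NA, so the vector lines up with the original rows
fit_excl <- glm(y ~ x, data = toy, family = poisson, na.action = na.exclude)
res_excl <- residuals(fit_excl, type = "deviance")
c(rows = nrow(toy), residuals = length(res_excl))

# Lag by one observation with base R and store it as a column,
# so every variable in the formula has the same length
toy$res_lag1 <- c(NA, head(res_excl, -1))
fit_lag <- glm(y ~ x + res_lag1, data = toy, family = poisson)
```

Storing the lagged residuals as a column of the data frame keeps all model variables in one place, which makes a length mismatch impossible by construction.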
Solution 2
Another thing that can cause this error is creating a model with the centering/scaling standardize function from the arm package: m <- standardize(lm(y ~ x, data = train))
If you then try predict(m), you get the same error as in this question.
Solution 3
It's simple: just make sure the values in each column all have the same data type. For example, I ran into the same error, along with another one:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
So I went back to my Excel or CSV file, set a filter on the variable that was throwing the error, and checked whether its distinct values all had the same data type. It turned out the column contained both numbers and strings, so I converted the numbers to strings and it worked fine for me.
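The same check can be done inside R instead of in the spreadsheet. The following is a minimal sketch on a made-up CSV (the column names id, group and value are hypothetical): it lists each column's class, shows which entries fail numeric conversion, and counts distinct values per column, since a categorical column with a single distinct value is what triggers the contrasts error.

```r
# Hypothetical CSV content: "value" mixes numbers with a stray string
csv_text <- paste("id,group,value",
                  "1,a,10",
                  "2,a,12",
                  "3,a,n/a",
                  "4,a,15",
                  sep = "\n")
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One non-numeric entry is enough for the whole column to be read as character
col_classes <- sapply(dat, class)
col_classes

# Entries that cannot be converted become NA, which pinpoints the bad rows
value_num <- suppressWarnings(as.numeric(dat$value))
dat[is.na(value_num), ]

# Columns with only one distinct value cannot be used as factors in a model
n_distinct <- sapply(dat, function(col) length(unique(col)))
n_distinct
```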
Meso
Updated on April 23, 2021
Comments
-
Meso about 3 years
On running a gam model using the mgcv package, I encountered a strange error message which I am unable to understand:
“Error in model.frame.default(formula = death ~ pm10 + Lag(resid1, 1) + : variable lengths differ (found for 'Lag(resid1, 1)')”.
The number of observations used in model1 is exactly the same as the length of the deviance residuals, so I do not think this error is related to a difference in data size or length.
I found a fairly related error message on the web here, but that post did not receive an adequate answer, so it is not helpful to my problem.
Reproducible example and data follows:
library(quantmod)
library(mgcv)
require(dlnm)
df <- chicagoNMMAPS
df1 <- df[,c("date","dow","death","temp","pm10")]
df1$trend<-seq(dim(df1)[1]) ### Create a time trend
Run the model
model1<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5), data=df1, na.action=na.omit, family=poisson)
Obtain deviance residuals
resid1 <- residuals(model1,type="deviance")
Add a one day lagged deviance to model 1
model1_1 <- update(model1,.~.+ Lag(resid1,1), na.action=na.omit)
model1_2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5) + Lag(resid1,1), data=df1, na.action=na.omit, family=poisson)
Both of these models produced the same error message.
-
joran over 10 years
(Almost) never think that an error message is flat out lying. That will greatly increase the amount of time you spend debugging it. Note that you've specified na.omit. Perhaps the differing lengths are due to an observation with an NA value being dropped.
-
Meso over 10 years
@joran, the error occurs with or without the "na.omit" option. In fact, my initial attempt was without specifying this option.
-
joran over 10 years
The default (in most cases) is still na.omit. Note that df has 5114 rows and the length of resid1 is only 4863. NA values are indeed being dropped. Try dropping the NA values first. Then your residual vector will match your original data frame.
-
Meso over 10 years
@joran, many thanks for your suggestion. The model runs after I removed all NAs on outcome and predictors.
-
joran over 10 years
Feel free to write up what you did as an answer and (after the waiting period) accept it! :) Glad it worked out...
-