Error in model.frame.default: variable lengths differ

Solution 1

Joran suggested removing the NAs before running the model. I therefore removed the NAs, ran the model, and obtained the residuals. When I then updated model2 to include the lagged residuals, the error no longer appeared.

Remove NAs

df2<-df1[complete.cases(df1),]

Run the main model

model2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5), data=df2, family=poisson)

Obtain residuals

resid2 <- residuals(model2,type="deviance")

Update model2 by including the lag 1 residuals

model2_1 <- update(model2,.~.+ Lag(resid2,1),  na.action=na.omit)
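The same workflow can be tried on a small scale without any extra packages. The following is only a sketch under stated assumptions: the data frame `toy` and its columns are made up for illustration, `glm` stands in for `gam`, and the lag is built with base R and stored as a column (`res_lag1`, a hypothetical name) instead of calling `Lag()` inside the formula.

```r
# Toy data (made up for illustration): a count outcome and a predictor with gaps.
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- rpois(50, lambda = exp(0.3 * toy$x + 1))
toy$x[c(5, 20, 35)] <- NA

# Step 1: keep complete rows only, so residuals line up with the data row for row.
toy2 <- toy[complete.cases(toy), ]

# Step 2: fit the base model and extract deviance residuals.
fit <- glm(y ~ x, data = toy2, family = poisson)
res <- residuals(fit, type = "deviance")
stopifnot(length(res) == nrow(toy2))

# Step 3: store the lag-1 residuals as a column (the first row has no predecessor).
toy2$res_lag1 <- c(NA, head(res, -1))

# Step 4: refit with the lagged residual; the single NA row is dropped on fitting.
fit_lag <- update(fit, . ~ . + res_lag1, data = toy2)
nobs(fit_lag)
```

Keeping the lagged residual inside the data frame means every variable in the formula has the same length by construction, which is the condition the error message complains about.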

Solution 2

Another thing that can cause this error is fitting a model with the centering/scaling function standardize from the arm package, for example m <- standardize(lm(y ~ x, data = train))

If you then try predict(m), you get the same error as in this question.

Solution 3

It's simple: just make sure the values in each column all have the same data type. For example, I ran into this error along with another one:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

So I went back to my Excel/CSV file, set a filter on the variable that was throwing the error, and checked whether all of its values had the same data type. It turned out the column contained both numbers and strings, so I converted the numbers to strings and it worked fine for me.
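A check like this can also be done from within R rather than in the spreadsheet. This is a minimal sketch on a made-up CSV: the column names `dose` and `response` and the entry `unknown` are hypothetical.

```r
# Hypothetical CSV content: the "dose" column mixes numbers with a text entry.
csv_text <- "dose,response\n1.5,10\n2.0,12\nunknown,11\n3.5,15"
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect how each column was read; a mixed column comes in as character.
sapply(dat, class)

# Flag the entries that cannot be parsed as numbers.
bad <- is.na(suppressWarnings(as.numeric(dat$dose))) & !is.na(dat$dose)
dat$dose[bad]
```

Once the offending entries are known, they can be corrected at the source or the column converted deliberately, whichever suits the analysis.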

Author by

Meso

Updated on April 23, 2021

Comments

  • Meso
    Meso about 3 years

    On running a gam model using the mgcv package, I encountered a strange error message which I am unable to understand:

    “Error in model.frame.default(formula = death ~ pm10 + Lag(resid1, 1) + : variable lengths differ (found for 'Lag(resid1, 1)')”.

    The number of observations used in model1 is exactly the same as the length of the deviance residuals, so I do not think this error is caused by a difference in data size or length.

    I found a fairly related error message on the web here, but that post did not receive an adequate answer, so it is not helpful to my problem.

    Reproducible example and data follows:

    library(quantmod)  # provides Lag()
    library(mgcv)      # provides gam()
    library(dlnm)      # provides the chicagoNMMAPS data set
    
    df <- chicagoNMMAPS
    df1 <- df[, c("date", "dow", "death", "temp", "pm10")]
    df1$trend <- seq_len(nrow(df1))  # create a time trend
    

    Run the model

    model1<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5),
    data=df1, na.action=na.omit, family=poisson)
    

    Obtain deviance residuals

    resid1 <- residuals(model1,type="deviance")
    

    Add a one day lagged deviance to model 1

    model1_1 <- update(model1,.~.+ Lag(resid1,1),  na.action=na.omit)
    
    model1_2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5) + Lag(resid1,1), data=df1, 
    na.action=na.omit, family=poisson)
    

    Both of these models produced the same error message.

    • joran
      joran over 10 years
      (Almost) never think that an error message is flat out lying. That will greatly increase the amount of time you spend debugging it. Note that you've specified na.omit. Perhaps the differing lengths are due to an observation with an NA value being dropped.
    • Meso
      Meso over 10 years
      @joran, the error occurs with or without the "na.omit" option. In fact my initial attempt was without specifying this option
    • joran
      joran over 10 years
      The default (in most cases) is still na.omit. Note that df has 5114 rows and the length of resid1 is only 4863. NA values are indeed being dropped. Try dropping the NA values first. Then your residual vector will match your original data frame.
    • Meso
      Meso over 10 years
      @joran, many thanks for your suggestion. The model runs after I removed all NAs on outcome and predictors.
    • joran
      joran over 10 years
      Feel free to write up what you did as an answer and (after the waiting period) accept it! :) Glad it worked out...
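The length mismatch joran describes in the comments can be checked directly by comparing the row count with the residual count. The sketch below uses made-up data and base R `glm` as a stand-in for `gam`, and it assumes the session default `na.action` is `na.omit`. It also shows `na.exclude`, an alternative not mentioned in the thread, which pads the residuals with NA so they have one entry per original row.

```r
# Toy data (made up for illustration): 3 of 30 rows have a missing predictor.
set.seed(42)
d <- data.frame(x = rnorm(30))
d$y <- rpois(30, lambda = exp(0.2 * d$x + 1))
d$x[c(3, 10, 25)] <- NA

# Default NA handling drops incomplete rows, so residuals are shorter than the data.
fit_omit <- glm(y ~ x, data = d, family = poisson)
c(rows = nrow(d),
  complete = sum(complete.cases(d)),
  residuals = length(residuals(fit_omit)))

# na.exclude drops the same rows for fitting but pads residuals with NA,
# giving one value per row of the original data frame.
fit_excl <- glm(y ~ x, data = d, family = poisson, na.action = na.exclude)
length(residuals(fit_excl, type = "deviance"))
```

If the residual count equals the number of complete cases and not the number of rows, dropped NA rows are the likely cause of the "variable lengths differ" error.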