R - error "variable lengths differ"

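This happens because in your first step you created a separate variable outside of your data frame: transLOT <- log(LengthofTimemin). When you remove a row from the data, transLOT is unchanged, so it still has one more element than the columns of rdata. Even worse than the differing lengths, your data no longer line up: if the mismatch were ignored, every row after the one you removed would be paired with the wrong response value ("off by one"). na.exclude does not help here, because it only controls how rows containing NA are handled; it does not reconcile variables of different lengths.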
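To make the mismatch concrete, here is a minimal sketch that reproduces the same kind of error on made-up data. The data frame resdata_toy and its values are hypothetical stand-ins, not the asker's data, and the model is reduced to a single predictor.

```r
# Toy stand-in for resdata (hypothetical values, not the real data)
resdata_toy <- data.frame(
  LengthofTimemin = c(5.8, 1.3, 1.35, 12.0, 3.2),
  DielEnd         = c("day", "night", "day", "night", "day")
)

# Free-standing vector, as in the question: it lives outside the data frame
transLOT <- log(resdata_toy$LengthofTimemin)

# Dropping a row changes the data frame but not transLOT
rdata_toy <- resdata_toy[-3, ]

length(transLOT)  # 5
nrow(rdata_toy)   # 4

# lm() takes DielEnd from rdata_toy (4 values) and, because transLOT is not
# a column there, falls back to the workspace copy (5 values)
result <- tryCatch(
  lm(transLOT ~ DielEnd, data = rdata_toy),
  error = function(e) conditionMessage(e)
)
print(result)  # should be the same kind of message as in the question
```

The tryCatch() wrapper is only there so the script keeps running and the message can be inspected; without it, lm() stops with the error.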

The simple solution is to create your transLOT variable in the data frame. Then, whenever you do things to the data (like remove rows), the same thing is done to transLOT.

resdata$transLOT <- log(resdata$LengthofTimemin)

Note that I also use resdata$LengthofTimemin rather than the bare LengthofTimemin, which you seem to have in your workspace. Did you use attach() at some point? You shouldn't use attach() for exactly this reason. Keep variables in the data frame!
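As a rough illustration of the fix, the sketch below stores the transformed response as a column, drops a row, and refits. The data frame resdata_toy, its values, and the single predictor are assumptions made for the example; only the column names LengthofTimemin, TideEnd and transLOT come from the question.

```r
# Toy stand-in for resdata (hypothetical values, not the real data)
resdata_toy <- data.frame(
  LengthofTimemin = c(5.8, 1.3, 1.35, 12.0, 3.2, 7.4),
  TideEnd         = c(0.2, 1.1, 0.7, 1.9, 0.4, 1.5)
)

# Store the transformed response as a column of the data frame
resdata_toy$transLOT <- log(resdata_toy$LengthofTimemin)

# Removing a row now removes the matching transLOT value as well
rdata_toy <- resdata_toy[-3, ]

# Every variable in the formula comes from rdata_toy, so the lengths agree
newfit <- lm(transLOT ~ TideEnd, data = rdata_toy)
nobs(newfit)  # 5
```

Because the response travels with its row, any later subsetting or reordering of the data frame keeps response and predictors aligned.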

Author by

Kelsey Spencer

Updated on January 19, 2020

Comments

  • Kelsey Spencer almost 3 years
    > #transforming length of time
    > transLOT<-log(LengthofTimemin)
    > 
    > #checking for outliers
    > fit<-lm(transLOT~DielEnd+TideEnd+TideStart+Moonphase+TideStart*Moonphase, data=resdata)
    > outlierTest(fit)
        rstudent unadjusted p-value Bonferonni p
    295 4.445284         1.1025e-05    0.0052808
    > 
    > #getting rid of the outlier data in row 295
    > rdata<-resdata[-295, ]
    > print(rdata[294:296,5:10])
    # A tibble: 3 × 6
      DepartureDate       DepartureTime        LengthofTime LengthofTimemin EventLengthCategories
             <dttm>              <dttm>              <dttm>           <dbl>                 <chr>
    1    2016-09-19 1899-12-30 23:46:46 1899-12-30 00:05:49        5.816667                  5-15
    2    2016-09-20 1899-12-30 01:55:28 1899-12-30 00:01:20        1.333333                    <5
    3    2016-09-20 1899-12-30 04:07:28 1899-12-30 00:01:21        1.350000                    <5
    > newfit<-lm(transLOT~DielEnd+TideEnd+TideStart+Moonphase+TideStart*Moonphase, na.action=na.exclude, data=rdata)
    Error in model.frame.default(formula = transLOT ~ DielEnd + TideEnd +  : 
      variable lengths differ (found for 'DielEnd')
    > #now all of a sudden the variable lengths differ
    

    I understand that the problem occurs with the removal of the row of data but I assumed that na.exclude would account for it. After thoroughly searching, I am unable to determine why this error is occurring.

  • Kelsey Spencer almost 6 years
    Thank you very much! I did indeed use attach() earlier in the code. I had previously not been instructed against it, but from now on I will use $ to link my data frame.
  • Gregor Thomas almost 6 years
    Other good options exist. You seem to be using dplyr already (at least your rdata is a tibble), so mutate is a nice alternative that both keeps data in your data frame and saves you from having to re-type the name of the data frame hundreds of times.
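The mutate suggestion might look something like the sketch below. It assumes the dplyr package is installed; resdata_toy and its values are hypothetical, and slice(-3) is just one way of dropping a row by position.

```r
library(dplyr)

# Toy stand-in for resdata (hypothetical values, not the real data)
resdata_toy <- tibble(
  LengthofTimemin = c(5.8, 1.3, 1.35, 12.0, 3.2, 7.4),
  TideEnd         = c(0.2, 1.1, 0.7, 1.9, 0.4, 1.5)
)

# mutate() adds transLOT as a column; LengthofTimemin is looked up inside
# the data frame, so the data frame name is not repeated
resdata_toy <- resdata_toy %>%
  mutate(transLOT = log(LengthofTimemin))

# Drop the third row by position; transLOT is dropped along with it
rdata_toy <- resdata_toy %>%
  slice(-3)

newfit <- lm(transLOT ~ TideEnd, data = rdata_toy)
```

The result is equivalent to the resdata$transLOT assignment in the answer; the difference is mostly one of typing and readability.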
