Basic lag in R vector/dataframe

83,823

Solution 1

Another way to deal with this is using the zoo package, which has a lag method that will pad the result with NA:

> library(zoo)
> set.seed(123)
> x <- zoo(sample(c(1:9), 10, replace = TRUE))
> y <- lag(x, -1, na.pad = TRUE)
> cbind(x, y)
   x  y
1  3 NA
2  8  3
3  4  8
4  8  4
5  9  8
6  1  9
7  5  1
8  9  5
9  5  9
10 5  5

The result is a multivariate zoo object (essentially an enhanced matrix), but it is easily converted to a data.frame with:

> data.frame(cbind(x, y))

Solution 2

I had the same problem, but I didn't want to use zoo or xts, so I wrote a simple lag function for vectors (such as data frame columns):

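lagpad <- function(x, k) {
  n <- length(x)
  if (k > 0) {
    # lag: prepend k NAs and drop the overflow at the end
    c(rep(NA, k), x)[seq_len(n)]
  } else {
    # lead (or k == 0): drop the first -k values and pad the end with NAs;
    # max/min keep the result at length n even when -k >= n
    c(tail(x, max(n + k, 0)), rep(NA, min(-k, n)))
  }
}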

This can lag forward or backwards:

x <- 1:3
cbind(x, lagpad(x, 1), lagpad(x, -1))
     x      
[1,] 1 NA  2
[2,] 2  1  3
[3,] 3  2 NA
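
To build several lagged columns at once (as asked in the comments), one option is to loop over the lags with sapply instead of writing one cbind per column. This is only a minimal sketch: the copy of lagpad is included so the snippet runs on its own, and x, lags, and the lagN column names are illustrative assumptions.

```r
# Variant of lagpad from this answer, repeated so the snippet is self-contained
lagpad <- function(x, k) {
  n <- length(x)
  if (k > 0) {
    c(rep(NA, k), x)[seq_len(n)]
  } else {
    c(tail(x, max(n + k, 0)), rep(NA, min(-k, n)))
  }
}

x <- 1:5
lags <- 1:3  # hypothetical set of lags; could just as well be 1:216

# sapply returns one equal-length vector per lag and binds them into a matrix
lagged <- as.data.frame(sapply(lags, function(k) lagpad(x, k)))
names(lagged) <- paste0("lag", lags)
lagged
```

Each column of lagged holds x shifted down by the corresponding number of rows, with NA filling the top.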

Solution 3

lag does not shift the data; it only shifts the "time base". x has no time base, so cbind does not work as you expected. Try cbind(as.ts(x), lag(x)) and notice that a lag of 1 moves the series one period earlier, so each value of x ends up next to the value that follows it.

I would suggest using zoo / xts for time series. The zoo vignettes are particularly helpful.
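
A small base R sketch of what "only shifts the time base" means, under the assumption that stats::lag is the function being called (it is spelled out here in case another package masks lag). The variable names are illustrative.

```r
# Plain vector: stats::lag() only attaches a "tsp" (time base) attribute
x <- c(4, 6, 3, 4, 3)
y <- stats::lag(x, 1)
same_values <- all(as.vector(y) == x)  # the data themselves are not moved
time_base <- attr(y, "tsp")            # start, end, frequency

# Time series: cbind() aligns the columns on their time bases
z <- as.ts(x)
aligned <- cbind(z, stats::lag(z, 1), stats::lag(z, -1))
colnames(aligned) <- c("z", "k_plus_1", "k_minus_1")
aligned
```

In aligned, the k = 1 column starts one period earlier than z (it shows the next value), while the k = -1 column starts one period later (it shows the previous value), which is the lag the question was after.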

Solution 4

Using just standard R functions this can be achieved in a much simpler way:

x <- sample(c(1:9), 10, replace = T)
y <- c(NA, head(x, -1))
ds <- cbind(x, y)
ds
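
The same idea extends to a lag of k rows, and to a lead by using tail instead of head. A minimal sketch, assuming 1 <= k <= length(x); the values of x and k are made up for illustration.

```r
x <- c(3, 8, 4, 8, 9)
k <- 2  # assumed to satisfy 1 <= k <= length(x)

# lag: k NAs in front, then x without its last k values
lagged <- c(rep(NA, k), head(x, -k))

# lead: x without its first k values, then k NAs at the end
led <- c(tail(x, -k), rep(NA, k))

cbind(x, lagged, led)
```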

Solution 5

lag() works with time series, whereas you are trying to use bare matrices. This old question suggests using embed instead, like so:

lagmatrix <- function(x,max.lag) embed(c(rep(NA,max.lag), x), max.lag+1)

For instance:

> x
[1] 8 2 3 9 8 5 6 8 5 8
> lagmatrix(x, 1)
      [,1] [,2]
 [1,]    8   NA
 [2,]    2    8
 [3,]    3    2
 [4,]    9    3
 [5,]    8    9
 [6,]    5    8
 [7,]    6    5
 [8,]    8    6
 [9,]    5    8
[10,]    8    5
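
With a larger max.lag the same function returns one column per lag, which can then be named and turned into a data frame. A short sketch; the input vector and the column names x, lag1, lag2 are assumptions chosen for illustration.

```r
lagmatrix <- function(x, max.lag) embed(c(rep(NA, max.lag), x), max.lag + 1)

x <- c(8, 2, 3, 9, 8)
m <- lagmatrix(x, 2)

# Column 1 is the original series; columns 2 and 3 are lags 1 and 2
colnames(m) <- c("x", paste0("lag", 1:2))
df <- as.data.frame(m)
df
```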
Author by

Btibert3

New to programming, but trying to learn as much as I can.

Updated on July 09, 2022

Comments

  • Btibert3
    Btibert3 almost 2 years

    This will most likely expose that I am new to R, but in SPSS, running lags is very easy. Obviously this is user error, but what am I missing?

    x <- sample(c(1:9), 10, replace = T)
    y <- lag(x, 1)
    ds <- cbind(x, y)
    ds
    

    Results in:

          x y
     [1,] 4 4
     [2,] 6 6
     [3,] 3 3
     [4,] 4 4
     [5,] 3 3
     [6,] 5 5
     [7,] 8 8
     [8,] 9 9
     [9,] 3 3
    [10,] 7 7
    

    I figured I would see:

         x y
     [1,] 4 
     [2,] 6 4
     [3,] 3 6
     [4,] 4 3
     [5,] 3 4
     [6,] 5 3
     [7,] 8 5
     [8,] 9 8
     [9,] 3 9
    [10,] 7 3
    

    Any guidance will be much appreciated.

  • zwol
    zwol over 13 years
    Neither zoo nor xts seems to be stock, where do I get them?
  • Joshua Ulrich
    Joshua Ulrich over 13 years
    install.packages("xts") # this will install zoo as well
  • G. Grothendieck
    G. Grothendieck over 13 years
    Also note that if z is a zoo series then lag(z, 0:-1) is a two column zoo series with the original series and a lagged series. Also, coredata(z) will return just the data part of a zoo series and as.data.frame(z) will return a data frame with the data part of z as the column contents.
  • Tomas
    Tomas over 12 years
    this is not correct! Probably you wanted to say y <- c(NA, head(x, -1))
  • Danielle
    Danielle over 6 years
    Let's say I wanted to apply this function to a vector but perform it repeatedly for multiple lags, lagpad(x,-1:-216), and output the results into one dataframe (e.g. lagpad(x,-1) becomes variable #1 of the df, lagpad(x,-2) becomes variable #2, lagpad(x,-3) becomes variable #3, and so on). Would I have to cbind 216 columns, or is there a shorter way to adapt your code to this scenario?
  • TickboxPhil
    TickboxPhil about 5 years
    Yes! In any context it seems, just swap dplyr::lag for standard lag and then works fine on non time series... job done!
  • Thrastylon
    Thrastylon over 3 years
    Am I the only one finding that zoo is getting k backwards? In this example k=-1 is negative so I would expect y to be leading, but it's in fact lagging behind x. The default is k=1 so if I write "y = lag(x)", I end up with y leading x. This is... misleading.
  • G. Grothendieck
    G. Grothendieck over 3 years
    zoo's design principles include consistency with base R and in base R a positive lag causes the series to start earlier. See ?lag
  • W Barker
    W Barker about 2 years
    @G.Grothendieck, just came to this post with a similar problem and tried running your accepted solution, but got this error: Error: "n" must be a nonnegative integer scalar, not an integer vector of length 1. Changing the -1 to 1 eliminates the error, but raises the question as to whether something has changed since you wrote this solution -- of which readers of this post should be aware. Care to comment? Thanks.
  • G. Grothendieck
    G. Grothendieck about 2 years
    @W Barker, You likely introduced an error by loading dplyr which clobbers lag in the base of R. Use library(dplyr, exclude = c("filter", "lag")) or don't load dplyr.