Basic lag in R vector/dataframe
Solution 1
Another way to deal with this is using the zoo package, which has a lag method that will pad the result with NA:
require(zoo)
> set.seed(123)
> x <- zoo(sample(c(1:9), 10, replace = T))
> y <- lag(x, -1, na.pad = TRUE)
> cbind(x, y)
x y
1 3 NA
2 8 3
3 4 8
4 8 4
5 9 8
6 1 9
7 5 1
8 9 5
9 5 9
10 5 5
The result is a multivariate zoo object (an enhanced matrix), which is easily converted to a data.frame via
> data.frame(cbind(x, y))
Solution 2
I had the same problem, but I didn't want to use zoo or xts, so I wrote a simple lag function for data frames:
lagpad <- function(x, k) {
  if (k > 0) {
    return(c(rep(NA, k), x)[1:length(x)])
  } else {
    return(c(x[(-k + 1):length(x)], rep(NA, -k)))
  }
}
This can lag forward or backwards:
x<-1:3;
(cbind(x, lagpad(x, 1), lagpad(x,-1)))
x
[1,] 1 NA 2
[2,] 2 1 3
[3,] 3 2 NA
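To build several lags at once (as asked in the comments), one option is to apply lagpad over a vector of k values and collect the results in a data frame. This is only a sketch: lagpad is repeated so the block runs on its own, the helper name lag_many and the column names are hypothetical, and only positive k values are shown.

```r
# lagpad as in the answer above, repeated so this block is self-contained
lagpad <- function(x, k) {
  if (k > 0) {
    c(rep(NA, k), x)[1:length(x)]
  } else {
    c(x[(-k + 1):length(x)], rep(NA, -k))
  }
}

# Hypothetical helper: one column per value in ks
lag_many <- function(x, ks) {
  out <- lapply(ks, function(k) lagpad(x, k))
  names(out) <- paste0("lag", ks)
  as.data.frame(out)
}

x <- 1:5
res <- lag_many(x, 1:3)
res
```

Using lapply avoids writing one cbind call per column; the same pattern should scale to a few hundred lags.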
Solution 3
lag does not shift the data; it only shifts the "time base". x has no "time base", so cbind does not work as you expected. Try cbind(as.ts(x), lag(x)) and notice that a "lag" of 1 shifts the periods forward.
I would suggest using zoo / xts for time series. The zoo vignettes are particularly helpful.
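A small check of this point, using only base R: lagging a ts object leaves the values untouched and changes only its time attributes (tsp). The sample values are made up for illustration, and stats::lag is called explicitly on the assumption that another package might be masking lag.

```r
x <- ts(c(4, 6, 3, 4, 3))   # time index 1..5
y <- stats::lag(x, 1)       # explicit namespace in case lag is masked

tsp(x)                      # start, end, frequency of x
tsp(y)                      # start and end are one period earlier

# The underlying data are identical; only the time base moved
identical(as.vector(x), as.vector(y))

# The ts-aware cbind aligns the two series by time and pads with NA
cbind(x, y)
```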
Solution 4
Using just standard R functions this can be achieved in a much simpler way:
x <- sample(c(1:9), 10, replace = T)
y <- c(NA, head(x, -1))
ds <- cbind(x, y)
ds
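The same idea extends to a shift of k positions, and to a lead instead of a lag. A minimal sketch, assuming 0 < k < length(x); the names k, lagged and led are illustrative only.

```r
x <- c(4, 6, 3, 4, 3)
k <- 2   # number of positions to shift; assumed 0 < k < length(x)

# Lag: drop the last k values and pad the front with NA
lagged <- c(rep(NA, k), head(x, -k))

# Lead: drop the first k values and pad the end with NA
led <- c(tail(x, -k), rep(NA, k))

cbind(x, lagged, led)
```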
Solution 5
lag() works with time series, whereas you are trying to use bare matrices. This old question suggests using embed instead, like so:
lagmatrix <- function(x,max.lag) embed(c(rep(NA,max.lag), x), max.lag+1)
For instance:
> x
[1] 8 2 3 9 8 5 6 8 5 8
> lagmatrix(x, 1)
[,1] [,2]
[1,] 8 NA
[2,] 2 8
[3,] 3 2
[4,] 9 3
[5,] 8 9
[6,] 5 8
[7,] 6 5
[8,] 8 6
[9,] 5 8
[10,] 8 5
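With max.lag greater than 1, the same function returns one column per lag, from lag 0 (the original series) up to max.lag. A short sketch on a small vector; the column names are an illustrative choice, since embed itself returns an unnamed matrix.

```r
lagmatrix <- function(x, max.lag) embed(c(rep(NA, max.lag), x), max.lag + 1)

x <- 1:5
m <- lagmatrix(x, 2)

# Label the columns for readability (hypothetical names)
colnames(m) <- paste0("lag", 0:2)
m
```

The first column is x itself, and each further column is shifted down by one more position, with NA filling the top.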
Btibert3
Updated on July 09, 2022
Comments
-
Btibert3 almost 2 years
This will most likely expose that I am new to R, but in SPSS, running lags is very easy. Obviously this is user error, but what am I missing?
x <- sample(c(1:9), 10, replace = T)
y <- lag(x, 1)
ds <- cbind(x, y)
ds
Results in:
      x y
 [1,] 4 4
 [2,] 6 6
 [3,] 3 3
 [4,] 4 4
 [5,] 3 3
 [6,] 5 5
 [7,] 8 8
 [8,] 9 9
 [9,] 3 3
[10,] 7 7
I figured I would see:
      x y
 [1,] 4
 [2,] 6 4
 [3,] 3 6
 [4,] 4 3
 [5,] 3 4
 [6,] 5 3
 [7,] 8 5
 [8,] 9 8
 [9,] 3 9
[10,] 7 3
Any guidance will be much appreciated.
-
zwol over 13 years
Neither zoo nor xts seems to be stock; where do I get them?
-
Joshua Ulrich over 13 years
install.packages("xts") # this will install zoo as well
-
G. Grothendieck over 13 years
Also note that if z is a zoo series then lag(z, 0:-1) is a two-column zoo series with the original series and a lagged series. Also, coredata(z) will return just the data part of a zoo series and as.data.frame(z) will return a data frame with the data part of z as the column contents.
-
Tomas over 12 years
This is not correct! Probably you wanted to say
y <- c(NA, head(x, -1))
-
Danielle over 6 years
Let's say I wanted to apply this function to a vector, but perform it repeatedly for multiple lags
lagpad(x,-1:-216)
and output the results into one data frame (e.g. lagpad(x,-1) becomes variable #1 of the df, lagpad(x,-2) becomes variable #2, lagpad(x,-3) becomes variable #3, and so on). Would I have to cbind 216 columns, or is there a shorter way to adapt your code to this scenario?
-
TickboxPhil about 5 years
Yes! In any context, it seems, just swap dplyr::lag for standard lag and then it works fine on non-time series... job done!
-
Thrastylon over 3 years
Am I the only one finding that zoo is getting k backwards? In this example k=-1 is negative so I would expect y to be leading, but it's in fact lagging behind x. The default is k=1 so if I write "y = lag(x)", I end up with y leading x. This is... misleading.
-
G. Grothendieck over 3 years
zoo's design principles include consistency with base R, and in base R a positive lag causes the series to start earlier. See ?lag
-
W Barker about 2 years
@G.Grothendieck, I just came to this post with a similar problem and tried running your accepted solution, but got this error:
Error: "n" must be a nonnegative integer scalar, not an integer vector of length 1.
Changing the -1 to 1 eliminates the error, but raises the question of whether something has changed since you wrote this solution that readers of this post should be aware of. Care to comment? Thanks.
-
G. Grothendieck about 2 years
@W Barker, you likely introduced an error by loading dplyr, which clobbers lag in base R. Use library(dplyr, exclude = c("filter", "lag")) or don't load dplyr.