What does lag function in R do?

15,992

Solution 1

Plain vectors. lag is a generic function, which means it can act differently on objects of different classes. We start with how it works on a plain vector; later sections discuss "ts", "zoo" and "zooreg" class objects, and the lag function in dplyr. As an example, we use this vector of values:

x <- c(11, 12, 13, 14)

tsp. A time series is a sequence of times together with the values at those times. Here we have only the values, not the times, so lag conceptually adds regularly spaced default times 1, 2, 3, 4 by adding a tsp attribute, a triple that encodes the start time, the end time and the frequency (i.e. the reciprocal of the distance between successive times). The times 1, 2, 3, 4 are encoded as the tsp attribute c(1, 4, 1): 1 is the start time, 4 is the end time, and because successive times are 1 apart (2-1, 3-2 and 4-3 each equal 1) the frequency is 1/1 = 1. A quarterly series whose times are measured in years has a frequency of 4, since successive quarters are 0.25 apart and 1/0.25 = 4. Similarly, a monthly series measured in years has a frequency of 12.
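
To make the encoding concrete, the times implied by a tsp triple can be rebuilt with seq. This is only a minimal sketch: tsp_times is a hypothetical helper written for illustration, not a base R function, and the start year 2000 is an arbitrary choice.

```r
# Hypothetical helper: rebuild the time points encoded by a tsp triple
# c(start, end, frequency). Successive times are 1/frequency apart.
tsp_times <- function(tsp) {
  seq(from = tsp[1], to = tsp[2], by = 1 / tsp[3])
}

tsp_times(c(1, 4, 1))   # 1 2 3 4

# A quarterly series measured in years, starting in the first quarter
# of 2000 (arbitrary year, for illustration only)
q <- ts(c(11, 12, 13, 14), start = c(2000, 1), frequency = 4)
tsp(q)                  # start 2000, end 2000.75, frequency 4
time(q)                 # 2000.00, 2000.25, 2000.50, 2000.75
```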

lag. By default, lag shifts the times back by one step. It does not change the values, only the times. Thus lag changes the tsp attribute from c(1, 4, 1) to c(0, 3, 1): the start time is shifted from 1 to 0, the end time from 4 to 3, and since shifting does not change the frequency, it remains 1.

> lag(x)
[1] 11 12 13 14
attr(,"tsp")
[1] 0 3 1
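
The same shift applies for other values of the second argument k. The sketch below assumes a positive k moves the times back by k steps and a negative k moves them forward, which is how the default method in the stats package behaves; stats::lag is spelled out so the call does not depend on which packages are attached.

```r
x <- c(11, 12, 13, 14)

# k = 2 moves every time back by 2 steps
tsp(stats::lag(x, k = 2))    # -1 2 1

# a negative k moves the times forward instead
tsp(stats::lag(x, k = -1))   # 2 5 1

# the values themselves are untouched; only the attribute was added
identical(as.vector(stats::lag(x, k = 2)), x)   # TRUE
```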

time. The time function produces an object whose values are the times of its argument and whose tsp attribute is the same as that of its argument (or the default tsp attribute if it has none). For example, the code below shows that the times of the plain vector x given above are 1, 2, 3, 4 and the times of lag(x) are 0, 1, 2, 3.

> time(x)
[1] 1 2 3 4
attr(,"tsp")
[1] 1 4 1
> time(lag(x))
[1] 0 1 2 3
attr(,"tsp")
[1] 0 3 1

ts. Most operations on plain vectors ignore the tsp attribute, so unless you do something with it explicitly, adding it has no visible effect. If the object is a "ts" class object, on the other hand, the operations defined for "ts" objects do pay attention to the tsp attribute. For example, note where these plots start:

# plain vector
plot(x) # plot starts at time = 1
plot(lag(x)) # same, tsp was ignored

# ts object
plot(ts(x)) # plot starts at time = 1
plot(lag(ts(x))) # plot starts at time = 0, tsp was not ignored
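
Plotting is not the only operation that uses the times. As a further illustration (not part of the original answer), binding a "ts" series to its lag with cbind lines the two up by time and pads with NA where one of them has no observation. The column names original and lagged are chosen here only for readability.

```r
x <- c(11, 12, 13, 14)
y <- ts(x)

# cbind on "ts" objects aligns the columns by time (union of the times)
m <- cbind(original = y, lagged = stats::lag(y))
m

time(m)   # runs from 0 to 4
# At time 1 the original series holds 11 while the lagged series holds 12,
# i.e. the lagged series at time t shows the original value from time t + 1.
```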

zoo. The series above was regularly spaced, i.e. the differences between successive times were all the same. To represent irregularly spaced series one can use the "zoo" and "zooreg" classes in the zoo package. A zoo object consists of the values plus an index attribute holding the times; the times are not encoded in a tsp attribute. For example, here we see that the zoo object has times 1, 2, 3, 4 and values 11, 12, 13, 14:

> library(zoo)
>
> str(zoo(x))
‘zoo’ series from 1 to 4
  Data: num [1:4] 11 12 13 14
  Index:  int [1:4] 1 2 3 4

The "zooreg" class is like "zoo" but for series that are regularly spaced, except that some times may be missing. Internally, "zooreg" objects are the same as "zoo" objects except that the frequency is also stored. The values and index are the same as for zoo, but we now have a frequency as well. Since successive time points are 1 apart, the frequency is 1.

> str(zooreg(x))
‘zooreg’ series from 1 to 4
  Data: num [1:4] 11 12 13 14
  Index:  num [1:4] 1 2 3 4
  Frequency: 1 

If one lags a "zoo" object, each value is moved to the prior time in the index, and the first value, which has no prior time, is dropped. Here the times are 1, 2, 3 and the values are 12, 13, 14. Note that the lagged series has a subset of the times of the original series. That is always the case when lagging a zoo series:

> lag(zoo(x))
 1  2  3 
12 13 14 
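
The subset property is easier to see when the times are irregular. In the sketch below the index 1, 2, 5, 9 is made up for illustration; it requires the zoo package to be installed.

```r
library(zoo)

# Irregularly spaced times: 1, 2, 5, 9 (chosen only for illustration)
z <- zoo(c(11, 12, 13, 14), c(1, 2, 5, 9))

# stats::lag is generic, so this dispatches to zoo's method for "zoo" objects
lz <- stats::lag(z)

index(lz)      # 1 2 5, a subset of the original times
coredata(lz)   # 12 13 14, each value moved to the prior existing time
```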

Because "zooreg" objects have a frequency they can be lagged to times that did not exist in the original series. Each time point t is lagged to t - deltat where deltat is 1/frequency. Here 0, 1, 2, 3 are the lagged time points and the values are 11, 12, 13, 14:

> lag(zooreg(x))
 0  1  2  3 
11 12 13 14 

dplyr. The dplyr package also has a lag function. Unfortunately, it acts in the opposite direction to the base R lag function: lag(x, k) moves each item in the series forward rather than backward. This may actually be more intuitive, but it causes a lot of confusion because of the incompatibility with base R. If you use dplyr, be careful to know whether it is loaded, since that determines which function a bare call to lag refers to.

dplyr's lag is particularly useful with data frames, since given a vector (such as a column of a data frame) it always returns a vector of the same length. It has a default= argument, which itself defaults to NA but can be set by the user, that determines what the empty value(s) at the beginning of the vector are filled with. Negative lags are not allowed, but the dplyr lead function can be used instead.

dplyr::lag(1:5)
## [1] NA  1  2  3  4

dplyr::lag(1:5, 2)
## [1] NA NA  1  2  3

dplyr::lead(1:5)
## [1]  2  3  4  5 NA
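
The default= argument described above can be illustrated the same way. This small sketch requires dplyr to be installed and uses 0L as the fill value so that it has the same integer type as 1:5.

```r
# Fill the leading gap with 0 instead of NA
filled <- dplyr::lag(1:5, default = 0L)
filled
## [1] 0 1 2 3 4
```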

Solution 2

lag takes an atomic vector and returns that same vector with an added tsp attribute: three numbers giving the start time, end time, and frequency of the series after a lag of one step. Vectors in R are indexed from 1, so the default times are 1, 2, ..., n; after a lag of one the start is therefore 0 and, in your case, the end is 3 (one less than the length of the vector). Finally, frequency is the number of observations per unit of time.

Example (courtesy of @GavinSimpson)

x <- ts(c(0,0,0,1))
x
lag(x)

> x
Time Series:
Start = 1 
End = 4 
Frequency = 1 
[1] 0 0 0 1
> lag(x)
Time Series:
Start = 0 
End = 3 
Frequency = 1 
[1] 0 0 0 1

Note how the vector ([1] 0 0 0 1) is unchanged but the Start and End properties of the time series are modified as one would expect for a lag. For this to be of use, you need a function that understands R's ts objects. If using something else, you may need to lag the vector yourself.
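
If you do need to lag a plain vector yourself, one possible base R sketch is shown below. shift_values is a hypothetical helper name, not a function from base R or from any package mentioned here; it moves the values toward the end of the vector and pads the front with NA, so the result keeps the original length.

```r
# Hypothetical helper: shift the values of a plain vector k positions
# toward the end, padding the front with NA.
shift_values <- function(x, k = 1) {
  stopifnot(k >= 0, k == round(k))
  n <- length(x)
  c(rep(NA, min(k, n)), x[seq_len(max(n - k, 0))])
}

shift_values(c(0, 0, 0, 1))      # NA 0 0 0
shift_values(c(0, 0, 0, 1), 2)   # NA NA 0 0
```

Unlike lag on a plain vector, this changes the values themselves and adds no tsp attribute.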

Author: Anoop

Updated on June 27, 2022

Comments

  • Anoop
    Anoop almost 2 years

    I am debugging R code and am quite confused about how the lag function in R works. For example:

    > x=c(0)
    > x
    [1] 0
    > lag(x)
    [1] 0
    attr(,"tsp")
    [1] 0 0 1
    

    Another example

    > x=c(0,0,0,1)
    > x
    [1] 0 0 0 1
    > lag(x)
    [1] 0 0 0 1
    attr(,"tsp")
    [1] 0 3 1
    

    Can someone explain to me, in plain English, what exactly the lag function does?

    I am specifically concerned about how the return values are computed.

    Keep in mind this question is from a programmer trying to learn R rather than a statistician.

  • Joshua Ulrich
    Joshua Ulrich over 9 years
    You're describing the tsp attribute of the returned object, not the returned object itself.
  • IRTFM
    IRTFM over 9 years
    The area that confused the heck out of me for several years and made me never want to use R ts objects again was the fact that lag doesn't really change the vector, but rather changes the time base. That's not what I expected.
  • Gavin Simpson
    Gavin Simpson over 9 years
    Perhaps it would be easier for the OP to understand what is going on if you make x from their example a ts object to compare with the one that lag creates. Then it is clear that the vector is unchanged and just the time unit t to which each observation is linked is modified?