plotting time series in R

11,721

Solution 1

So I think there are a few things going on here that are worth talking through:

First, some example data:

test <- data.frame(End = Sys.Date()+1:5, 
               Start = Sys.Date()+0:4, 
               tck = rep("GOOG",5), 
               EndP= 1:5, 
               StartP= 0:4)

test.sub = subset(test, tck=="GOOG",select = c(End, EndP))

Note that test and test.sub are both data frames, so a call like test.sub[1] doesn't give you the column itself.** It's more R-ish to write test.sub[,1], which is consistent with how other R structures are indexed. If you compare the results of str(test.sub[1]) and str(test.sub[,1]), you'll see that R treats them differently: the first is a one-column data frame, the second is a plain vector of dates.
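To make that difference concrete, here is a minimal sketch along the lines of the example data (fixed dates replace Sys.Date() only so the result is reproducible; the names one_col, vec and vec2 are just for illustration):

```r
# Example data as above, with fixed dates (an assumption for reproducibility)
test <- data.frame(End = as.Date("2011-10-18") + 1:5,
                   EndP = 1:5)

one_col <- test[1]     # single bracket, no comma: still a data frame
vec     <- test[, 1]   # row/column indexing: the column itself
vec2    <- test[[1]]   # double bracket: also the column itself

class(one_col)         # "data.frame"
class(vec)             # "Date"
identical(vec, vec2)   # both forms return the same Date vector
```

Either test[, 1] or test[[1]] is what functions expecting a vector want to receive.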

You said you typed:

as.ts(test.sub)
plot(test.sub)

I'd guess you have extensive experience with some sort of OO language, and while R does have some OO flavor to it, it doesn't apply here. as.ts() doesn't transform test.sub in place into something of class ts; it returns a new object. Since the result isn't assigned to anything, it gets thrown away, and the next line plots the data frame you started with. It's an easy fix though:

test.sub.ts <- as.ts(test.sub)
plot(test.sub.ts)

But this probably isn't what you were looking for either. R creates a time series with two variables, "End" (the date, now coerced to a plain number) and "EndP". Funny business like this is part of the reason time series packages like zoo and xts have caught on, so I'll detail them instead a little further down.

(Unfortunately, to the best of my understanding, R doesn't keep date stamps with its default ts class, choosing instead to keep start and end times as well as a frequency. For more general time series work, this is rarely flexible enough.)
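A small sketch of what the coercion does, again with fixed dates standing in for Sys.Date() (that substitution and the name end_as_number are my own additions for illustration):

```r
# Fixed dates stand in for Sys.Date() (an assumption for reproducibility)
test.sub <- data.frame(End = as.Date("2011-10-18") + 1:5, EndP = 1:5)

test.sub.ts <- as.ts(test.sub)

# The Date column is now plain numbers (days since 1970-01-01)
end_as_number <- as.numeric(test.sub.ts[, "End"])
all(end_as_number == as.numeric(test.sub$End))

# A ts object only records start, end and frequency, not the dates themselves
tsp(test.sub.ts)
```

The time index of the result simply counts observations, which is why the dates vanish from the x-axis when a ts object built this way is plotted.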

You could perhaps get what you wanted by typing

plot(test.sub[,1], test.sub[,2]) 

instead of

plot(test.sub[1], test.sub[2])

since the latter runs into trouble given that you are passing two sub-data frames instead of two vectors (even though it looks like you would be).***

Anyways, with xts (and similarly for zoo):

library(xts) # You may need to install this
xtemp <- xts(test.sub[,2], test.sub[,1]) # Create the xts object
plot(xtemp) 
# Dispatches a xts plot method which does all sorts of nice time series things

Hope some of this helps, and sorry for the inline code that's not identified as such: still getting used to Stack Overflow.

Michael

**In reality, they access the lists that are used to structure a data frame internally, but that's more a code nuance than something worth relying on.

***The nitty-gritty is that when you pass plot(test.sub[1], test.sub[2]) to R, it dispatches the method plot.data.frame which takes a single data frame and tries to interpret the second data frame as an additional plot parameter which gets misinterpreted somewhere way down the line, giving your error.

Solution 2

The reason you get the error about differing x and y lengths becomes apparent if you run traceback() immediately after the error is raised:

> plot(test.sub[1],test.sub[2])
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ
> traceback()
6: stop("'x' and 'y' lengths differ")
5: xy.coords(x, y, xlabel, ylabel, log)
4: plot.default(x1, ...)
3: plot(x1, ...)
2: plot.data.frame(test.sub[1], test.sub[2])
1: plot(test.sub[1], test.sub[2])

The problems in your call are manifold. First, as mentioned by @mweylandt, test.sub[1] is a data frame with a single component, not a vector made up of the contents of the first component of test.sub.
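A quick way to see why the row counts can match while the lengths do not is to compare nrow() and length() on the same pieces. This is only a sketch on dummy data with fixed dates (my assumption, for reproducibility):

```r
# Dummy data with fixed dates (an assumption for reproducibility)
test.sub <- data.frame(End = as.Date("2011-10-18") + 1:5, EndP = 1:5)

nrow(test.sub[1]) == nrow(test.sub[2])  # TRUE: same number of rows

length(test.sub[2])     # 1: length() of a data frame counts its columns
length(test.sub[, 2])   # 5: the column extracted as a vector
```

So checking nrow() on the two one-column data frames says nothing about what a function that compares lengths will see.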

From the traceback, we see that the plot.data.frame method was called. R is quite happy to plot a data frame passed as a single argument. Here, though, test.sub[1] has only one column, so plot.data.frame() pulled that column out as x1 and handed it on to plot(), with test.sub[2] tagging along in the ... arguments. Both eventually reach xy.coords(), which compares their lengths: x has one element per row, but the length of a data frame is its number of columns, so test.sub[2] has length 1, and you are correctly informed that the lengths differ.

It would have worked if you'd done plot(test.sub[,1], test.sub[,2], type = "l"), or used the formula interface to name the variables, plot(V4 ~ V1, data = test.sub, type = "l"), as I show in my other Answer.

Solution 3

Surely it is easier to use the formula interface:

> test <- data.frame(End = Sys.Date()+1:5, 
+                Start = Sys.Date()+0:4, 
+                tck = rep("GOOG",5), 
+                EndP= 1:5, 
+                StartP= 0:4)
> 
> test.sub = subset(test, tck=="GOOG",select = c(End, EndP))
> head(test.sub)
         End EndP
1 2011-10-19    1
2 2011-10-20    2
3 2011-10-21    3
4 2011-10-22    4
5 2011-10-23    5
> plot(EndP ~ End, data = test.sub, type = "l")

I work extensively with time series type data and rarely, if ever, have any need for the "ts" class of objects. Packages zoo and xts are very useful, but if all you want to do is plot the data, i) get the date/time information correctly formatted/set-up as a "Date" or "POSIXt" class object, and then ii) just plot it using standard graphics and type = "l" (or type = "b" or type = "o" if you want to see the observation times).
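As a rough sketch of those two steps, assuming a hypothetical data frame raw whose dates arrive as text (the column names day and price are made up for the example):

```r
# Hypothetical raw data: dates stored as text, as they often are after import
raw <- data.frame(day   = c("2011-10-19", "2011-10-20", "2011-10-21"),
                  price = c(1.0, 2.5, 2.0),
                  stringsAsFactors = FALSE)

# i) turn the date column into a real Date object
raw$day <- as.Date(raw$day, format = "%Y-%m-%d")

# ii) plot with standard graphics; type = "o" draws lines plus points
pdf(NULL)  # throwaway device so the sketch also runs non-interactively
plot(price ~ day, data = raw, type = "o")
invisible(dev.off())
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped; they are only there so that nothing is written to disk when the sketch is run as a script.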

Author by

itcplpl

Updated on July 14, 2022

Comments

  • itcplpl
    itcplpl almost 2 years

    I am working with data, 1st two columns are dates, 3rd column is symbol, and 4th and 5th columns are prices. So, I created a subset of the data as follows:

    test.sub<-subset(test,V3=="GOOG",select=c(V1,V4))
    

    and then I try to plot a time series chart using the following

    as.ts(test.sub)
    plot(test.sub)
    

    well, it gives me a scatter plot - not what I was looking for. so, I tried plot(test.sub[1],test.sub[2]) and now I get the following error:

    Error in xy.coords(x, y, xlabel, ylabel, log) : 
      'x' and 'y' lengths differ
    

    To make sure the no. of rows were same, I ran nrow(test.sub[1]) and nrow(test.sub[2]) and they both return equal rows, so as a newcomer to R, I am not sure what the fix is.

    I also ran plot.ts(test.sub) and that works, but it doesn't show me the dates in the x-axis, which it was doing with plot(test.sub) and which is what I would like to see.

    test.sub[1]
                  V1
    1107 2011-Aug-24
    1206 2011-Aug-25
    1307 2011-Aug-26
    1408 2011-Aug-29
    1510 2011-Aug-30
    1613 2011-Aug-31
    1718 2011-Sep-01
    1823 2011-Sep-02
    1929 2011-Sep-06
    2035 2011-Sep-07
    2143 2011-Sep-08
    2251 2011-Sep-09
    2359 2011-Sep-13
    2470 2011-Sep-14
    2581 2011-Sep-15
    2692 2011-Sep-16
    2785 2011-Sep-19
    2869 2011-Sep-20
    2965 2011-Sep-21
    3062 2011-Sep-22
    3160 2011-Sep-23
    3258 2011-Sep-26
    3356 2011-Sep-27
    3455 2011-Sep-28
    3555 2011-Sep-29
    3655 2011-Sep-30
    3755 2011-Oct-03
    3856 2011-Oct-04
    3957 2011-Oct-05
    4059 2011-Oct-06
    4164 2011-Oct-07
    4269 2011-Oct-10
    4374 2011-Oct-11
    4479 2011-Oct-12
    4584 2011-Oct-13
    4689 2011-Oct-14
    
    str(test.sub)
    'data.frame':   35 obs. of  2 variables:
     $ V1:Class 'Date'  num [1:35] NA NA NA NA NA NA NA NA NA NA ...
     $ V4: num  0.475 0.452 0.423 0.418 0.403 ...
    
    head(test.sub)
           V1       V4
    1212 <NA> 0.474697
    1313 <NA> 0.451907
    1414 <NA> 0.423184
    1516 <NA> 0.417709
    1620 <NA> 0.402966
    1725 <NA> 0.414264
    

    Now that this is working, I'd like to add a 3rd variable to plot a 3d chart - any suggestions how I can do that. thx!

  • itcplpl
    itcplpl over 12 years
    thanks for the explanation - indeed very helpful. ran into a glitch with xts. I ran the following xtemp<-xts(test.sub[,2],test.sub[,1]) Error in xts(test.sub[, 2], test.sub[, 1]) : order.by requires an appropriate time-based object I checked test.sub[1] and it shows the dates in format 'yyyy-mmm-dd' so it is a time-based object...have I missed something
  • mweylandt
    mweylandt over 12 years
    If it shows the dates as "yyyy-mm-dd" it's not necessarily a time-based object: depending on your data source it may just be a character string that to you is obviously a date, but R doesn't know that. A Date is a special data type to R...Try wrapping test.sub[,1] with as.Date(), which takes an optional format= argument if you're not following the standard. For you, it sounds like as.Date(test.sub[,1], format = "YYYY-mm-dd") will work.
  • itcplpl
    itcplpl over 12 years
    tried that, but no luck, here's what it returns - as.Date(test.sub[,1],format="YYYY-mm-dd") [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [26] NA NA NA NA NA NA NA NA NA NA NA. this is a sample of my data in test.sub 4689 2011-Oct-14 0.2460010 7.18000 1.000000 with the date being V1
  • joran
    joran over 12 years
    @itcplpl I believe mweylandt just gave you the wrong format for the format argument (irony!). Try format = '%Y-%m-%d' instead.
  • itcplpl
    itcplpl over 12 years
    that returned NA's as well :-( this is what I got as.Date(test.sub[,1],format='%Y-%m-%d') [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [26] NA NA NA NA NA NA NA NA NA NA NA
  • itcplpl
    itcplpl over 12 years
    thanks for the example, that is helpful. my issue right now is with getting the Date class object to work correctly. I can't use Sys.Date() as I am dealing with historical data. I posted the error I am getting with Date..a suggestion on the fix would be very helpful
  • Gavin Simpson
    Gavin Simpson over 12 years
    @itcplpl How about you show us what your date data look like? If you show me the format I'll show you how to convert that into something R can read.
  • itcplpl
    itcplpl over 12 years
    sounds good. I just updated the original post with the date data
  • Gavin Simpson
    Gavin Simpson over 12 years
    @itcplpl test.sub <- within(test.sub, V1 <- as.Date(V1, format = "%Y-%b-%d")) should do it. See ?strftime for details of the format codes.
  • itcplpl
    itcplpl over 12 years
    thanks for the pointer on strftime. that bit worked but when I ran the plot, it gives me an error...here's what I ran test.sub<-within(test.sub, V1<-as.Date(V1, format = "%Y-%b-%d")) > xtemp<-xts(test.sub[,2],test.sub[,1]) > plot(xtemp) Error in if (on == "years") { : missing value where TRUE/FALSE needed
  • itcplpl
    itcplpl over 12 years
    I ran head(test.sub) and I get: head(test.sub) V1 V4 1212 <NA> 0.474697 1313 <NA> 0.451907 1414 <NA> 0.423184 1516 <NA> 0.417709 1620 <NA> 0.402966 1725 <NA> 0.414264
  • Gavin Simpson
    Gavin Simpson over 12 years
    @itcplpl I didn't tell you to use xts. Try head(test.sub) and str(test.sub) to see if the dates are encoded correctly. It works for me with the dummy data I used above, converted to an xts object and plotted.
  • Gavin Simpson
    Gavin Simpson over 12 years
    @itcplpl Please stop posting R output here - it is unreadable - add it to your Q as you did your data. Basically your test.sub doesn't look like it has any valid dates in it and you have <NA> so I suspect a factor is involved here.
  • mweylandt
    mweylandt over 12 years
    So it took me a bit to figure this out, but it looks like this is the right syntax: sorry, I missed that you had your month recorded as a character: as.Date("2011-Oct-18", format = "%Y-%b-%d") Don't ask me why %b is the correct code, it just seems to work...
  • itcplpl
    itcplpl over 12 years
    ok, I just cleaned up everything and re-ran all the commands and now it works...the date worked. I don't understand what the error was but I am glad it's working. so, for the graph, now I am using plot(test.sub,type="l"). thanks a bunch for your help :-). Since you work extensively with time series type data, would you have any suggestions for plotting 3d graphs - time vs. price1 vs. price2
  • Gavin Simpson
    Gavin Simpson over 12 years
    @itcplpl I would plot the data using trellising or faceting as per lattice or ggplot graphics. If I wanted to, and price1 and price2 had comparable ranges, I would plot both series on the same plot. For that, issue plot.new() between plot calls and you need to draw separate axes by hand via the axis() function. If you need more help, search on SO and ask a Q if there is nothing there already.
  • itcplpl
    itcplpl over 12 years
    ok, will do so. I was looking up lattice but I think it needs me to format my data differently....I'll check on it and ask the Q on SO
  • Dave X
    Dave X over 10 years
    The as.Date() function uses the same format characters that strptime() does, so ?strptime will help for interpreting '%Y-%b-%d' or adapting to other formats.
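For readers following the date-format back-and-forth in the comments, here is a minimal sketch of how the format codes behave on strings shaped like the ones in the question. Setting the locale is my own assumption, added because month abbreviations are matched in the current locale:

```r
# Month abbreviations are locale-dependent; "C" gives English names.
# (Setting the locale here is an assumption to make the sketch reproducible.)
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("2011-Aug-24", "2011-Oct-14")

good <- as.Date(x, format = "%Y-%b-%d")  # %b = abbreviated month name
bad  <- as.Date(x, format = "%Y-%m-%d")  # %m expects a month number

good
all(is.na(bad))  # a mismatched format yields NA rather than an error
```

Because a wrong format fails silently with NA, it is worth checking the converted column with anyNA() or str() before assigning it back over the original data.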