converting a data frame to monthly time series

19,534

Solution 1

You could transpose (function t()) the matrix before vectorizing it:

set.seed(12)
dummy.df <- as.data.frame(matrix(round(rnorm(1200), digits = 2),
                                 nrow = 100, ncol = 12))
rownames(dummy.df) <- seq(1901, 2000)
colnames(dummy.df) <- month.abb
dummy.df.ts <- ts(as.vector(t(as.matrix(dummy.df))), 
                  start=c(1901,1), end=c(2000,12), frequency=12)

Solution 2

You could melt the data, order by date, then apply the ts() function.

Here's the data setup. If your language setting is English you could save some code by using month.abb, but that is not robust to other language locales.

set.seed(12)
dummy.df <- as.data.frame(matrix(round(rnorm(1200),digits=2),nrow=100,ncol=12))
months <- format(seq.Date(as.Date("2013-01-01"), as.Date("2013-12-01"), 
                          by = "month"), format = "%b")
colnames(dummy.df) <- months
dummy.df$Year <- seq(1901, 2000) # set as variable, not as rownames 

Melt the data so you have a data frame with 1200 rows, each representing an observation:

library("reshape2")
dummy.df <- melt(dummy.df, id.vars = "Year")

Order the observations by date:

dummy.df$Date <- as.Date(paste(dummy.df$Year, dummy.df$variable, "01", sep = "-"),
                         format = ("%Y-%b-%d"))
dummy.df <- dummy.df[order(dummy.df$Date), ]

Then you can apply a similar ts() call, with the ts object showing the desired order:

dummy.df.ts <- ts(dummy.df$value, start=c(1901,1), end=c(2000,12), frequency=12)
Share:
19,534
avg
Author by

avg

Updated on July 20, 2022

Comments

  • avg
    avg almost 2 years

    I have a data frame of a monthly data for 100 yrs (1200 data points) with the months in columns and years in the rows. I want to convert it into a monthly time series and I have tried several ways, none of which create the correct "temporal" structure.

    The problem lies with R considering the data frame as a 100 observations (years) of 12 variables (the months). Here is a reproducible code for my latest try:

    set.seed(12)
    dummy.df <- as.data.frame(matrix(round(rnorm(1200),digits=2),nrow=100,ncol=12))
    rownames(dummy.df) <- seq(from=1901, to=2000)
    colnames(dummy.df) <- c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec")
    dummy.df.ts <- ts(as.vector(as.matrix(dummy.df)), start=c(1901,1), end=c(2000,12), frequency=12)
    

    In the "dummy.df.ts" object, the rows and columns are switched and instead of sequential observations in columns, all the januarys februarys etc are stacked together one after the other. How can I get to the correct temporal structure?

    An example of my data: these are monthly temperature values from 1901 - 1905

    fr.monthly.temp.sample  
    
         JAN FEB MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT NOV DEC  
    1901 2.7 0.4 4.7 10.0 13.0 16.9 19.2 18.3 15.7 10.6 4.9 3.5  
    1902 4.1 3.2 7.5 10.3 10.0 15.1 18.2 17.4 15.0 10.2 6.3 3.5  
    1903 3.8 5.9 7.6  7.1 12.9 14.9 17.6 17.3 15.5 12.1 6.9 2.7  
    1904 3.0 4.6 5.5 10.3 13.6 16.3 20.2 18.5 13.9 11.2 5.4 4.8  
    1905 1.7 4.0 7.4  9.3 11.9 16.5 20.0 17.6 14.7  8.4 5.5 3.8  
    

    Plot of original time series

    And by using this ts() call:

    fr.monthly.temp.sample.ts <- ts(as.vector(as.matrix(fr.monthly.temp.sample)),                              start=c(1901,1), end=c(1905,12), frequency=12)
    

    This is the output I get for the time series object:

    fr.monthly.temp.sample.ts  
    
          Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  
    1901  2.7  4.1  3.8  3.0  1.7  0.4  3.2  5.9  4.6  4.0  4.7  7.5  
    1902  7.6  5.5  7.4 10.0 10.3  7.1 10.3  9.3 13.0 10.0 12.9 13.6  
    1903 11.9 16.9 15.1 14.9 16.3 16.5 19.2 18.2 17.6 20.2 20.0 18.3  
    1904 17.4 17.3 18.5 17.6 15.7 15.0 15.5 13.9 14.7 10.6 10.2 12.1  
    1905 11.2  8.4  4.9  6.3  6.9  5.4  5.5  3.5  3.5  2.7  4.8  3.8  
    

    Plot of time series from R code

    --Note the changed temporal structure (values from the columns are now in the rows..)--

    Thanks.

  • avg
    avg about 11 years
    thanks for the answers. I tried #2 before you had posted #1 and it gave me the correct time series output. But I wanted to know how to do this without using reshape since what that was doing was essentially ordering the values as a column vector i.e stacking the rows from my original df end-to-end. I tried a couple of t() operation earlier without result but I dont think i tried the one you posted in #1..will let you know..