Repeat rows of a data.frame

Solution 1

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]
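The question asks that column types survive the repetition, so here is a small sketch that checks this. The factor column f and the variable N are added purely for illustration and are not part of the original answer. The last lines show, for contrast, why the apply() attempt from the question ends up with characters: apply() converts the data frame to a matrix first.

```r
# Illustrative data: the factor column `f` is an assumption added for this check.
df <- data.frame(a = 1:2, b = letters[1:2], f = factor(c("x", "y")),
                 stringsAsFactors = FALSE)
N <- 2  # hypothetical repeat count

# Repeat every row index N times, then subset by row.
out <- df[rep(seq_len(nrow(df)), each = N), ]

# Row names come out as "1", "1.1", "2", "2.1"; reset them if that matters.
rownames(out) <- NULL

# Column classes are unchanged by row indexing.
print(sapply(out, class))

# For contrast: apply() goes through as.matrix(), so mixed column types
# collapse into a single character matrix.
via_apply <- apply(df, 2, function(co) rep(co, each = N))
print(is.character(via_apply))
```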

Solution 2

A clean dplyr solution, taken from here

library(dplyr)
df <- tibble(x = 1:2, y = c("a", "b"))
df %>% slice(rep(1:n(), each = 2))

Solution 3

There is a lovely vectorized solution that repeats each row its own number of times, for example by adding an ntimes column to your data frame:

  A B   C ntimes
1 j i 100      2
2 K P 101      4
3 Z Z 102      1

Method:

df <- data.frame(A=c("j","K","Z"), B=c("i","P","Z"), C=c(100,101,102), ntimes=c(2,4,1))
df <- as.data.frame(lapply(df, rep, df$ntimes))

Result:

  A B   C ntimes
1 j i 100      2
2 j i 100      2
3 K P 101      4
4 K P 101      4
5 K P 101      4
6 K P 101      4
7 Z Z 102      1

This is very similar to Josh O'Brien and Mark Miller's method:

df[rep(seq_len(nrow(df)), df$ntimes),]

However, that method appears quite a bit slower, by roughly a factor of five in this small benchmark:

df <- data.frame(A=c("j","K","Z"), B=c("i","P","Z"), C=c(100,101,102), ntimes=c(2000,3000,4000))

microbenchmark::microbenchmark(
  df[rep(seq_len(nrow(df)), df$ntimes),],
  as.data.frame(lapply(df, rep, df$ntimes)),
  times = 10
)

Result:

Unit: microseconds
                                      expr      min       lq      mean   median       uq      max neval
   df[rep(seq_len(nrow(df)), df$ntimes), ] 3563.113 3586.873 3683.7790 3613.702 3657.063 4326.757    10
 as.data.frame(lapply(df, rep, df$ntimes))  625.552  654.638  676.4067  668.094  681.929  799.893    10
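Before switching to the faster version, it may be worth checking that both approaches return the same rows on your own data. A rough sketch follows; the names by_index, by_lapply and same are made up here. One caveat to keep in mind: as.data.frame() on a list uses check.names = TRUE by default, so non-syntactic column names (for example "my col") would be altered unless you pass check.names = FALSE.

```r
df <- data.frame(A = c("j", "K", "Z"), B = c("i", "P", "Z"),
                 C = c(100, 101, 102), ntimes = c(2, 4, 1))

# Index-based version: row names become "1", "1.1", ..., so reset them.
by_index <- df[rep(seq_len(nrow(df)), df$ntimes), ]
rownames(by_index) <- NULL

# Column-wise version: rep() is applied to each column in turn.
by_lapply <- as.data.frame(lapply(df, rep, df$ntimes))

# Compare column by column; TRUE means both results hold the same data.
same <- all(mapply(identical, by_index, by_lapply))
print(same)
```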

Solution 4

If you can repeat the whole data frame, or subset it first and then repeat the subset, then this similar question may be helpful. Once again:

library(mefa)
rep(mtcars,10) 

or simply

mefa:::rep.data.frame(mtcars, 10)

Solution 5

Adding to what @dardisco mentioned about mefa::rep.data.frame(), it's very flexible.

You can either repeat each row N times:

rep(df, each=N)

or repeat the entire data frame N times (much like recycling a vector argument):

rep(df, times=N)
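If installing mefa is not an option, the same two behaviours can be approximated in base R by repeating row indices. This is only a sketch: df, N and idx below are made-up illustration values, and the code does not call mefa at all.

```r
df <- data.frame(a = 1:2, b = c("x", "y"))
N <- 3  # hypothetical repeat count

idx <- seq_len(nrow(df))

# Counterpart of each = N: every row is repeated in place (1 1 1 2 2 2).
each_rows <- df[rep(idx, each = N), ]

# Counterpart of times = N: the whole data frame is stacked N times (1 2 1 2 1 2).
times_rows <- df[rep(idx, times = N), ]

print(each_rows$a)
print(times_rows$a)
```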

Two thumbs up for mefa! I had never heard of it until now and I had to write manual code to do this.

Author: Stefan

Updated on July 17, 2022

Comments

  • Stefan
    Stefan almost 2 years

    I want to repeat the rows of a data.frame, each N times. The result should be a new data.frame (with nrow(new.df) == nrow(old.df) * N) keeping the data types of the columns.

    Example for N = 2:

                            A B   C
      A B   C             1 j i 100
    1 j i 100     -->     2 j i 100
    2 K P 101             3 K P 101
                          4 K P 101
    

    So, each row is repeated 2 times and characters remain characters, factors remain factors, numerics remain numerics, ...

    My first attempt used apply: apply(old.df, 2, function(co) rep(co, each = N)), but this one transforms my values to characters and I get:

         A   B   C    
    [1,] "j" "i" "100"
    [2,] "j" "i" "100"
    [3,] "K" "P" "101"
    [4,] "K" "P" "101"
    
  • Mark Miller
    Mark Miller about 10 years
    You can use n.times <- c(2,4) ; df[rep(seq_len(nrow(df)), n.times),] if you want to vary the number of times each line is repeated.
  • smci
    smci about 10 years
    Aha! Another brilliant R function hidden deep inside an obscure specialist package with a totally unrelated name. I love this language!
  • Dan Villarreal
    Dan Villarreal about 4 years
    This is the preferable solution imo because it works cleanly in a pipe.
  • TCS
    TCS almost 3 years
    I think that this is the most versatile solution, as it allows you to assign a different number of replications per line! I am curious, is there a way to do this in tidyverse?
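
On the tidyverse question in the last comment: one option is tidyr::uncount(), which repeats each row according to a weights column. A minimal sketch, assuming the tidyr package is installed; the data frame is the one from Solution 3, and the names out and out_keep are made up here. By default uncount() drops the weights column from the result.

```r
library(tidyr)

df <- data.frame(A = c("j", "K", "Z"), B = c("i", "P", "Z"),
                 C = c(100, 101, 102), ntimes = c(2, 4, 1))

# Repeat each row `ntimes` times; .remove = TRUE (the default) drops `ntimes`.
out <- uncount(df, ntimes)
print(out)

# Keep the count column instead by setting .remove = FALSE.
out_keep <- uncount(df, ntimes, .remove = FALSE)
print(nrow(out_keep))
```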