Here we go again: append an element to a list in R

77,529

Solution 1

Adding elements to a list is very slow when doing it one element at a time. See these two examples:

I'm keeping the Result variable in the global environment to avoid copies to evaluation frames and telling R where to look for it with .GlobalEnv$, to avoid a blind search with <<-:

Result <- list()

AddItemNaive <- function(item)
{
    .GlobalEnv$Result[[length(.GlobalEnv$Result)+1]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemNaive(i))
#   user  system elapsed 
#  15.60    0.00   15.61 

Slow. Now let's try the second approach:

Result <- list()

AddItemNaive2 <- function(item)
{
    .GlobalEnv$Result <- c(.GlobalEnv$Result, item)
}

system.time(for(i in seq_len(2e4)) AddItemNaive2(i))
#   user  system elapsed 
#  13.85    0.00   13.89

Still slow.

Now let's try using an environment, and creating new variables within this environment instead of adding elements to a list. The issue here is that variables must be named, so I'll use the counter as a string to name each item "slot":

Counter <- 0
Result <- new.env()

AddItemEnvir <- function(item)
{
    .GlobalEnv$Counter <- .GlobalEnv$Counter + 1

    .GlobalEnv$Result[[as.character(.GlobalEnv$Counter)]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemEnvir(i))
#   user  system elapsed 
#   0.36    0.00    0.38 

Whoa much faster. :-) It may be a little awkward to work with, but it works.

A final approach uses a list, but instead of augmenting its size one element at a time, it doubles the size each time the list is full. The list size is also kept in a dedicated variable, to avoid any slowdown using length:

Counter <- 0
Result <- list(NULL)
Size <- 1

AddItemDoubling <- function(item)
{
    if( .GlobalEnv$Counter == .GlobalEnv$Size )
    {
        length(.GlobalEnv$Result) <- .GlobalEnv$Size <- .GlobalEnv$Size * 2
    }

    .GlobalEnv$Counter <- .GlobalEnv$Counter + 1

    .GlobalEnv$Result[[.GlobalEnv$Counter]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemDoubling(i))
#   user  system elapsed 
#   0.22    0.00    0.22

It's even faster. And as easy to a work as any list.

Let's try these last two solutions with more iterations:

Counter <- 0
Result <- new.env()

system.time(for(i in seq_len(1e5)) AddItemEnvir(i))
#   user  system elapsed 
#  27.72    0.06   27.83 


Counter <- 0
Result <- list(NULL)
Size <- 1

system.time(for(i in seq_len(1e5)) AddItemDoubling(i))
#   user  system elapsed 
#   9.26    0.00    9.32

Well, the last one is definetely the way to go.

Solution 2

It's very easy. You just need to add it in the following way :

list1$bar <- bar

Solution 3

Operations that change the length of a list/vector in R always copy all the elements into a new list, and so will be slow, O(n). Storing in an environment is O(1) but has a higher constant overhead. For an actual O(1) append and benchmark comparison of a number of approaches see my answer to the other question at https://stackoverflow.com/a/32870310/264177.

Share:
77,529
user443854
Author by

user443854

Updated on July 09, 2022

Comments

  • user443854
    user443854 almost 2 years

    I am not happy with the accepted answer to Append an object to a list in R in amortized constant time?

    > list1 <- list("foo", pi)
    > bar <- list("A", "B")
    

    How can I append new element bar to list1? Clearly, c() does not work, it flattens bar:

    > c(list1, bar)
    [[1]]
    [1] "foo"
    
    [[2]]
    [1] 3.141593
    
    [[3]]
    [1] "A"
    
    [[4]]
    [1] "B"
    

    Assignment to index works:

    > list1[[length(list1)+1]] <- bar
    > list1
    [[1]]
    [1] "foo"
    
    [[2]]
    [1] 3.141593
    
    [[3]]
    [[3]][[1]]
    [1] "A"
    
    [[3]][[2]]
    [1] "B"
    

    What is the efficiency of this method? Is there a more elegant way?

  • user443854
    user443854 almost 11 years
    I tried the first approach with [[]] with different number of elements: 2e3 runs 100x faster than 2e4, clearly O(N^2), so the whole list is being copied. On the other hand, assigning to different variable name each time takes about 20x time for 2e5 elements compared to 2e4 elements, which is O(N) -- performance that what I would expect from adding an element to a list.
  • Richard
    Richard over 8 years
    This does not address the question of efficiency at all.
  • Megatron
    Megatron over 8 years
    Removing .GlobalEnv$ makes the function faster
  • Ferdinand.kraft
    Ferdinand.kraft over 8 years
    @Megatron, that breaks the code, see my comment to your answer.
  • JanKanis
    JanKanis about 8 years
    And by O(1) for storing an item in an environment, I mean amortized O(1) (i.e. on average), but for most cases that is good enough.
  • SKG
    SKG almost 8 years
    I like the doubling size method, but I always end up with a list longer than what I want. How do i remove the NULL entries at the tail of the list:
  • d8aninja
    d8aninja about 7 years
    Yeah but it's very easy... @Richard
  • U. Windl
    U. Windl over 6 years
    R using a functional programming style will always have problems when adding an element to an existing structure, like in v <- c(v, newElement). Adding or replacing elements via index is not functional programming style, however. I don't know the internal representation of vectors in R, but using a chunked array (individual smaller arrays being linked in a chain to form a larger array basically), parts not affected by the append could be reused unmodified, removing load from the GC. After all, the base of R is not designed very well, making it quite hard to do simple things, sometimes.
  • JanKanis
    JanKanis over 5 years
    R vectors are basically plain fixed size C arrays. If something changes the length R just creates a new copy of the entire array. That works great for doing analyses on large data sets but not so well for appending elements one by one.