Here we go again: append an element to a list in R
Solution 1
Adding elements to a list is very slow when doing it one element at a time. See these two examples:
I'm keeping the Result
variable in the global environment to avoid copies to evaluation frames and telling R where to look for it with .GlobalEnv$
, to avoid a blind search with <<-
:
Result <- list()
AddItemNaive <- function(item)
{
.GlobalEnv$Result[[length(.GlobalEnv$Result)+1]] <- item
}
system.time(for(i in seq_len(2e4)) AddItemNaive(i))
# user system elapsed
# 15.60 0.00 15.61
Slow. Now let's try the second approach:
Result <- list()
AddItemNaive2 <- function(item)
{
.GlobalEnv$Result <- c(.GlobalEnv$Result, item)
}
system.time(for(i in seq_len(2e4)) AddItemNaive2(i))
# user system elapsed
# 13.85 0.00 13.89
Still slow.
Now let's try using an environment
, and creating new variables within this environment instead of adding elements to a list. The issue here is that variables must be named, so I'll use the counter as a string to name each item "slot":
Counter <- 0
Result <- new.env()
AddItemEnvir <- function(item)
{
.GlobalEnv$Counter <- .GlobalEnv$Counter + 1
.GlobalEnv$Result[[as.character(.GlobalEnv$Counter)]] <- item
}
system.time(for(i in seq_len(2e4)) AddItemEnvir(i))
# user system elapsed
# 0.36 0.00 0.38
Whoa much faster. :-) It may be a little awkward to work with, but it works.
A final approach uses a list, but instead of augmenting its size one element at a time, it doubles the size each time the list is full. The list size is also kept in a dedicated variable, to avoid any slowdown using length
:
Counter <- 0
Result <- list(NULL)
Size <- 1
AddItemDoubling <- function(item)
{
if( .GlobalEnv$Counter == .GlobalEnv$Size )
{
length(.GlobalEnv$Result) <- .GlobalEnv$Size <- .GlobalEnv$Size * 2
}
.GlobalEnv$Counter <- .GlobalEnv$Counter + 1
.GlobalEnv$Result[[.GlobalEnv$Counter]] <- item
}
system.time(for(i in seq_len(2e4)) AddItemDoubling(i))
# user system elapsed
# 0.22 0.00 0.22
It's even faster. And as easy to a work as any list.
Let's try these last two solutions with more iterations:
Counter <- 0
Result <- new.env()
system.time(for(i in seq_len(1e5)) AddItemEnvir(i))
# user system elapsed
# 27.72 0.06 27.83
Counter <- 0
Result <- list(NULL)
Size <- 1
system.time(for(i in seq_len(1e5)) AddItemDoubling(i))
# user system elapsed
# 9.26 0.00 9.32
Well, the last one is definetely the way to go.
Solution 2
It's very easy. You just need to add it in the following way :
list1$bar <- bar
Solution 3
Operations that change the length of a list/vector in R always copy all the elements into a new list, and so will be slow, O(n). Storing in an environment is O(1) but has a higher constant overhead. For an actual O(1) append and benchmark comparison of a number of approaches see my answer to the other question at https://stackoverflow.com/a/32870310/264177.
user443854
Updated on July 09, 2022Comments
-
user443854 almost 2 years
I am not happy with the accepted answer to Append an object to a list in R in amortized constant time?
> list1 <- list("foo", pi) > bar <- list("A", "B")
How can I append new element
bar
tolist1
? Clearly,c()
does not work, it flattensbar
:> c(list1, bar) [[1]] [1] "foo" [[2]] [1] 3.141593 [[3]] [1] "A" [[4]] [1] "B"
Assignment to index works:
> list1[[length(list1)+1]] <- bar > list1 [[1]] [1] "foo" [[2]] [1] 3.141593 [[3]] [[3]][[1]] [1] "A" [[3]][[2]] [1] "B"
What is the efficiency of this method? Is there a more elegant way?
-
user443854 almost 11 yearsI tried the first approach with
[[]]
with different number of elements:2e3
runs 100x faster than2e4
, clearly O(N^2), so the whole list is being copied. On the other hand, assigning to different variable name each time takes about 20x time for2e5
elements compared to2e4
elements, which is O(N) -- performance that what I would expect from adding an element to a list. -
Richard over 8 yearsThis does not address the question of efficiency at all.
-
Megatron over 8 yearsRemoving
.GlobalEnv$
makes the function faster -
Ferdinand.kraft over 8 years@Megatron, that breaks the code, see my comment to your answer.
-
JanKanis about 8 yearsAnd by O(1) for storing an item in an environment, I mean amortized O(1) (i.e. on average), but for most cases that is good enough.
-
SKG almost 8 yearsI like the doubling size method, but I always end up with a list longer than what I want. How do i remove the NULL entries at the tail of the list:
-
d8aninja about 7 yearsYeah but it's
very easy...
@Richard -
U. Windl over 6 yearsR using a functional programming style will always have problems when adding an element to an existing structure, like in
v <- c(v, newElement)
. Adding or replacing elements via index is not functional programming style, however. I don't know the internal representation of vectors in R, but using a chunked array (individual smaller arrays being linked in a chain to form a larger array basically), parts not affected by the append could be reused unmodified, removing load from the GC. After all, the base of R is not designed very well, making it quite hard to do simple things, sometimes. -
JanKanis over 5 yearsR vectors are basically plain fixed size C arrays. If something changes the length R just creates a new copy of the entire array. That works great for doing analyses on large data sets but not so well for appending elements one by one.