R: sum the elements of each vector in a list and return the result in a data frame

Actually just figured it out on my own. The answer is:

data.frame(unlist(lapply(matches, function(x) sum(x))))

The first part yields a list of length-one vectors, each holding the sum of the corresponding element of matches:

> lapply(matches, function(x) sum(x))
[[1]]
[1] 24

[[2]]
[1] 152

[[3]]
[1] 24

[[4]]
[1] 64

[[5]]
[1] 21

[[6]]
[1] 0

The second part generates a vector from that. unlist() flattens the list into a plain vector (it is recursive by default, so nested lists would be flattened too):

> unlist(lapply(matches, function(x) sum(x)))
[1]  24 152  24  64  21   0

Finally, the vector is converted into a data frame with the data.frame() function.
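Since the question asked for the nesting to be explained, the three steps can also be written one at a time. This is only a sketch of the same approach; the intermediate names (sums_list, sums_vec, result) and the column name total are made up for illustration and do not appear in the original one-liner.

```r
# Data copied from the question
matches <- list(c(11L, 13L),
                c(9L, 12L, 14L, 15L, 16L, 19L, 20L, 22L, 25L),
                c(5L, 8L, 11L),
                c(10L, 11L, 13L, 14L, 16L),
                c(5L, 7L, 9L),
                integer(0))

# Step 1: call sum() on each element of the list.
# lapply() always returns a list, here six length-one vectors.
sums_list <- lapply(matches, sum)

# Step 2: flatten that list into a plain vector.
sums_vec <- unlist(sums_list)

# Step 3: wrap the vector in a data frame with one column.
# "total" is a hypothetical column name chosen for this example.
result <- data.frame(total = sums_vec)
print(result)
```

Naming the column explicitly avoids the long auto-generated column name that data.frame() derives from the nested expression in the one-liner.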

Author by

s_a

Updated on June 14, 2022

Comments

  • s_a
    s_a almost 2 years

    I have a list of vectors in R; each one holds the result of a grep() call, giving the positions where the search string was found. The command

    > matches <- lapply(ListOfFiles, function(x)
    +   grep("SearchString", readLines(x), ignore.case = TRUE))
    

    produces

    # The results the function actually yields are copied here for the sake of the example
    
    > matches<-list(c(11L, 13L), c(9L, 12L, 14L, 15L, 16L, 19L, 20L, 22L, 25L
    + ), c(5L, 8L, 11L), c(10L, 11L, 13L, 14L, 16L), c(5L, 7L, 9L), 
    + integer(0))
    
    > matches
    [[1]]
    [1] 11 13
    
    [[2]]
    [1]  9 12 14 15 16 19 20 22 25
    
    [[3]]
    [1]  5  8 11
    
    [[4]]
    [1] 10 11 13 14 16
    
    [[5]]
    [1] 5 7 9
    
    [[6]]
    integer(0)
    

    I need to transform this to a simple data frame of 6 rows and 1 column, with each "cell" having the sum of each of the 6 lists of matches.

    If at all possible, please try to explain the syntax I should employ; I'm new to R and sometimes I find examples difficult to follow if several things are nested at once.

  • josliber
    josliber over 9 years
    Simpler still: sapply(matches, sum)
  • Señor O
    Señor O over 9 years
    Since you only have one argument, x, you can just do: data.frame(sapply(matches, sum))
  • s_a
    s_a over 9 years
    O_O Thanks to you both!! Yesterday I spent so much time with this and now I just saw it. Still, thanks for pointing me to sapply, really useful.
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 9 years
    @josilber, but sapply is likely to be slower than unlist(lapply(matches, sum)). Better than sapply would be vapply(matches, sum, numeric(1L)).
  • s_a
    s_a over 9 years
    At what order of magnitude would it become relevant: 100,000 records? More? Less? (Just out of curiosity; in my application I'm grepping 100MB worth of .txt files and that will dwarf any other inefficiency in the code.)
  • josliber
    josliber over 9 years
    @AnandaMahto true, though for most lists the time needed to type the extra characters in your expressions far exceeds the runtime advantages =)
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 9 years
    @josilber, then try rapply(matches, sum) :-)
  • s_a
    s_a over 9 years
    Aren't rapply and sapply the same in this case?
  • josliber
    josliber over 9 years
    @AnandaMahto on my benchmark I just pulled together (100000 list elements of sizes 1-10) rapply is no faster than sapply, and vapply and unlist+lapply are 3.5x faster (6.5 microseconds vs. 21 microseconds).
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 9 years
    @josilber, the comment wasn't meant to be taken too seriously! s_a, any of these would make sense for a problem like this. The code for sapply checks at the end to see whether the results can be simplified. With vapply you specify the output template so the function is a little bit faster.
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 9 years
    @josilber, Hmmm. Woke up to this again--rapply performs closer to vapply for me.
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 9 years
    @s_a, you can go ahead and accept your answer. I don't think you're going to get anything better! :-)
  • josliber
    josliber over 9 years
    @AnandaMahto interesting -- I saw a lot more difference between the four approaches in my benchmarking (comment added in your link).
  • A5C1D2H2I1M1N2O1R2T1
    A5C1D2H2I1M1N2O1R2T1 over 9 years
    @josilber, other than you having a much faster system than me, I'm not sure what to make of this. I just added a comment with my timings on your sample data.
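The alternatives raised in the comments can be compared with a rough sketch like the one below. The test data only mimics the description in the thread (100,000 list elements of sizes 1 to 10); it is not the commenters' actual benchmark, and the names big, by_sapply, by_vapply and by_unlist are made up for illustration. Timings depend on the machine and R version, so none are quoted here.

```r
# Assumed test data: 100,000 integer vectors of length 1 to 10
set.seed(1)
big <- lapply(seq_len(100000), function(i) {
  sample.int(100L, size = sample.int(10L, 1L))
})

# Three ways to get one sum per list element
by_sapply <- sapply(big, sum)               # simplifies the result after the fact
by_vapply <- vapply(big, sum, numeric(1L))  # result template declared up front
by_unlist <- unlist(lapply(big, sum))       # the approach from the answer

# All three should agree on the values
stopifnot(all(by_sapply == by_vapply), all(by_sapply == by_unlist))

# Rough timings using base R only
print(system.time(sapply(big, sum)))
print(system.time(vapply(big, sum, numeric(1L))))
print(system.time(unlist(lapply(big, sum))))
```

Apart from speed, vapply() stops with an error if any result does not match the declared template, whereas unlist() silently drops zero-length results. That cannot happen with sum(), which always returns one value, but it matters when swapping in other functions.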