How to monitor progress of an apply function?

12,140

Solution 1

You can use the library pbapply(git), which shows a time estimate and a progress bar to any function in the '*apply' family.

In the case of your question:

library(pbapply)      
library(data.table)      

result <- data.table::rbindlist( pbapply(COMB, 1, append) )

ps. This answer solves your two initial points. Regarding the third point, I'm not sure if it's possible to pause the function. In any case, your operation is indeed taking too long, so I would recommend you post a separate question asking how to optimize your task.

Solution 2

You can use txtProgressBar from the utils package:

total <- 50
pb <- txtProgressBar(min = 0, max = total, style = 3)

lapply(1:total, function(i){
Sys.sleep(0.1)
setTxtProgressBar(pb, i)
})

OR use *ply family from plyr package

library(plyr)
laply(1:100, function(i) {Sys.sleep(0.05); i}, .progress = "text")

Check ?create_progress_bar() for more details

Solution 3

Try this instead:

setkey(SOURCE, NAME)

SOURCE[, CJ(NAME, NAME, unique = T)][
       , mean(SOURCE[V1, VALUE] == SOURCE[V2, VALUE]), by = .(V1, V2)]

Fwiw, the all-caps names are an awful choice imo - makes writing and reading code significantly harder.

Share:
12,140
ChiseledAbs
Author by

ChiseledAbs

Updated on June 06, 2022

Comments

  • ChiseledAbs
    ChiseledAbs almost 2 years

    I need to work out a 2886*2886 correlation matrix, problem is that building an intermediary datatable (RESULT) takes a long time for it to be binded together so I would like to be able to do the following things while calling the last line RESULT=rbindlist(apply(COMB, 1, append)) in the code below :

    1. Estimate the time it will take for the apply function to complete
    2. Monitor its progress
    3. Be able to pause and continue at later time

    Here is the code :

    SOURCE=data.table(NAME=rep(paste0("NAME", as.character(1:2889)), each=600), VALUE=sample(c(TRUE,FALSE), 600, TRUE) )
    > SOURCE
                NAME VALUE
          1:   NAME1  TRUE
          2:   NAME1  TRUE
          3:   NAME1  TRUE
          4:   NAME1  TRUE
          5:   NAME1  TRUE
         ---              
    1733396: NAME999  TRUE
    1733397: NAME999  TRUE
    1733398: NAME999  TRUE
    1733399: NAME999  TRUE
    1733400: NAME999 FALSE
    
    setkey(SOURCE,NAME)
    a=SOURCE[,unique(NAME)]
    COMB=data.table(expand.grid(a,a, stringsAsFactors=FALSE))
    > COMB
                 Var1    Var2
          1:    NAME1   NAME1
          2:   NAME10   NAME1
          3:  NAME100   NAME1
          4: NAME1000   NAME1
          5: NAME1001   NAME1
         ---                 
    8346317:  NAME995 NAME999
    8346318:  NAME996 NAME999
    8346319:  NAME997 NAME999
    8346320:  NAME998 NAME999
    8346321:  NAME999 NAME999
    
    append <- function(X) {
    data.table(NAME1=X[1], VALUE1=SOURCE[X[1], VALUE], 
        NAME2=X[2], VALUE2=SOURCE[X[2], VALUE] )
    }
    
    RESULT=rbindlist(apply(COMB, 1, append))
    

    Any idea ?

    Also do you know if there is a faster way to generate the datatable RESULT from SOURCE ? RESULTis an intermediary datatable to work out the correlation values between VALUE1 and VALUE2 for each couple of NAME.

    With a subset of SOURCE RESULTlooks like that :

    SOURCE=SOURCE[sample(1:nrow(SOURCE), 3)]
    setkey(SOURCE,NAME)
    a=SOURCE[,unique(NAME)]
    COMB=data.table(expand.grid(a,a, stringsAsFactors=FALSE))
    RESULT=rbindlist(apply(COMB, 1, append))
    > RESULT
          NAME1 VALUE1    NAME2 VALUE2
    1: NAME1859   TRUE NAME1859   TRUE
    2:  NAME768  FALSE NAME1859   TRUE
    3:  NAME795   TRUE NAME1859   TRUE
    4: NAME1859   TRUE  NAME768  FALSE
    5:  NAME768  FALSE  NAME768  FALSE
    6:  NAME795   TRUE  NAME768  FALSE
    7: NAME1859   TRUE  NAME795   TRUE
    8:  NAME768  FALSE  NAME795   TRUE
    9:  NAME795   TRUE  NAME795   TRUE
    

    Later on I'm going to do RESULT[,VALUE3:=(VALUE1==VALUE2)] to finally get the correlation values : RESULT[, mean(VALUE3), by=c("NAME1", "NAME2")] So maybe the whole process can be done more efficiently, who knows.