Append rows to a data frame using the foreach package


Solution 1

I think you need to read the docs for foreach. Your code block should compute a single part of the result, and the .combine option tells foreach how to join the parts together. See the examples in help(foreach) for more guidance. It's not a straight replacement for a for loop.

For example:

> resultdf = foreach(i=1:10,.combine=rbind)%dopar%{data.frame(x=runif(4),i=i)}
> resultdf
            x  i
1  0.23794248  1
2  0.15536320  1
3  0.58609635  1
4  0.98780497  1
5  0.97806482  2
6  0.92440741  2
7  0.13416121  2
8  0.81598340  2
9  0.13834423  3
[etc]
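
To make the role of .combine concrete, here is a minimal sketch (not part of the original answer) comparing the default return value with the combined one. It uses %do% so that it runs without a registered parallel backend, which is an assumption made for portability; the names parts and resultdf are illustrative only.

```r
library(foreach)

# Without .combine, foreach returns a list with one element per iteration.
parts <- foreach(i = 1:3) %do% {
  data.frame(x = runif(2), i = i)
}

# With .combine = rbind, the per-iteration data frames are stacked row-wise.
resultdf <- foreach(i = 1:3, .combine = rbind) %do% {
  data.frame(x = runif(2), i = i)
}

length(parts)   # number of list elements, one per iteration
dim(resultdf)   # rows and columns of the combined data frame
```

In both cases the value of the last expression in the body is what foreach collects, so the body should end with the piece you want to keep rather than assigning it to an outer variable.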

Solution 2

Modify your foreach loop so that each iteration returns its own row and .combine=rbind joins the rows together:

newdf = foreach(ind=1:1000, .combine=rbind) %dopar%
{
    testdf$X = sample(testdf$X,nrow(testdf), replace=FALSE)
    fit = lm(X ~ Y, testdf)
    data.frame(pc=ind, err=sum(residuals(fit)^2) )
}
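
To use the combined result elsewhere, one option is to let the foreach call be the last expression of the function, so its value becomes the return value. The sketch below is an illustration under stated assumptions: it registers the sequential backend with registerDoSEQ() so it runs anywhere (a real parallel backend such as the one from the doParallel package would be registered instead), the n argument is added for convenience, and mydf is a made-up stand-in for the asker's data.

```r
library(foreach)

# Assumption: sequential backend so the sketch runs without extra packages.
# For real parallelism, register a parallel backend here instead.
registerDoSEQ()

randomizex <- function(testdf, n = 1000) {
  # Each iteration returns a one-row data frame; .combine = rbind stacks them.
  # The foreach result is the last expression, so the function returns it.
  foreach(ind = seq_len(n), .combine = rbind) %dopar% {
    testdf$X <- sample(testdf$X, nrow(testdf), replace = FALSE)
    fit <- lm(X ~ Y, data = testdf)
    data.frame(pc = ind, err = sum(residuals(fit)^2))
  }
}

# Hypothetical data standing in for the asker's mydf.
set.seed(1)
mydf <- data.frame(X = rnorm(20), Y = rnorm(20))

# Assign the returned data frame and use it like any other object.
resdf <- randomizex(mydf, n = 50)
head(resdf)
```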

Hope it helps!

Author by

ifreak

Cancer Bioinformatics Scientist

Updated on August 15, 2022

Comments

  • ifreak
    ifreak over 1 year

    I have a problem with appending values to a data frame using parallel processing.

    I have a function that does some calculations, including random sampling, and returns a data frame.

    So what I did is:

    randomizex <- function(testdf)
    {
        foreach(ind=1:1000)%dopar%
        {
          testdf$X = sample(testdf$X,nrow(testdf), replace=FALSE)
          fit = lm(X ~ Y, testdf)
          newdf <- rbind(newdf, data.frame(pc=ind, err=sum(residuals(fit)^2) ))
    
        }
    
    return(newdf)
    }
    resdf = randomizex(mydf)
    

    When I view resdf, it's empty.

    If I replace %dopar% with %do%, the result is calculated correctly, but it's too slow.

    Is there any way to speed this up?

  • ifreak
    ifreak about 11 years
    OK, thank you for your answer, but how can I return the resulting data frame and use it somewhere else? It's just being printed to stdout.