tm_map has parallel::mclapply error in R 3.0.1 on Mac

Solution 1

I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply. Try just running the stemDocument function on one document, so you can extract the error:

stemDocument(crude[[1]])

For me, I got an error:

Error in loadNamespace(name) : there is no package called ‘SnowballC’

So I just went ahead and installed SnowballC and it worked. Clearly, SnowballC should be a dependency.
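A minimal sketch of checking for the stemming backend before calling tm_map, so a missing package shows up as a clear message rather than as a parallel warning. It assumes only that SnowballC may or may not be installed; the sample words are made up for illustration.

```r
# Assumption: SnowballC may or may not be installed on this machine.
has_snowball <- requireNamespace("SnowballC", quietly = TRUE)

if (has_snowball) {
  # Stem a couple of illustrative words directly, bypassing tm_map.
  stems <- SnowballC::wordStem(c("running", "documents"), language = "english")
  print(stems)
} else {
  message("SnowballC is not installed; try install.packages(\"SnowballC\")")
}
```

If the check reports the package as missing, installing it and re-running the original tm_map call is the next thing to try.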

Solution 2

I just ran into this. It took me a bit of digging but I found out what was happening.

  1. I had a line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'

  2. Running this produced the error


    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code

  3. It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type

    > getOption("mc.cores", 2L)
    [1] 2
    >

  4. Aha moment! Tell the 'tm_map' call to only use one core!

    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
    Error in match.fun(FUN) : object 'asPlainTextDocument' not found
    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
    Warning message:
    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code
    > 

So ... with more than one core, rather than give you the error message, 'parallel' just tells you there was an error in each core. Not helpful, parallel! I forgot the dot - the function name is supposed to be 'as.PlainTextDocument'!

So - if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again.
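The same effect can be reproduced without tm. The sketch below uses only the parallel package; `docs` and `broken_transform` are hypothetical stand-ins for a corpus and a failing transformation. With one core, mclapply runs the function serially, so the underlying error can be caught and read.

```r
library(parallel)

# Default core count that mclapply uses when mc.cores is not given.
default_cores <- getOption("mc.cores", 2L)
print(default_cores)

# Hypothetical stand-ins for a corpus and a transformation that fails.
docs <- list("first document", "second document")
broken_transform <- function(doc) stop("object 'asPlainTextDocument' not found")

# With mc.cores = 1 the call runs serially, so the real error propagates
# and tryCatch can capture its message.
serial_error <- tryCatch(
  mclapply(docs, broken_transform, mc.cores = 1L),
  error = function(e) conditionMessage(e)
)
print(serial_error)
```

Setting options(mc.cores = 1) before the call should have a similar effect for code that relies on the default, which can be handy while debugging.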

Solution 3

I found an answer that worked for me in a related question: Charles Copley, in his answer there, says he thinks the new tm package requires lazy = TRUE to be set explicitly.

So, your code would look like this:

library(tm)
data('crude')
tm_map(crude, stemDocument, lazy = TRUE)

I also tried it without SnowballC to see if it was a combination of those two answers. It did not appear to affect the result either way.

Solution 4

I had been facing the same issue but finally got it fixed. My guess is that if I name the corpus "longName" or "companyNewsCorpus", I get the issue, but if I name it "a", it works well. Really weird.

The code below gives the same error message mentioned in this thread:

companyNewsCorpus <- Corpus(DirSource("SourceDirectory"),
                            readerControl = list(language = "english"))
companyNewsCorpus <- tm_map(companyNewsCorpus,
                            removeWords, stopwords("english"))

But if I change it to the following, it works without issues:

a <- Corpus(DirSource("SourceDirectory"),
            readerControl = list(language = "english"))
a <- tm_map(a, removeWords, stopwords("english"))

Solution 5

I ran into the same problem in tm on a quad-core Intel i7 running Mac OS X 10.10.5, and got the following warning:

In mclapply(content(x), FUN, ...) :
  scheduled core 1 encountered error in user code, all values of the job will be affected

I was creating a corpus after downloading Twitter data.

Charles Copley's solution worked for me as well. I used: tm_map(*filename*, stemDocument, lazy = TRUE) after creating my corpus and then tm worked correctly.

Author by

Dominik

Updated on June 04, 2022

Comments

  • Dominik almost 2 years

    I am using R 3.0.1 on Platform: x86_64-apple-darwin10.8.0 (64-bit)

    I am trying to use tm_map from the tm library. But when I execute this code

    library(tm)
    data('crude')
    tm_map(crude, stemDocument)
    

    I get this error:

    Warning message:
    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code
    

    Does anyone know a solution for this?