how to use stemDocument in R?

r nlp

This problem appears in tm 0.6 and comes from using functions that are not in the list returned by getTransformations() from tm. tolower returns a plain character vector, not the "PlainTextDocument" that tm_map expects. The tm package provides the content_transformer function to take care of managing the PlainTextDocument:

fbCorpus  <- tm_map(fbCorpus, content_transformer(tolower))
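
For context, here is a minimal sketch of the whole pipeline with the fix applied. It assumes tm 0.6 or later and SnowballC are installed. The sample sentences, the `docs` and `corpus` names, and the use of VCorpus are illustrative choices, not taken from the question. The last few lines show one plausible reason why only the final word was stemmed: SnowballC::wordStem treats each element of a character vector as a single word, so an unsplit sentence is stemmed only at its end.

```r
library(tm)
library(SnowballC)

# Hypothetical sample data standing in for data[,1] from the question
docs <- c("Easy to post a position and search resumes",
          "Searching resumes needs improvement")

corpus <- VCorpus(VectorSource(docs))

# Wrap base functions such as tolower so that tm_map keeps
# each element a PlainTextDocument
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, stemDocument)

# Pull the processed text back out as a plain character vector
stemmed <- vapply(corpus,
                  function(d) paste(as.character(d), collapse = " "),
                  character(1))
print(stemmed)

# wordStem works per vector element: a whole sentence passed as one
# string is treated as one "word", while split words are stemmed
# individually
whole <- wordStem("searching resumes")
parts <- wordStem(c("searching", "resumes"))
print(whole)
print(paste(parts, collapse = " "))
```

If the corpus elements have already been turned into bare character vectors by an unwrapped tolower, rebuilding the corpus from the source and rerunning the steps with content_transformer is probably the simplest way to recover.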
Author: user3754216

Updated on June 04, 2022

Comments

  • user3754216 over 1 year ago

    Update:

    Thanks for the help. See the comments. Because of the package version, I deleted the tolower step and it works. I just need to find another way to convert the text to lower case.

    ============

    I am doing basic text mining on a list of documents. Everything works fine until I try to use stemDocument.

    The tm_map steps I have already applied, with library(tm), are as follows:

    fbVec <- VectorSource(data[,1])
    fbCorpus <- Corpus(fbVec)
    fbCorpus <- tm_map(fbCorpus, tolower)
    fbCorpus <- tm_map(fbCorpus, removePunctuation)
    fbCorpus <- tm_map(fbCorpus, removeNumbers)
    fbCorpus <- tm_map(fbCorpus, removeWords, stopwords("english"))
    fbCorpus <- tm_map(fbCorpus, removeWords, "pr")
    fbCorpus <- tm_map(fbCorpus, stripWhitespace)
    

    The result is as follows:

    [[1]]
    [1]  easy post position search resumes improvement searching resumes
    
    [[2]]
    [1]  easy use good candidiates improvement allow multiple emails sent 
    
    [[3]]
    [1]  applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school
    
    [[4]]
    [1]  abundance resumes
    

    Then I tried to stem:

    library(SnowballC)    
    fbCorpus <- tm_map(fbCorpus, stemDocument)
    

    But the result is not what I expected. It looks like only the last word in each sentence is stemmed. The result is as follows:

    [[1]]
    [1]  easy post position search resumes improvement searching resum
    
    [[2]]
    [1]  easy use good candidiates improvement allow multiple emails sent 
    
    [[3]]
    [1]  applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school
    
    [[4]]
    [1]  abundance resum
    

    Can anyone help?