How to use stemDocument in R?
This problem appears in tm 0.6 and comes from using functions that are not in the list returned by getTransformations() from tm. The problem is that tolower just returns a character vector, not a "PlainTextDocument" as tm_map expects. The tm package provides the content_transformer function to take care of managing the PlainTextDocument:
fbCorpus <- tm_map(fbCorpus, content_transformer(tolower))
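For context, the full cleaning pipeline from the question might look roughly like the sketch below once tolower is wrapped in content_transformer. This is only an illustration under some assumptions: the sample strings in texts are made up to stand in for data[, 1], the variable names corpus and stemmed are hypothetical, and VCorpus is used in place of Corpus so that every document is a PlainTextDocument.

```r
library(tm)         # corpus handling and tm_map transformations
library(SnowballC)  # stemmer backend used by stemDocument

# Hypothetical sample text standing in for data[, 1] from the question
texts <- c("Easy to post a position and search resumes",
           "An abundance of resumes")

corpus <- VCorpus(VectorSource(texts))

# Wrap tolower so each document stays a PlainTextDocument
corpus <- tm_map(corpus, content_transformer(tolower))

# The remaining steps are transformations that tm provides itself
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Stemming is now applied to every word, not only the last one
corpus <- tm_map(corpus, stemDocument)

# Pull the cleaned text back out as a plain character vector
stemmed <- unname(sapply(corpus, content))
print(stemmed)
```

With this setup, every occurrence of "resumes" should be stemmed, rather than only the one at the end of a document.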
user3754216
Updated on June 04, 2022
Update:
Thanks for the help; see the comments. Because of the package version, I deleted the tolower step and it works. I just need to find another way to convert the text to lower case.
============
I am doing basic text mining on a list of documents. Everything works fine until I try to use stemDocument. The tm_map steps I have already run are as follows:
library(tm)
fbVec <- VectorSource(data[,1])
fbCorpus <- Corpus(fbVec)
fbCorpus <- tm_map(fbCorpus, tolower)
fbCorpus <- tm_map(fbCorpus, removePunctuation)
fbCorpus <- tm_map(fbCorpus, removeNumbers)
fbCorpus <- tm_map(fbCorpus, removeWords, stopwords("english"))
fbCorpus <- tm_map(fbCorpus, removeWords, "pr")
fbCorpus <- tm_map(fbCorpus, stripWhitespace)
The result is as follows:
[[1]]
[1] easy post position search resumes improvement searching resumes
[[2]]
[1] easy use good candidiates improvement allow multiple emails sent
[[3]]
[1] applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school
[[4]]
[1] abundance resumes
Then I tried to stem:
library(SnowballC)
fbCorpus <- tm_map(fbCorpus, stemDocument)
But the result is not what I expected. It looks like only the last word of each document is stemmed, as shown below:
[[1]]
[1] easy post position search resumes improvement searching resum
[[2]]
[1] easy use good candidiates improvement allow multiple emails sent
[[3]]
[1] applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school
[[4]]
[1] abundance resum
Can anyone help?