R tm package vcorpus: Error in converting corpus to data frame
Solution 1
Your corpus is really just a character vector with some extra attributes. So it's best to convert it to character, then you can save that to a data.frame like so:
library(tm)
x <- c("Hello. Sir!","Tacos? On Tuesday?!?")
mycorpus <- Corpus(VectorSource(x))
mycorpus <- tm_map(mycorpus, removePunctuation)
dataframe <- data.frame(text = unlist(sapply(mycorpus, `[`, "content")),
                        stringsAsFactors = FALSE)
which returns
text
1 Hello Sir
2 Tacos On Tuesday
UPDATE: With newer versions of tm, they seem to have updated the as.list.SimpleCorpus method, which really messes with using sapply and lapply. Now I guess you'd have to use:
dataframe <- data.frame(text = sapply(mycorpus, identity),
                        stringsAsFactors = FALSE)
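Because the behaviour above depends on the tm version and on whether the corpus is a SimpleCorpus or a VCorpus, here is a minimal sketch of a conversion that should not depend on the internal structure. It assumes that as.character() returns the text of each document, which holds for plain text documents in the tm versions I am aware of. The helper name corpus_to_df is hypothetical and not part of tm, and VCorpus() is used explicitly only because the error in the question mentions that class.

```r
library(tm)

x <- c("Hello. Sir!", "Tacos? On Tuesday?!?")

# Build a VCorpus explicitly; Corpus() may return a SimpleCorpus instead,
# depending on the tm version and the source.
vc <- VCorpus(VectorSource(x))
vc <- tm_map(vc, removePunctuation)

# corpus_to_df is a hypothetical helper, not a tm function.
# Each document is turned into text with as.character() and collapsed
# into a single string, giving one row per document.
corpus_to_df <- function(corpus) {
  text <- vapply(corpus,
                 function(doc) paste(as.character(doc), collapse = " "),
                 character(1))
  data.frame(text = unname(text), stringsAsFactors = FALSE)
}

df <- corpus_to_df(vc)
print(df)
```

Collapsing with paste() matters only when a document holds several lines of text; for one-line documents like these it leaves the text unchanged.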
Solution 2
The Corpus-classed object has a content element accessible through get:
library("tm")
x <- c("Hello. Sir!","Tacos? On Tuesday?!?")
mycorpus <- Corpus(VectorSource(x))
mycorpus <- tm_map(mycorpus, removePunctuation)
attributes(mycorpus)
# $names
# [1] "content" "meta" "dmeta"
#
# $class
# [1] "SimpleCorpus" "Corpus"
#
df <- data.frame(text = get("content", mycorpus))
head(df)
# text
# 1 Hello Sir
# 2 Tacos On Tuesday
Solution 3
You can convert the corpus to a term-document matrix, build a data.frame of word frequencies sorted from most to least frequent, and plot it as a word cloud!
library(tm)
library("wordcloud")
library("RColorBrewer")
x <- c("Hello. Sir!","Tacos? On Tuesday?!?", "Hello")
mycorpus <- Corpus(VectorSource(x))
mycorpus <- tm_map(mycorpus, removePunctuation)
dtm <- TermDocumentMatrix(mycorpus)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 10)
# word freq
#hello hello 2
#sir sir 1
#tacos tacos 1
#tuesday tuesday 1
#plot in a wordcloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
Solution 4
The older answer posted by MrFlick works only in previous versions of tm. I was able to fix it by removing "content" from the call:
dataframe <- data.frame(text = unlist(sapply(mycorpus, `[`)), stringsAsFactors = FALSE)
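Since the goal in the question was to export the cleaned text to a file, here is a rough base R sketch of that last step. The data frame below is a stand-in for the result of any of the conversions above, and out_file is a hypothetical path (a temporary file, so the example leaves nothing behind); a tab-separated layout is just one reasonable choice.

```r
# Stand-in for the data frame produced by any of the conversions above.
dataframe <- data.frame(text = c("Hello Sir", "Tacos On Tuesday"),
                        stringsAsFactors = FALSE)

# out_file is a hypothetical path; replace it with a real file name.
out_file <- tempfile(fileext = ".txt")

# Tab-separated export with a header row, without row names or quoting.
write.table(dataframe, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to check that the text survived the round trip.
restored <- read.delim(out_file, stringsAsFactors = FALSE)
print(restored)
```

If the text itself can contain tabs or quotes, write.csv() with its default quoting is probably the safer option.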
lmcshane
Updated on August 06, 2022
Comments
-
lmcshane almost 2 years
I am using the tm package to clean up some data using the following code:
mycorpus <- Corpus(VectorSource(x))
mycorpus <- tm_map(mycorpus, removePunctuation)
I then want to convert the corpus back into a data frame in order to export a text file that contains the data in the original format of a data frame. I have tried the following:
dataframe <- as.data.frame(mycorpus)
But this returns an error:
Error in as.data.frame.default(mycorpus) : cannot coerce class "c("VCorpus", "Corpus")" to a data.frame
How can I convert a corpus into a data frame?
-
lmcshane almost 10 years
Thanks! I see the return as a data.frame that has a list with summary data included in the first instance? (1 list(list(content = "Hello Sir", meta = list(author...)
-
Tyler Rinker over 9 years
Odd, I am using tm version 0.6 (on CRAN currently) and Flick's answer works for me.
-
myloginid almost 9 years
Hello - I get this error in the conversion. Any idea why?
> data.frame(text=unlist(sapply(ccorpus_clean, `[`, "content")), stringsAsFactors=F)
Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "try-error"