wordcloud package: get “Error in strwidth(…) : invalid 'cex' value”
Solution 1
You have a typo in TataMotors twitter account. It should be spelled 'TataMotors', not 'TataMotor'. As a result, one column in your term matrix is empty and when cex is calculated it get assigned NAN.
Fix the typo and the rest of the code works fine. Good luck!
Solution 2
I spotted the empty-column issue in a different application throwing the same error. In my case it was because of the removeSparseTerms
command applied to a document term matrix. Using str()
helped me identify the bug.
The input variable (slightly edited) had 289 columns:
> str(corpus.dtm)
List of 6
$ i : int [1:443] 3 4 6 8 10 12 15 18 19 21 ...
$ j : int [1:443] 105 98 210 93 287 249 126 223 129 146 ...
$ v : num [1:443] 1 1 1 1 1 1 1 1 1 1 ...
$ nrow : int 1408
$ ncol : int 289
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: chr [1:289] "word1" "word2" "word3" "word4" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"
The command was:
removeSparseTerms(corpus.dtm,0.90)->corpus.dtm.frequent
And the result had 0 columns:
> str(corpus.dtm.frequent)
List of 6
$ i : int(0)
$ j : int(0)
$ v : num(0)
$ nrow : int 1408
$ ncol : int 0
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: NULL
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"
Raising the sparsity coefficient from 0.90 to 0.95 solved the issue. For a wordier document I went up to 0.999 in order to have a non-empty result after removing the sparse terms.
Empty columns are a good thing to check out when this error occurs.
Related videos on Youtube
Abhishek Kapoor
Updated on June 04, 2022Comments
-
Abhishek Kapoor almost 2 years
I am using the tm and wordcloud packages in R 2.15.1. I am trying to make a word cloud Here is the code:
maruti_tweets = userTimeline("Maruti_suzuki", n=1000,cainfo="cacert.pem") hyundai_tweets = userTimeline("HyundaiIndia", n=1000,cainfo="cacert.pem") tata_tweets = userTimeline("TataMotor", n=1000,cainfo="cacert.pem") toyota_tweets = userTimeline("Toyota_India", n=1000,cainfo="cacert.pem") # get text maruti_txt = sapply(maruti_tweets, function(x) x$getText()) hyundai_txt = sapply(hyundai_tweets, function(x) x$getText()) tata_txt = sapply(tata_tweets, function(x) x$getText()) toyota_txt = sapply(toyota_tweets, function(x) x$getText()) clean.text = function(x) { # tolower x = tolower(x) # remove rt x = gsub("rt", "", x) # remove at x = gsub("@\\w+", "", x) # remove punctuation x = gsub("[[:punct:]]", "", x) # remove numbers x = gsub("[[:digit:]]", "", x) # remove links http x = gsub("http\\w+", "", x) # remove tabs x = gsub("[ |\t]{2,}", "", x) # remove blank spaces at the beginning x = gsub("^ ", "", x) # remove blank spaces at the end x = gsub(" $", "", x) return(x) } # clean texts maruti_clean = clean.text(maruti_txt) hyundai_clean = clean.text(hyundai_txt) tata_clean = clean.text(tata_txt) toyota_clean = clean.text(toyota_txt) maruti = paste(maruti_clean, collapse=" ") hyundai= paste(hyundai_clean, collapse=" ") tata= paste(tata_clean, collapse=" ") toyota= paste(toyota_clean, collapse=" ") # put ehyundaiything in a single vector all = c(maruti, hyundai, tata, toyota) # remove stop-words all = removeWords(all, c(stopwords("english"), "maruti", "tata", "hyundai", "toyota")) # create corpus corpus = Corpus(VectorSource(all)) # create term-document matrix tdm = TermDocumentMatrix(corpus) # convert as matrix tdm = as.matrix(tdm) # add column names colnames(tdm) = c("MARUTI", "HYUNDAI", "TATA", "TOYOTA") # comparison cloud comparison.cloud(tdm, random.order=FALSE,colors = c("#00B2FF", "red", #FF0099","#6600CC"),max.words=500)
but getting following error
Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value please help