wordcloud package: get “Error in strwidth(…) : invalid 'cex' value”

r tm

Solution 1

You have a typo in the Tata Motors Twitter account name. It should be spelled 'TataMotors', not 'TataMotor'. As a result, one column in your term matrix is empty, and when cex is calculated for it the value comes out as NaN.

Fix the typo and the rest of the code works fine. Good luck!
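If you want to confirm this before plotting, a quick check on the matrix is enough. The sketch below uses a made-up toy matrix (not your Twitter data) with an all-zero TATA column, and it assumes the NaN comes from scaling each column by its own sum, which is 0/0 for an empty column. The names empty_cols, normalized and tdm_ok are only for illustration.

```r
# Toy term-document matrix (hypothetical data); the TATA column is empty,
# as it would be if no tweets came back for that account.
tdm <- matrix(c(3, 1, 0, 2,
                0, 2, 0, 1,
                1, 0, 0, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("car", "engine", "launch"),
                              c("MARUTI", "HYUNDAI", "TATA", "TOYOTA")))

# Names of the columns whose total count is zero
empty_cols <- colnames(tdm)[colSums(tdm) == 0]
print(empty_cols)

# Scaling every column by its own sum turns an all-zero column into NaN (0/0)
normalized <- sweep(tdm, 2, colSums(tdm), "/")
print(any(is.nan(normalized)))

# Keep only the non-empty columns
tdm_ok <- tdm[, colSums(tdm) > 0, drop = FALSE]
print(colnames(tdm_ok))
```

Dropping the empty column only hides the symptom. In this question the real fix is still the account name.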


Solution 2

I spotted the empty-column issue in a different application throwing the same error. In my case it was because of the removeSparseTerms command applied to a document term matrix. Using str() helped me identify the bug.

The input variable (slightly edited) had 289 columns:

> str(corpus.dtm)
List of 6
$ i       : int [1:443] 3 4 6 8 10 12 15 18 19 21 ...
$ j       : int [1:443] 105 98 210 93 287 249 126 223 129 146 ...
$ v       : num [1:443] 1 1 1 1 1 1 1 1 1 1 ...
$ nrow    : int 1408
$ ncol    : int 289
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: chr [1:289] "word1" "word2" "word3" "word4" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"

The command was:

corpus.dtm.frequent <- removeSparseTerms(corpus.dtm, 0.90)

And the result had 0 columns:

> str(corpus.dtm.frequent)
List of 6
$ i       : int(0) 
$ j       : int(0) 
$ v       : num(0) 
$ nrow    : int 1408
$ ncol    : int 0
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: NULL
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"

Raising the sparsity coefficient from 0.90 to 0.95 solved the issue. For a wordier document I went up to 0.999 in order to have a non-empty result after removing the sparse terms.
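To see why a low coefficient can leave nothing behind, here is a rough base-R sketch of the rule as I understand it: a term is kept only if its share of zero entries across documents is below the sparse value. The toy matrix and the keep_terms helper are hypothetical and are not tm's code.

```r
# Toy document-term matrix (hypothetical data): 10 documents, 3 terms.
dtm <- matrix(0, nrow = 10, ncol = 3,
              dimnames = list(NULL, c("common", "rare", "rarer")))
dtm[1:6, "common"] <- 1   # present in 6 of 10 documents, sparsity 0.4
dtm[1:2, "rare"]   <- 1   # present in 2 of 10 documents, sparsity 0.8
dtm[1,   "rarer"]  <- 1   # present in 1 of 10 documents, sparsity 0.9

# Hypothetical helper: keep the terms whose share of zero entries
# is below the `sparse` threshold.
keep_terms <- function(m, sparse) {
  sparsity <- colMeans(m == 0)
  m[, sparsity < sparse, drop = FALSE]
}

too_strict <- keep_terms(dtm, 0.30)   # every term is sparser than 0.30
looser     <- keep_terms(dtm, 0.85)
loosest    <- keep_terms(dtm, 0.95)

print(c(ncol(too_strict), ncol(looser), ncol(loosest)))

# Guard worth having before plotting or modelling
if (ncol(too_strict) == 0) {
  message("No terms left; try a higher sparse value")
}
```

With tm itself, checking ncol() on the result of removeSparseTerms plays the same role as the guard at the end.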

Empty columns are a good thing to check for when this error occurs.


Author: Abhishek Kapoor

Updated on June 04, 2022

Comments

  • Abhishek Kapoor almost 2 years

    I am using the tm and wordcloud packages in R 2.15.1. I am trying to make a word cloud. Here is the code:

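    # packages used below: twitteR (userTimeline), tm (Corpus, stopwords, TermDocumentMatrix), wordcloud (comparison.cloud)
    library(twitteR)
    library(tm)
    library(wordcloud)
    maruti_tweets = userTimeline("Maruti_suzuki", n=1000,cainfo="cacert.pem")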
    hyundai_tweets = userTimeline("HyundaiIndia", n=1000,cainfo="cacert.pem")
    tata_tweets = userTimeline("TataMotor", n=1000,cainfo="cacert.pem")
    toyota_tweets = userTimeline("Toyota_India", n=1000,cainfo="cacert.pem")
    # get text
    maruti_txt = sapply(maruti_tweets, function(x) x$getText())
    hyundai_txt = sapply(hyundai_tweets, function(x) x$getText())
    tata_txt = sapply(tata_tweets, function(x) x$getText())
    toyota_txt = sapply(toyota_tweets, function(x) x$getText())
    clean.text = function(x)
    {
       # tolower
       x = tolower(x)
       # remove rt
       x = gsub("rt", "", x)
       # remove at
       x = gsub("@\\w+", "", x)
       # remove punctuation
       x = gsub("[[:punct:]]", "", x)
       # remove numbers
       x = gsub("[[:digit:]]", "", x)
       # remove links http
       x = gsub("http\\w+", "", x)
       # remove tabs
       x = gsub("[ |\t]{2,}", "", x)
       # remove blank spaces at the beginning
       x = gsub("^ ", "", x)
       # remove blank spaces at the end
       x = gsub(" $", "", x)
       return(x)
    }
    # clean texts
    maruti_clean = clean.text(maruti_txt)
    hyundai_clean = clean.text(hyundai_txt)
    tata_clean = clean.text(tata_txt)
    toyota_clean = clean.text(toyota_txt)
    maruti = paste(maruti_clean, collapse=" ")
    hyundai= paste(hyundai_clean, collapse=" ")
    tata= paste(tata_clean, collapse=" ")
    toyota= paste(toyota_clean, collapse=" ")
    # put everything in a single vector
    all = c(maruti, hyundai, tata, toyota)
    # remove stop-words
    all = removeWords(all,
    c(stopwords("english"), "maruti", "tata", "hyundai", "toyota"))
    # create corpus
    corpus = Corpus(VectorSource(all))
    # create term-document matrix
    tdm = TermDocumentMatrix(corpus)
    # convert as matrix
    tdm = as.matrix(tdm)
    # add column names
    colnames(tdm) = c("MARUTI", "HYUNDAI", "TATA", "TOYOTA")
    # comparison cloud
    comparison.cloud(tdm, random.order=FALSE, colors = c("#00B2FF", "red", "#FF0099", "#6600CC"), max.words=500)
    

    But I am getting the following error:

    Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
    Please help.