Count multiple patterns using the str_count function in R



Solution 1

Do you mean the str_count function in the stringr package?

If so, it uses regular expressions, and in a regular expression pattern the | character means "or". So str_count(mydf$string, 'apple|pear') counts the occurrences of "apple" or "pear" and returns their total. You can build the pattern string containing the | characters with paste(); try:

str_count(mydf$string, paste(Uniques, collapse='|'))

You can see the pattern that paste() builds by running that part of the code on its own. Note that a pattern with many alternatives may run very slowly. Another option is to split the string being searched into individual words, compare that vector of words with the vector of options using the %in% operator, and then count the TRUE values.
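A minimal, self-contained sketch of both ideas follows. The data frames mydf and mydf1 and their contents are hypothetical stand-ins for the question's data. The sketch also wraps each word in \b word boundaries, which the answer above does not do, so that a word is not counted when it appears inside a longer word. If the words can contain regex metacharacters such as . or (, they would also need escaping first (for example with str_escape(), available in stringr 1.5.0 and later).

```r
library(stringr)

# Hypothetical example data standing in for mydf and mydf1 from the question
mydf  <- data.frame(string = c("Apples and Pears", "Oranges, Apples, Apples", "no fruit here"),
                    stringsAsFactors = FALSE)
mydf1 <- data.frame(words = c("Apples", "Pears", "Oranges", "Apples"),
                    stringsAsFactors = FALSE)
Uniques <- unique(mydf1$words)

# Option 1: one regex alternation built with paste0(),
# giving "\\bApples\\b|\\bPears\\b|\\bOranges\\b"
pattern <- paste0("\\b", Uniques, "\\b", collapse = "|")
regex_counts <- str_count(mydf$string, pattern)
regex_counts  # 2 3 0

# Option 2: split each string on runs of non-word characters,
# then count how many of the resulting words are in Uniques
words_per_string <- str_split(mydf$string, "\\W+")
in_counts <- vapply(words_per_string, function(w) sum(w %in% Uniques), integer(1))
in_counts  # 2 3 0
```

Both give one total per string. The %in% version only matches single words after splitting, so a multi-word entry in Uniques would never be found; the regex version can handle that.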

Solution 2

If you are counting occurrences in a character vector created by unique(), the count will be 1 for every element ;) Try using the table() function on the original vector instead.

table(c("Apples","Pears","Oranges","Apples","Apples","Pears"))[["Apples"]]
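table() counts every distinct value in one call, so there is no need to look words up one at a time. The sketch below uses the same example words; the vocab vector is a hypothetical list of expected words. Converting to a factor with explicit levels makes words that never occur appear with a count of 0 instead of being missing from the table.

```r
fruit <- c("Apples", "Pears", "Oranges", "Apples", "Apples", "Pears")

# Counts for every distinct value (names are sorted alphabetically)
counts <- table(fruit)
counts[["Apples"]]  # 3

# Hypothetical vocabulary: absent words get an explicit zero
vocab <- c("Apples", "Pears", "Oranges", "Kiwis")
with_zeros <- table(factor(fruit, levels = vocab))
as.vector(with_zeros)  # 3 2 1 0
```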

Solution 3

Not sure about the expected result. Perhaps:

 library(stringr)
 sapply(unique(vec1), function(x) str_count(vec1,x))
 #     Apples Pears Oranges
 #[1,]      1     0       0
 #[2,]      0     1       0
 #[3,]      0     0       1
 #[4,]      1     0       0
 #[5,]      1     0       0
 #[6,]      0     1       0

data

  vec1 <- c("Apples","Pears","Oranges","Apples","Apples","Pears")
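When the strings are free text that may mention several words each, the same sapply() idea gives one row per string and one column per word, and colSums() and rowSums() then give totals per word and per string. This is a sketch on assumed example data (sentences and words are hypothetical). It passes each word through stringr's fixed() so it is matched literally rather than interpreted as a regular expression; note that fixed() does not enforce word boundaries.

```r
library(stringr)

# Hypothetical free-text strings and word list (e.g. unique(mydf1$words))
sentences <- c("Apples and Pears", "Oranges, Apples, Apples", "no fruit here")
words <- c("Apples", "Pears", "Oranges")

# Matrix: one row per string, one column per word (columns named by sapply)
per_string <- sapply(words, function(w) str_count(sentences, fixed(w)))

totals  <- colSums(per_string)  # Apples = 3, Pears = 1, Oranges = 1
per_row <- rowSums(per_string)  # 2 3 0
```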


Author: Rhysj

Updated on August 08, 2022

Comments

Rhysj
Fairly new to R and struggling a bit with using the str_count function to detect multiple words that are not known in advance and are stored in a separate vector.

Now, I know how to detect a single instance of a pattern using the following code:
```
str_count(mydf$string, "Apples")
```
What I want to do is detect multiple words (e.g. "Apples", "Pears", "Oranges", etc.) from a vector that is itself created from another data frame (e.g. by using Uniques <- unique(mydf1$words)).

The key thing here is that the words in mydf1$words depend entirely on which data has been loaded into R in the first place, and this changes from data set to data set.

The answer is probably pretty straightforward, but for the life of me I can't seem to work it out!