Selecting data frame rows based on partial string match in a column
Solution 1
I notice that you mention a function %like% in your current approach. I don't know if that's a reference to the %like% from "data.table", but if it is, you can definitely use it as follows.
Note that the object does not have to be a data.table (but also remember that subsetting approaches for data.frames and data.tables are not identical):
library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]
If that is what you had, then perhaps you had just mixed up row and column positions for subsetting data.
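To illustrate the row/column mix-up, here is a minimal sketch in base R. It uses grepl() as a stand-in for %like% (as far as I know, %like% is a thin wrapper around grepl()), and the conservedData frame below is a hypothetical reconstruction of the asker's data, not the real thing.

```r
# Hypothetical stand-in for the asker's data; column names and values are assumptions.
conservedData <- data.frame(
  miRNA = c("hsa-let-7a", "mmu-miR-1", "hsa-miR-21"),
  score = c(0.9, 0.4, 0.7),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does the miRNA value contain "hsa-"?
keep <- grepl("hsa-", conservedData$miRNA)

# Correct: the logical vector goes in the row position (before the comma).
rowsMatched <- conservedData[keep, ]

# Mistaken: the same vector in the column position is read as a column selector.
# Here it is longer than the number of columns, so R signals an error;
# tryCatch() captures the condition instead of stopping the script.
colAttempt <- tryCatch(conservedData[, keep], error = function(e) e)
```

If the data frame happened to have as many columns as rows, the mistaken form would not fail at all; it would silently return the wrong columns, which is harder to spot.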
If you don't want to load a package, you can try using grep() to search for the string you're matching. Here's an example with the mtcars dataset, where we are matching all rows where the row name includes "Merc":
mtcars[grep("Merc", rownames(mtcars)), ]
#              mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
And another example, using the iris dataset, searching for the string osa:
irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
For your problem try:
selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
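Because grep() treats its pattern as a regular expression by default, it may be worth anchoring the pattern or switching to literal matching. The sketch below uses a made-up vector of identifiers (an assumption for illustration, not the asker's data) to show how the results differ.

```r
# Hypothetical identifiers chosen to show the differences between patterns.
miRNA <- c("hsa-let-7a", "mmu-miR-1", "pre-hsa-miR-21", "hsa.miR.9")

anywhere  <- grep("hsa-", miRNA)                # "hsa-" anywhere in the string
atStart   <- grep("^hsa-", miRNA)               # "^" anchors the match to the start
dotRegex  <- grep("hsa.", miRNA)                # "." is a wildcard in a regex
dotFixed  <- grep("hsa.", miRNA, fixed = TRUE)  # fixed = TRUE matches "." literally

anywhere  # 1 3
atStart   # 1
dotRegex  # 1 3 4
dotFixed  # 4
```

The anchored form is probably what you want if every identifier of interest begins with the prefix; fixed = TRUE is the safer choice when the search string contains characters such as "." or "(" that have special meaning in a regex.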
Solution 2
Try str_detect() from the stringr package, which detects the presence or absence of a pattern in a string.
Here is an approach that also incorporates the %>% pipe and filter() from the dplyr package:
library(stringr)
library(dplyr)
CO2 %>%
filter(str_detect(Treatment, "non"))
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
...
This filters the sample CO2 data set (which comes with R) for rows where the Treatment variable contains the substring "non". You can adjust whether str_detect() matches a fixed string or uses a regex; see the documentation for the stringr package.
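As a rough sketch of that adjustment, stringr lets you wrap the pattern in fixed() or regex() to control how it is interpreted. The treatment vector here is invented for illustration and assumes stringr is installed.

```r
library(stringr)

# Hypothetical values chosen to show how the pattern modifiers differ.
treatment <- c("nonchilled", "chilled", "Non-chilled", "n.n")

# Default: the pattern is a case-sensitive regular expression.
plain <- str_detect(treatment, "non")

# regex() exposes options such as case-insensitive matching.
anyCase <- str_detect(treatment, regex("non", ignore_case = TRUE))

# As a regex, "." matches any single character ...
dotRegex <- str_detect(treatment, "n.n")

# ... while fixed() compares the pattern literally.
dotFixed <- str_detect(treatment, fixed("n.n"))

plain     # TRUE FALSE FALSE FALSE
anyCase   # TRUE FALSE  TRUE FALSE
dotRegex  # TRUE FALSE FALSE  TRUE
dotFixed  # FALSE FALSE FALSE  TRUE
```

Any of these logical vectors can be passed to filter() or used directly inside square brackets to subset rows.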
Solution 3
LIKE should work in sqlite:
require(sqldf)
df <- data.frame(name = c('bob','robert','peter'),id=c(1,2,3))
sqldf("select * from df where name LIKE '%er%'")
    name id
1 robert  2
2  peter  3
Solution 4
Another option would be to simply use the grepl() function:
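# Define the example data first so the subsetting calls below can run
df <- data.frame(name = c('bob','robert','peter'),
                 id = c(1,2,3)
)
df[grepl('er', df$name), ]
#     name id
# 2 robert  2
# 3  peter  3
# The same approach on the built-in CO2 data set
CO2[grepl('non', CO2$Treatment), ]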
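Since grepl() returns a logical vector rather than row positions, it combines naturally with & and !. The sketch below is only an illustration; the "peter piper" row is a made-up value added to show a two-pattern match.

```r
# Hypothetical data, extended with a value that contains two search strings.
df <- data.frame(
  name = c("bob", "robert", "peter piper"),
  id = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Rows whose name contains both "pet" and "pip".
both <- df[grepl("pet", df$name) & grepl("pip", df$name), ]

# Rows whose name does NOT contain "er".
withoutEr <- df[!grepl("er", df$name), ]

# When nothing matches, negating grepl() keeps every row ...
keptAll <- df[!grepl("zzz", df$name), ]

# ... whereas negating grep() indexes with -integer(0), which selects no rows.
keptNone <- df[-grep("zzz", df$name), ]

nrow(keptAll)   # 3
nrow(keptNone)  # 0
```

That last difference is a common reason to prefer grepl() over grep() when excluding rows.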
Asda
Updated on October 04, 2020
Comments
-
Asda over 3 years
I want to select rows from a data frame based on a partial match of a string in a column, e.g. column 'x' contains the string "hsa". Using sqldf - if it had a like syntax - I would do something like: select * from <> where x like 'hsa'. Unfortunately, sqldf does not support that syntax.
Or similarly:
selectedRows <- df[ , df$x %like% "hsa-"]
Which of course doesn't work.
Can somebody please help me with this?
-
A5C1D2H2I1M1N2O1R2T1 over 11 years
Can you post a few lines of your data, preferably using something like dput(head(conservedData)).
-
nico over 11 years
+1: also note that grep supports regular expressions, so you may want to grep for ^hsa- instead.
-
Stephan Kolassa over 11 years
@nico: in fact, grep comes from the ed command g/re/p (global / regular expression / print), and it reveals its real power only to the master of regular expression-fu ;-): en.wikipedia.org/wiki/Grep
-
Suat Atan PhD over 7 years
SQLDF is best for listing. However, it cannot delete rows.
-
rgalbo about 7 years
Why is an R package being loaded with require() here?
-
bartektartanus almost 7 years
Because it is not a standard R library, and you have to manually install it and then load it using the require function.
-
Aren Cambre over 6 years
The %like% suggestion is great! I recommend putting it at the top of your answer.
-
A5C1D2H2I1M1N2O1R2T1 over 6 years
@ArenCambre, done. Maybe it will help me get another 11 votes so that I might get a new hat before the end of the year :-)
-
Bemipefe about 5 years
You can also use the trc_detect function like this:
myDataFrame[str_detect(myDataFrame$key, myKeyPattern),]
-
nigus21 over 4 years
@A5C1D2H2I1M1N2O1R2T1 Great answer! Is there a way to use %like% to search for two strings that occur together (as in "pet" and "pip" occurring in a row of a dataframe as "peter piper")?
-
Martin over 2 years
@Bemipefe don't you mean the str_detect function instead of trc_detect?
-
Bemipefe over 2 years
@Martin Yes you are right. That was a typo.