Find string in data.frame

38,331

Solution 1

What about:

which(df == "horse", arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   5   2
# [4,]   4   3

Solution 2

Another way around:

l <- sapply(colnames(df), function(x) grep("horse", df[,x]))

$animal
[1] 2 3

$level
[1] 5

$length
[1] 4

If you want the output to be matrix:

sapply(l,'[',1:max(lengths(l)))

     animal level length
[1,]      2     5      4
[2,]      3    NA     NA

Solution 3

Another way to do it is the following:

library(data.table)
library(zoo)
library(dplyr)
library(timeDate)
library(reshape2)
data frame name = tbl_account

first,Transpose it :

temp = t(tbl_Account)

Then, put it in to a list :

temp = list(temp)

This essentially puts every single observation in a data frame in to one massive string, allowing you to search the whole data frame in one go.

then do the searching :

temp[[1]][grep("Horse",temp[[1]])] #brings back the actual value occurrences
grep("Horse", temp[[1]]) # brings back the position of the element in a list it occurs in 

hope this helps :)

Share:
38,331
Jonas Lindeløv
Author by

Jonas Lindeløv

Former researcher. These days I'm mostly using R and SQL but I have an extended history with Python and Matlab. Also did some projects in HTML5 (javascript) and PHP. Linux fan pressured to use Windows.

Updated on October 05, 2020

Comments

  • Jonas Lindeløv
    Jonas Lindeløv over 3 years

    How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?

    > df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
    > df
      animal level length
    1   goat  five     10
    2  horse   one     20
    3  horse three     30
    4    two    30  horse
    5   five horse  eight
    

    ... so row 4 and 5 have the wrong order. Any output that would allow me to identify that 'horse' has shifted to the level column in row 5 and to the length column in row 4 is good. Maybe:

    > magic_function(df, 'horse')
    col       row
    'animal', 2
    'animal', 3
    'length', 4
    'level',  5
    

    Here's what I want to use this for: I have a very large data frame (around 60 columns, 20.000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways that order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.