Select ALL Columns | Rows of a data.table based on condition

14,952

Solution 1

For getting all columns but only rows where all values are superior to a treshold, the best way is to use classic filtering as:

dt[rowMeans(dt>threshold)==1,]

For getting all rows but only columns where all values are superior to a treshold, you can do:

dt[,colMeans(dt>threshold)==1, with=F]

Solution 2

For subsetting rows, the following code uses base R. Because you are looking across rows, you are treating the data table more like a matrix.

rws <- apply(dt, 1L, function(r) any(r > 4))
dt[rws]

For columns, you can use the list-like properties of the data table again:

cls <- sapply(dt, function(c) any(c > 4))
dt[, cls, with = FALSE]
Share:
14,952
rafa.pereira
Author by

rafa.pereira

See my blog Urban Demographics or Twitter for some updates on how I've been procrastinating lately My Google Scholar profile Key research interests include: Commuting patterns and Urban accessibility Spatial organization of cities Accessibility, Spatial Analysis and GTFS data Population aging and demographic projections Justice and Equity issues in transport policies

Updated on June 22, 2022

Comments

  • rafa.pereira
    rafa.pereira almost 2 years

    Two related questions:

    EDIT: ps. I'm looking for solutions based on data.table.

    1. How to select the rows of data.table above a certain threshold for all columns?

    2. How to select the columns of data.table which contain values above a certain threshold ?

    Reproducible example:

    library(data.table)
    dt <- data.table(V1=1:5, V2=3:7, V3=7:3)
    

    conditionally selecting all rows

    # this line selects rows based on column `V1`. 
      dt[ V1 > 2, ] 
    
    # I'm looking for a way to select rows based on values of all columns. My failed attempt
      dt[ names(dt) > 2, ] 
    
    # *expected output*: a data.table with all columns but only with those rows where all values are `> 2` 
    
    #> V1 V2 V3
    #> 3  5  5
    #> 4  6  4
    #> 5  7  3
    

    conditionally selecting all columns

    # My failed attempt
      dt[, .SD, .SDcols > 2 ]
    
    # *expected output*: a data.table with all rows but only with those columns where all values are `> 2`
    
    #>   V2 V3
    #>   3  7
    #>   4  6
    #>   5  5
    #>   6  4
    #>   7  3