Select subset of rows of dataframe using multiple conditions

Solution 1

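This is a Julia thing, not so much a DataFrame thing: you want & instead of &&. The && operator short-circuits and only accepts a single Bool on each side, whereas & combines the two arrays element by element (this is Julia 0.5 behaviour; from 0.6 on, & applied to arrays is deprecated in favor of the broadcast form .&). For example: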

julia> [true, true] && [false, true]
ERROR: TypeError: non-boolean (Array{Bool,1}) used in boolean context

julia> [true, true] & [false, true]
2-element Array{Bool,1}:
 false
  true

julia> df[(df[:A].<5)&(df[:B].=="c"),:]
2x2 DataFrames.DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 3 | "c" |
| 2   | 4 | "c" |

FWIW, this works the same way in pandas in Python:

>>> df[(df.A < 5) & (df.B == "c")]
   A  B
1  3  c
2  4  c
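
For readers on a current setup, here is a minimal sketch of the same selection. It assumes Julia 1.x with DataFrames.jl 1.x, where columns are accessed as df.A and the masks are combined with the broadcast operator .&; the names mask and subset_df are only illustrative.

```julia
using DataFrames  # assumption: DataFrames.jl 1.x on Julia 1.x

# Same data as in the question
df = DataFrame(A = [1, 3, 4, 7, 9], B = ["a", "c", "c", "D", "c"])

# Broadcast each comparison to get a Bool vector per condition,
# then combine the two vectors elementwise with .&
mask = (df.A .< 5) .& (df.B .== "c")

# Keep the rows where the mask is true, and all columns
subset_df = df[mask, :]
```

The parentheses around each comparison matter, because .& binds more tightly than .< and .==.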

Solution 2

I now get the same deprecation warning as jwimberley (https://stackoverflow.com/users/5526072/jwimberley). It appeared when I updated from Julia 0.5 to 0.6, and I am now using DataFrames v0.10.1.

Update: I made the following change to fix:

r[(r[:l] .== l) & (r[:w] .== w), :] # julia 0.5

r[.&(r[:l] .== l, r[:w] .== w), :] # julia 0.6

but this gets very slow with long chains of conditions (the time taken seems to grow like 2^n for n chained conditions), so maybe Query is the better way now:

# r is a dataframe
using Query
q1 = @from i in r begin
    @where i.l == l && i.w == w && i.nl == nl && i.lt == lt && 
    i.vz == vz && i.vw == vw && i.vδ == vδ && 
    i.ζx == ζx && i.ζy == ζy && i.ζδx == ζδx
    @select {absu=i.absu, i.dBU}
    @collect DataFrame
end

This is fast, and it is described in the DataFrames documentation.
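
If you would rather not pull in Query, the following is a hedged sketch of two alternatives that, as far as I know, ship with DataFrames.jl 1.x: filter with a row-wise predicate, and subset with one condition per column. It reuses the small dataframe from the question rather than r, and the names by_filter and by_subset are only illustrative. No timing claims are made here.

```julia
using DataFrames  # assumption: DataFrames.jl 1.x, which provides subset and ByRow

df = DataFrame(A = [1, 3, 4, 7, 9], B = ["a", "c", "c", "D", "c"])

# Row-wise predicate: inside the function each value is a scalar,
# so the short-circuiting && is the right operator here
by_filter = filter(row -> row.A < 5 && row.B == "c", df)

# One condition per column; a row is kept only if every condition holds
by_subset = subset(df, :A => ByRow(<(5)), :B => ByRow(==("c")))
```

Both forms extend to long lists of conditions by adding more && terms or more column => condition pairs, without nesting .& calls.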

Author by

ARM

Updated on April 28, 2020

Comments

  • ARM
    ARM almost 4 years

    I would like to select a subset of a dataframe that satisfies multiple conditions on multiple columns. I know I could do this sequentially -- first selecting the subset that matches the first condition, then the portion of those that match the second, etc. -- but it seems like it should be possible in a single step. The following seems like it should work, but doesn't. Apparently it does work like this in other languages' implementations of DataFrame. Any thoughts?

    using DataFrames
    df = DataFrame()
    df[:A]=[ 1, 3, 4, 7, 9]
    df[:B]=[ "a", "c", "c", "D", "c"]
    df[(df[:A].<5)&&(df[:B].=="c"),:] 
    
    type: non-boolean (DataArray{Bool,1}) used in boolean context
    while loading In[18], in expression starting on line 5
    
  • ARM
    ARM about 9 years
    Whoops, I should have caught that myself -- I even looked at the related pandas questions but was too used to typing "&&"... Thanks!
  • jwimberley
    jwimberley over 6 years
    Has this changed? Trying something similar I get the error WARNING: a::DataArray{$(Expr(:<:, :Integer))} & b::DataArray{$(Expr(:<:, :Integer))} is deprecated, use &.(a, b) instead.
  • Anton Degterev
    Anton Degterev over 3 years
    In modern Julia (in my case 1.5.3), a small modification of the proposed solution is needed: df[(df[:A] .< 5) .& (df[:B] .== "c"), :]
  • Smithey
    Smithey over 2 years
    Updated answer
