Select subset of rows of dataframe using multiple conditions
Solution 1
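This is a Julia thing, not so much a DataFrame thing: you want & instead of &&. The && operator is short-circuiting control flow and expects a single Bool condition, not an array, whereas & combines the two arrays element by element. For example: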
julia> [true, true] && [false, true]
ERROR: TypeError: non-boolean (Array{Bool,1}) used in boolean context
julia> [true, true] & [false, true]
2-element Array{Bool,1}:
false
true
julia> df[(df[:A].<5)&(df[:B].=="c"),:]
2x2 DataFrames.DataFrame
| Row | A | B |
|-----|---|-----|
| 1 | 3 | "c" |
| 2 | 4 | "c" |
FWIW, this works the same way in pandas in Python:
>>> df[(df.A < 5) & (df.B == "c")]
   A  B
1  3  c
2  4  c
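The difference between the two operators can also be seen without DataFrames. The following is a minimal sketch in plain Julia, assuming Julia 1.x, where the elementwise form is written with a dot (.&) as the comments further down mention; the variable names a, b, x, y, mask and keep are made up for illustration.

```julia
# Assumes Julia 1.x; all variable names here are illustrative only.
a = [true, true]
b = [false, true]

# `&&` is short-circuit control flow and needs a single Bool on its left,
# so applying it to an array throws a TypeError.
threw = try
    a && b
    false
catch err
    err isa TypeError
end

# `.&` broadcasts the AND over the elements of both arrays.
mask = a .& b

# `&` binds more tightly than comparisons such as `.<` and `.==`,
# so each condition needs its own parentheses.
x = [1, 3, 4, 7, 9]
y = ["a", "c", "c", "D", "c"]
keep = (x .< 5) .& (y .== "c")
selected = x[keep]
```

Dropping the parentheses around the two comparisons changes how the expression is parsed, which is a common source of confusing errors with this pattern.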
Solution 2
I now get the same deprecation warning as jwimberley (https://stackoverflow.com/users/5526072/jwimberley). It appeared when I updated from Julia 0.5 to 0.6; I am now using DataFrames v0.10.1.
Update: I made the following change to fix:
r[(r[:l] .== l) & (r[:w] .== w), :] # julia 0.5
r[.&(r[:l] .== l, r[:w] .== w), :] # julia 0.6
but this gets very slow with long chains of conditions (the time taken grows roughly as 2^n for n chained conditions), so maybe Query is the better way now:
# r is a dataframe
using Query
q1 = @from i in r begin
    @where i.l == l && i.w == w && i.nl == nl && i.lt == lt &&
           i.vz == vz && i.vw == vw && i.vδ == vδ &&
           i.ζx == ζx && i.ζy == ζy && i.ζδx == ζδx
    @select {absu=i.absu, i.dBU}
    @collect DataFrame
end
This is just one example. It is fast, and it is covered in the DataFrames documentation.
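For readers on current versions, here is a hedged sketch of the same kind of multi-condition selection without Query, assuming DataFrames.jl 1.x on Julia 1.x. It reuses the small A/B table from the question; the result names by_mask, by_filter and by_subset are made up for illustration, and the sketch makes no claim about relative speed.

```julia
# Assumes DataFrames.jl 1.x on Julia 1.x; result names are illustrative.
using DataFrames

df = DataFrame(A = [1, 3, 4, 7, 9], B = ["a", "c", "c", "D", "c"])

# Broadcast mask: the indexing form, written with `.&` and `df.A` column access.
by_mask = df[(df.A .< 5) .& (df.B .== "c"), :]

# Row-wise predicate: `&&` is fine here because each side is a single Bool.
by_filter = filter(row -> row.A < 5 && row.B == "c", df)

# Per-column conditions: `subset` keeps the rows where all of them hold.
by_subset = subset(df, :A => ByRow(<(5)), :B => ByRow(==("c")))
```

All three calls should select the same two rows (A equal to 3 and 4). With many conditions, the filter and subset forms avoid nesting a long chain of broadcast calls.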
ARM
Updated on April 28, 2020
Comments
-
ARM almost 4 years
I would like to select a subset of a dataframe that satisfies multiple conditions on multiple columns. I know I could do this sequentially (first selecting the subset that matches the first condition, then the portion of those that match the second, and so on), but it seems like it should be possible in a single step. The following seems like it should work, but doesn't. Apparently it does work like this in other languages' implementations of DataFrame. Any thoughts?
using DataFrames
df = DataFrame()
df[:A]=[ 1, 3, 4, 7, 9]
df[:B]=[ "a", "c", "c", "D", "c"]
df[(df[:A].<5)&&(df[:B].=="c"),:]

type: non-boolean (DataArray{Bool,1}) used in boolean context
while loading In[18], in expression starting on line 5
-
ARM about 9 years
Whoops, I should have caught that myself. I even looked at the related pandas questions but was too used to typing "&&"... Thanks!
-
jwimberley over 6 years
Has this changed? Trying something similar I get the error
WARNING: a::DataArray{$(Expr(:<:, :Integer))} & b::DataArray{$(Expr(:<:, :Integer))} is deprecated, use &.(a, b) instead.
-
Anton Degterev over 3 years
In modern Julia (in my case 1.5.3), a small modification of the proposed solution is needed:
df[(df[:A] .< 5) .& (df[:B] .== "c"), :]
-
Smithey over 2 years
Updated answer