Pandas str.contains for exact matches of partial strings
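By default, str.contains treats its argument as a regular expression, so the backslashes in ex are read as regex escapes (\d, for instance, means "any digit") rather than as literal characters. Pass regex=False to have the argument matched as a plain substring: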
>>> df.full_path.str.contains(ex)
0 False
1 False
2 False
3 False
4 False
5 False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0 False
1 False
2 False
3 False
4 False
5 True
Name: full_path, dtype: bool
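For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the paths listed in the question; the names literal_mask, escaped_mask and matches are only illustrative. It also shows re.escape as an alternative for cases where the pattern has to stay a regex (for example, when combining it with other regex pieces), which is an addition not taken from the answer itself.

```python
import re

import pandas as pd

# Rebuild the example data from the question (raw strings keep the backslashes literal).
df = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\BER\figure1.png",
        r"C:\data\Data Files\BER\figure2.png",
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Data Files\Val\1x2.png",
        r"C:\data\Data Files\Val\2x2.png",
        r"C:\data\Microscopy\defect.png",
    ]
})
ex = "C:\\data\\Microscopy"

# Option 1: treat the argument as a plain substring, not a regex.
literal_mask = df.full_path.str.contains(ex, regex=False)

# Option 2: keep regex matching, but escape the special characters first.
escaped_mask = df.full_path.str.contains(re.escape(ex))

# Use the boolean mask to filter the rows.
matches = df[literal_mask]
print(matches)
```

Both masks select only row 5 here. regex=False is the simpler choice when no regex features are needed.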
(Aside: your lambda x: ex in x should have worked. The NameError is a sign that you hadn't defined ex for some reason.)
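As a quick check of the aside, the sketch below runs the lambda version on a two-row Series taken from the question's data and compares it with str.contains(..., regex=False). The variable names paths, apply_mask and contains_mask are illustrative only, and this assumes ex is defined in the same scope as the lambda.

```python
import pandas as pd

# Two of the paths from the question, enough to show a miss and a hit.
paths = pd.Series(
    [
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Microscopy\defect.png",
    ],
    name="full_path",
)
ex = "C:\\data\\Microscopy"

# Plain Python substring test applied element by element.
apply_mask = paths.apply(lambda x: ex in x)

# Vectorised string method with regex matching turned off.
contains_mask = paths.str.contains(ex, regex=False)

print(apply_mask.tolist())
print(contains_mask.tolist())
```

Both approaches give the same mask when ex is in scope, so the NameError points at the environment the lambda ran in rather than at the expression itself.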
endangeredoxen
Updated on September 16, 2022
Comments
-
endangeredoxen over 1 year
I have a DataFrame (I'll call it test) with a column containing file paths and I want to filter the data using a partial path.
                            full_path
0  C:\data\Data Files\BER\figure1.png
1  C:\data\Data Files\BER\figure2.png
2  C:\data\Previous\Error\summary.png
3      C:\data\Data Files\Val\1x2.png
4      C:\data\Data Files\Val\2x2.png
5       C:\data\Microscopy\defect.png
The partial path to find is:
ex = 'C:\\data\\Microscopy'
I've tried str.contains, but:
test.full_path.str.contains(ex)
0 False
1 False
2 False
3 False
4 False
5 False
I would have expected a value of True for index 5. At first I thought the problem might be with the path strings not actually matching due to differences with the escape character, but ex in test.full_path.iloc[5] equals True. After some digging, I'm thinking the argument to str.contains is supposed to be a regular expression, so maybe the "\"s in the partial path are messing things up?
I also tried:
test.full_path.apply(lambda x: ex in x)
but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows in them, so I'm also concerned that the apply function might not be very efficient.
Any suggestions on how to search a DataFrame column for exact partial string matches?
Thanks!
-
endangeredoxen over 8 years
Thank you DSM! I should have caught that in the docs. (I also thought the lambda expression should have worked. ex is definitely defined in the code... maybe it had something to do with the fact I tried it at a set_trace using the Python debugger.)