Pandas str.contains for exact matches of partial strings


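By default, str.contains treats its argument as a regular expression, so the backslashes in ex are read as regex escapes rather than literal characters. Pass regex=False to have the argument matched as a plain substring: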

>>> df.full_path.str.contains(ex)
0    False
1    False
2    False
3    False
4    False
5    False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0    False
1    False
2    False
3    False
4    False
5     True
Name: full_path, dtype: bool
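For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the paths shown in the question, and the variable names literal, escaped, and matches are just illustrative. It also shows one alternative not mentioned above: keeping regex matching on but escaping the pattern with re.escape from the standard library.

```python
import re

import pandas as pd

# Rebuild the example column from the question (raw strings keep the backslashes literal).
df = pd.DataFrame({"full_path": [
    r"C:\data\Data Files\BER\figure1.png",
    r"C:\data\Data Files\BER\figure2.png",
    r"C:\data\Previous\Error\summary.png",
    r"C:\data\Data Files\Val\1x2.png",
    r"C:\data\Data Files\Val\2x2.png",
    r"C:\data\Microscopy\defect.png",
]})

ex = "C:\\data\\Microscopy"

# Option 1: treat the argument as a plain substring, not a regex.
literal = df.full_path.str.contains(ex, regex=False)

# Option 2 (alternative): keep regex matching, but escape the special characters first.
escaped = df.full_path.str.contains(re.escape(ex))

# Use the boolean mask to filter the rows.
matches = df[literal]
print(matches)
```

Both masks should flag only the row at index 5. regex=False is probably the simpler choice when the search string is never meant to be a pattern.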

(Aside: your lambda x: ex in x should have worked. The NameError is a sign that you hadn't defined ex for some reason.)
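As a quick check of that aside, the sketch below runs the lambda version next to str.contains with regex=False on a shortened copy of the example paths, with ex defined in the same scope. The names paths, via_apply, and via_str are made up for illustration.

```python
import pandas as pd

# A shortened copy of the example paths from the question.
paths = pd.Series([
    r"C:\data\Data Files\BER\figure1.png",
    r"C:\data\Previous\Error\summary.png",
    r"C:\data\Microscopy\defect.png",
], name="full_path")

ex = "C:\\data\\Microscopy"

# Plain Python substring test applied element by element.
via_apply = paths.apply(lambda x: ex in x)

# The same test through the pandas string accessor, with regex matching disabled.
via_str = paths.str.contains(ex, regex=False)

# True when both approaches agree on every row.
print((via_apply == via_str).all())
```

With ex defined, both approaches give the same mask, so the NameError points to a scoping problem in the session where it was run rather than to the lambda itself.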


Author: endangeredoxen

Updated on September 16, 2022

Comments

  • endangeredoxen
    endangeredoxen over 1 year

    I have a DataFrame (I'll call it test) with a column containing file paths and I want to filter the data using a partial path.

                                  full_path
    0    C:\data\Data Files\BER\figure1.png
    1    C:\data\Data Files\BER\figure2.png
    2    C:\data\Previous\Error\summary.png
    3        C:\data\Data Files\Val\1x2.png
    4        C:\data\Data Files\Val\2x2.png
    5         C:\data\Microscopy\defect.png
    

    The partial path to find is:

    ex = 'C:\\data\\Microscopy'
    

    I've tried str.contains, but:

    test.full_path.str.contains(ex)
    
    0    False
    1    False
    2    False
    3    False
    4    False
    5    False
    

    I would have expected a value of True for index 5. At first I thought the problem might be with the path strings not actually matching due to differences with the escape character, but:

    ex in test.full_path.iloc[5]
    

    equals True. After some digging, I think the argument to str.contains is treated as a regular expression, so maybe the backslashes in the partial path are messing things up?

    I also tried:

    test.full_path.apply(lambda x: ex in x)
    

    but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows in them so I'm also concerned that the apply function might not be very efficient.

    Any suggestions on how to search a DataFrame column for exact partial string matches?

    Thanks!

  • endangeredoxen
    endangeredoxen over 8 years
    Thank you DSM! I should have caught that in the docs. (I also thought the lambda expression should have worked. ex is definitely defined in the code...maybe it had something to do with the fact I tried it at a set_trace using the python debugger).