How to use str.contains() with multiple expressions in pandas DataFrames?

102,595

The alternatives should be combined into a single regular expression, written as one string:

"nt|nv"  # rather than "nt" | "nv"
f_recs[f_recs['Behavior'].str.contains("nt|nv", na=False)]
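
To make the line above concrete, here is a minimal sketch on made-up data. The sample values, the `terms` list and the `pattern` variable are assumptions for illustration; only the 'Behavior' column and the `f_recs` / `soctol` names come from the question. Building the pattern with "|".join is one possible way to extend it to the third term ("nf") the question mentions.

```python
import re

import pandas as pd

# Hypothetical sample data; only the 'Behavior' column name is from the question.
f_recs = pd.DataFrame({"Behavior": ["nt-1", "nv-2", "xx-3", None, "nf-4"]})

# One regex string with alternation: rows containing "nt" or "nv".
soctol = f_recs[f_recs["Behavior"].str.contains("nt|nv", na=False)]
print(soctol)

# For a longer list of literal terms, the pattern can be built from the list.
# re.escape guards against terms that contain regex metacharacters.
terms = ["nt", "nv", "nf"]
pattern = "|".join(re.escape(term) for term in terms)
any_term = f_recs[f_recs["Behavior"].str.contains(pattern, na=False)]
print(any_term)
```

The row with a missing value is dropped in both results because na=False treats it as a non-match.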

Python doesn't define the or (|) operator for strings, so "nt" | "nv" fails before pandas ever sees the pattern:

In [1]: "nt" | "nv"
TypeError: unsupported operand type(s) for |: 'str' and 'str'
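
By contrast, | does work between two boolean Series, where it means element-wise OR. The sketch below, on made-up data, shows that combining two separate masks this way selects the same rows as the single-regex form. The names `has_nt`, `has_nv`, `either` and `combined` are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Behavior": ["nt-1", "nv-2", "xx-3", None]})

# Two separate boolean masks, one per search term.
has_nt = df["Behavior"].str.contains("nt", na=False)
has_nv = df["Behavior"].str.contains("nv", na=False)

# | is defined for boolean Series (element-wise OR), unlike for plain strings.
either = df[has_nt | has_nv]

# The single-regex form selects the same rows in one call.
combined = df[df["Behavior"].str.contains("nt|nv", na=False)]

print(either.equals(combined))
```

This avoids filtering twice and appending the results, though the single regex is shorter when there are many terms.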
Author by

M.A.Kline

I'm a behavioral scientist (Anthropologist of the quantitative persuasion). I'm a beginner with programming and my main goals are regarding data manipulation. Hence, I'm focused on becoming competent in PANDAS, which has so far been a great tool for data wrangling.

Updated on June 16, 2021

Comments

  • M.A.Kline
    M.A.Kline almost 3 years

    I'm wondering if there is a more efficient way to use the str.contains() function in Pandas, to search for two partial strings at once. I want to search a given column in a dataframe for data that contains either "nt" or "nv". Right now, my code looks like this:

        df[df['Behavior'].str.contains("nt", na=False)]
        df[df['Behavior'].str.contains("nv", na=False)]
    

    And then I append one result to another. What I'd like to do is use a single line of code to search for any data that includes "nt" OR "nv" OR "nf." I've played around with some ways that I thought should work, including just sticking a pipe between terms, but all of these result in errors. I've checked the documentation, but I don't see this as an option. I get errors like this:

        ---------------------------------------------------------------------------
        TypeError                                 Traceback (most recent call last)
        <ipython-input-113-1d11e906812c> in <module>()
        3 
        4 
        ----> 5 soctol = f_recs[f_recs['Behavior'].str.contains("nt"|"nv", na=False)]
        6 soctol
    
        TypeError: unsupported operand type(s) for |: 'str' and 'str'
    

    Is there a fast way to do this? Thanks for any help, I am a beginner but am LOVING pandas for data wrangling.

  • kabrapankaj32
    kabrapankaj32 about 7 years
    Thanks, such a beauty! Caution, though: there must be no space between the pipe and the search terms!
  • Wiktor Stribiżew
    Wiktor Stribiżew over 6 years
    @jaknap32: If you use the (?x) modifier, you may add spaces wherever you want - "(?x)nt | nv" - (but if you have meaningful spaces in the pattern, you will need to escape them, as well as the # character). See the Python re.X docs. Anyway, n[tv] is a better regex than nt|nv.
  • Arthur D. Howland
    Arthur D. Howland over 6 years
    +1 for the "na=False" expression. My data has gaps in it, and str.contains() won't work for filtering without it.
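
The points raised in the comments can be checked with a short sketch on made-up data: a space next to the pipe becomes part of the pattern, the (?x) modifier makes such spaces insignificant, and n[tv] matches the same strings as nt|nv. The Series values and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
s = pd.Series(["nt-1", "nv-2", "xx-3", None])

# Baseline: alternation with no spaces around the pipe.
plain = s.str.contains("nt|nv", na=False)

# Without (?x), the spaces are literal: this looks for "nt " or " nv".
spaced = s.str.contains("nt | nv", na=False)

# With (?x) (verbose mode), unescaped whitespace in the pattern is ignored.
verbose = s.str.contains("(?x) nt | nv", na=False)

# A character class is a more compact way to write the same alternation.
char_class = s.str.contains("n[tv]", na=False)

print(pd.DataFrame({"plain": plain, "spaced": spaced,
                    "verbose": verbose, "char_class": char_class}))
```

In every call, na=False makes the missing entry count as a non-match, so each result is a plain boolean mask that can be used for filtering.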