How to use str.contains() with multiple expressions, in pandas dataframes?
The alternatives should form a single regular expression, written as one string:
"nt|nv"  # rather than "nt" | "nv"
f_recs[f_recs['Behavior'].str.contains("nt|nv", na=False)]
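Here is a minimal, self-contained sketch of the one-pattern approach, extended to the three terms the question asks about ("nt", "nv", "nf"). The DataFrame and its values are made up for illustration; only the column name Behavior comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the missing value mimics gaps in real data.
f_recs = pd.DataFrame({"Behavior": ["ant", "env", "info", "walk", np.nan]})

# One regex string: "|" is alternation *inside* the pattern, so any row
# containing "nt", "nv" or "nf" matches. na=False treats missing values
# as non-matches, so the mask is purely True/False.
mask = f_recs["Behavior"].str.contains("nt|nv|nf", na=False)
soctol = f_recs[mask]
print(soctol)
```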
Python doesn't let you use the or (|) operator on strings:
In [1]: "nt" | "nv"
TypeError: unsupported operand type(s) for |: 'str' and 'str'
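When the search terms live in a Python list, one common idiom is to build the single pattern string with "|".join(...) instead of applying | to the strings themselves. Wrapping each term in re.escape guards against terms that contain regex metacharacters such as "." or "+". The terms list and sample strings below are hypothetical.

```python
import re

import pandas as pd

terms = ["nt", "nv", "nf", "a.b"]  # hypothetical search terms
# Escape each term so "." matches a literal dot, then join with alternation.
pattern = "|".join(re.escape(t) for t in terms)

s = pd.Series(["ant", "axb", "a.b", "walk"])
mask = s.str.contains(pattern, na=False)
print(pattern)
print(mask.tolist())
```

Without re.escape, the term "a.b" would also match "axb", because an unescaped "." matches any character.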
M.A.Kline
I'm a behavioral scientist (Anthropologist of the quantitative persuasion). I'm a beginner at programming, and my main goals concern data manipulation. Hence, I'm focused on becoming competent in pandas, which has so far been a great tool for data wrangling.
Updated on June 16, 2021
Comments
-
M.A.Kline almost 3 years
I'm wondering if there is a more efficient way to use the str.contains() function in Pandas, to search for two partial strings at once. I want to search a given column in a dataframe for data that contains either "nt" or "nv". Right now, my code looks like this:
df[df['Behavior'].str.contains("nt", na=False)]
df[df['Behavior'].str.contains("nv", na=False)]
And then I append one result to another. What I'd like to do is use a single line of code to search for any data that includes "nt" OR "nv" OR "nf." I've played around with some ways that I thought should work, including just sticking a pipe between terms, but all of these result in errors. I've checked the documentation, but I don't see this as an option. I get errors like this:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-113-1d11e906812c> in <module>()
      3
      4
----> 5 soctol = f_recs[f_recs['Behavior'].str.contains("nt"|"nv", na=False)]
      6 soctol

TypeError: unsupported operand type(s) for |: 'str' and 'str'
Is there a fast way to do this? Thanks for any help, I am a beginner but am LOVING pandas for data wrangling.
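For comparison, the original two-filter approach can also be collapsed into one step by combining the two boolean masks with |. On boolean Series, pandas treats | as element-wise OR, so this works where "nt" | "nv" does not. Unlike appending two filtered results, it does not duplicate rows that contain both terms. The data is hypothetical, and pd.concat stands in for DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"Behavior": ["ant", "env", "ntnv", "walk"]})
s = df["Behavior"]

# | between two boolean Series is element-wise OR.
mask = s.str.contains("nt", na=False) | s.str.contains("nv", na=False)
combined = df[mask]

# Concatenating the two separate results keeps "ntnv" twice.
appended = pd.concat([df[s.str.contains("nt", na=False)],
                      df[s.str.contains("nv", na=False)]])
print(len(combined), len(appended))
```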
-
kabrapankaj32 about 7 years: Thanks, such a beauty! Caution, though: there must be no spaces between the pipe and the search terms, because spaces become part of the pattern!
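A quick check of why the spaces matter: without a verbose flag, every space in the pattern is a literal character, so "nt | nv" looks for "nt " or " nv" rather than "nt" or "nv". The sample strings are hypothetical.

```python
import pandas as pd

s = pd.Series(["ant", "ant eater", "env", "an nv"])

no_spaces = s.str.contains("nt|nv").tolist()
with_spaces = s.str.contains("nt | nv").tolist()  # alternatives: "nt " and " nv"
print(no_spaces)
print(with_spaces)
```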
-
Wiktor Stribiżew over 6 years: @jaknap32: If you use the (?x) modifier, you may add spaces wherever you want, e.g. "(?x)nt | nv" (but if you have meaningful spaces in the pattern, you will need to escape them, as well as the # char). See the Python re.X docs. Anyway, n[tv] is a better regex than nt|nv.
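Both suggestions in this comment can be checked directly with Python's re module: under (?x) the unescaped spaces are ignored, an escaped space ("\ ") stays literal, and the character class n[tv] ("n" followed by "t" or "v") matches the same strings as nt|nv. The word list is hypothetical.

```python
import re

words = ["ant", "env", "info", "a nv"]

verbose = re.compile(r"(?x) nt | nv")  # spaces ignored under re.X
plain = re.compile(r"nt|nv")
char_class = re.compile(r"n[tv]")      # 'n' followed by 't' or 'v'

results = [
    (bool(verbose.search(w)), bool(plain.search(w)), bool(char_class.search(w)))
    for w in words
]
print(results)

# Under re.X a meaningful space must be escaped ("#" likewise needs "\#").
escaped = bool(re.search(r"(?x) a\ nv", "a nv"))   # pattern is "a nv"
unescaped = bool(re.search(r"(?x) a nv", "a nv"))  # pattern collapses to "anv"
print(escaped, unescaped)
```

The same pattern strings can be passed unchanged to str.contains.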
-
Arthur D. Howland over 6 years: +1 for the na=False argument. My data has gaps in it, and my str.contains call won't work without it.
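A sketch of what na=False changes, assuming the column is stored with object dtype (pandas' traditional dtype for strings). In that case a missing value propagates as NaN in the mask, and pandas refuses to use a mask containing NaN for boolean indexing. Newer pandas versions with a dedicated string dtype may treat missing values differently, so the raw result should be considered version-dependent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Behavior": pd.Series(["ant", np.nan, "walk"], dtype=object)})

raw = df["Behavior"].str.contains("nt")              # missing value stays NaN
clean = df["Behavior"].str.contains("nt", na=False)  # missing value -> False

print(raw.tolist())
print(clean.tolist())
print(df[clean])  # safe: the mask is purely True/False
```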