Pandas str.extract: AttributeError: 'str' object has no attribute 'str'
You can handle the positive case using the following:
In [150]:
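import re
import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BULL ESTOX X12 S"], columns=["name"])
df['fundleverage'] = '+' + df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx') + '00'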
df
Out[150]:
                 name  fundleverage
0  BULL AXP UN X3 VON          +300
1    BULL ESTOX X12 S         +1200
You can use np.where to handle both cases (a leverage token is present, or it is missing and the default applies) in one step:
In [151]:
import numpy as np
lev = df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx')
df['fundleverage'] = np.where(lev.str.isdigit().eq(True), '+' + lev + '00', '+100')
df
Out[151]:
                 name  fundleverage
0  BULL AXP UN X3 VON          +300
1    BULL ESTOX X12 S         +1200
So the above uses the vectorised str methods strip, extract and isdigit to achieve what you want.
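The chained .str calls above only work when str.extract returns a Series. As a minimal sketch (the names `names`, `pattern`, `as_series`, `as_frame` and the third sample row are illustrative assumptions, not from the original post): in recent pandas versions the default expand=True returns a DataFrame even for a single capture group, so passing expand=False is the safer choice when you want to keep chaining.

```python
import re
import pandas as pd

# Sample names; the third row is a made-up case with no leverage token
names = pd.Series(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BULL NO LEVERAGE"])
pattern = r"(X\d+|\d+X)\s"

# One capture group + expand=False gives a Series, so .str chaining works
as_series = names.str.extract(pattern, flags=re.IGNORECASE, expand=False)

# expand=True gives a one-column DataFrame, which has no .str accessor
as_frame = names.str.extract(pattern, flags=re.IGNORECASE, expand=True)

# Strip both cases of the marker because the pattern is case-insensitive
digits = as_series.str.strip("Xx")

print(type(as_series).__name__, type(as_frame).__name__)
print(digits.tolist())
```

Rows without a match come back as NaN, which is why the np.where step needs an explicit default.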
Update
Since the requirements changed after this answer was posted (please avoid doing that in future), you can mask the df for the bull and bear cases separately:
In [189]:
import re
import numpy as np
import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
# Extract the leverage digits once; rows without an X<n> or <n>X token become NaN
lev = df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx')
has_lev = lev.str.isdigit().eq(True)
is_bull = df['name'].str.contains('bull', case=False)
is_bear = df['name'].str.contains('bear', case=False)
df.loc[is_bull, 'fundleverage'] = np.where(has_lev[is_bull], '+' + lev[is_bull] + '00', '+100')
df.loc[is_bear, 'fundleverage'] = np.where(has_lev[is_bear], '-' + lev[is_bear] + '00', '-100')
df
Out[189]:
                 name  fundleverage
0  BULL AXP UN X3 VON          +300
1    BEAR ESTOX 12x S         -1200
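As an alternative to two masked assignments, the same idea can be sketched with np.select, which also covers names that are neither bull nor bear. This swaps in a different NumPy function from the one used above; the last two sample rows and the "+100" default for such names are assumptions taken from the fallback in the question's lambda.

```python
import re
import numpy as np
import pandas as pd

# The last two rows are made-up cases: no leverage token, and neither bull nor bear
df = pd.DataFrame(
    ["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BULL PLAIN FUND", "INDEX TRACKER"],
    columns=["name"],
)

# Leverage digits as strings; a missing token falls back to "1" (i.e. 100)
lev = (
    df["name"]
    .str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)
    .str.strip("Xx")
    .fillna("1")
)

is_bull = df["name"].str.contains("bull", case=False)
is_bear = df["name"].str.contains("bear", case=False)

# The first matching condition wins; anything else gets the default
df["fundleverage"] = np.select(
    [is_bull, is_bear],
    ["+" + lev + "00", "-" + lev + "00"],
    default="+100",
)
print(df)
```

The regex runs once here and every row receives a value, so no NaN is left behind for unmatched names.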
Author: P A N
Updated on June 05, 2022

Comments

P A N (almost 2 years):
I'm trying to repurpose this function from using split to using str.extract (regex) instead.

def bull_lev(x):
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "+" + spl + "00"
    return "+100"

def bear_lev(x):
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "-" + spl + "00"
    return "-100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x) if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
I am using pandas for DataFrame handling:

import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
Desired output:

name                  leverage
"BULL AXP UN X3 VON"  "+300"
"BEAR ESTOX 12x S"    "-1200"
Faulty regex attempt for "BULL":

def bull_lev(x):
    #spl = x.rsplit(None, 2)[-2].strip("Xx")
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")
    if spl.str.isdigit():
        return "+" + spl + "00"
    return "+100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x) if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
Produces error:

Traceback (most recent call last):
  File "toolkit.py", line 128, in <module>
    df["leverage"] = df["name"].map(lambda x: bull_lev(x)
  File "/Python/Virtual/py2710/lib/python2.7/site-packages/pandas/core/series.py", line 2016, in map
    mapped = map_f(values, arg)
  File "pandas/src/inference.pyx", line 1061, in pandas.lib.map_infer (pandas/lib.c:58435)
  File "toolkit.py", line 129, in <lambda>
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
  File "toolkit.py", line 123, in bear_lev
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")
AttributeError: 'str' object has no attribute 'str'
I am assuming this is due to str.extract capturing a list while split works directly with the string?
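On that assumption: the error does not come from what str.extract captures. Series.map calls the function once per element, so x inside bull_lev is a plain Python str, and the .str accessor exists only on a Series (or Index). If you want to keep the map-based structure, a scalar version can use the re module directly. The following is a minimal sketch; the names `LEVERAGE_RE` and `leverage` are illustrative, not from the original code.

```python
import re
import pandas as pd

# Same pattern as in the question, compiled once for reuse
LEVERAGE_RE = re.compile(r"(X\d+|\d+X)\s", flags=re.IGNORECASE)

def leverage(name):
    """Return a signed leverage string such as '+300' for one fund name."""
    if "BULL" in name:
        sign = "+"
    elif "BEAR" in name:
        sign = "-"
    else:
        # Neither bull nor bear: same fallback as the original lambda
        return "+100"
    match = LEVERAGE_RE.search(name)
    if match is None:
        return sign + "100"
    # Drop the X/x marker and keep only the digits
    return sign + match.group(1).strip("Xx") + "00"

df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
df["leverage"] = df["name"].map(leverage)
print(df)
```

This works on one string at a time, so it is usually slower than the vectorised .str approach on large frames, but it keeps the per-name logic in one place.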