Pandas str.extract: AttributeError: 'str' object has no attribute 'str'

python regex python-2.7 pandas


You can handle the positive case using the following:

In [150]:
import re
df['fundleverage'] = '+' + df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx') + '00'
df

Out[150]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BULL ESTOX X12 S        +1200

You can use np.where to handle both cases, falling back to +100 when no multiplier is found:

In [151]:
import numpy as np
lev = df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx').fillna('')
df['fundleverage'] = np.where(lev.str.isdigit(), '+' + lev + '00', '+100')
df

Out[151]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BULL ESTOX X12 S        +1200

So the above chains the vectorised str methods extract, strip and isdigit to achieve what you want, operating on the whole column at once rather than row by row.
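The AttributeError in the question comes from Series.map: it calls the function once per element, so inside bull_lev each x is a plain Python str, and the .str accessor only exists on pandas Series and Index objects. If you would rather keep per-row functions, a minimal sketch is to run re.search on the scalar string instead. The helper names lev and leverage below are hypothetical, not taken from the question.

```python
import re
import pandas as pd

# Same pattern as the vectorised version, compiled once for reuse.
LEV_RE = re.compile(r"(X\d+|\d+X)\s", flags=re.IGNORECASE)

def lev(x, sign):
    """Return e.g. '+300' for a name containing 'X3', else sign + '100'."""
    m = LEV_RE.search(x)  # x is a plain str here, so use re, not .str
    if m:
        digits = m.group(1).strip("Xx")
        if digits.isdigit():
            return sign + digits + "00"
    return sign + "100"

def leverage(x):
    # Mirrors the question's BULL / BEAR / fallback branching.
    if "BULL" in x:
        return lev(x, "+")
    if "BEAR" in x:
        return lev(x, "-")
    return "+100"

df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
df["leverage"] = df["name"].map(leverage)
print(df)
```

This loops in Python, so on large frames the vectorised str version will usually be faster.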

Update

Since you changed your requirements after the original answer (for future reference, please avoid doing that), you can mask the df separately for the bull and bear cases:

In [189]:
import re
import numpy as np
import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
bull_mask = df['name'].str.contains('bull', case=False)
bear_mask = df['name'].str.contains('bear', case=False)
# expand=False returns a Series for the single group; strip both 'X' and 'x'
lev = df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx').fillna('')
has_lev = lev.str.isdigit()
df.loc[bull_mask, 'fundleverage'] = np.where(has_lev[bull_mask], '+' + lev[bull_mask] + '00', '+100')
df.loc[bear_mask, 'fundleverage'] = np.where(has_lev[bear_mask], '-' + lev[bear_mask] + '00', '-100')
df

Out[189]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BEAR ESTOX 12x S        -1200
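As an alternative to masking the frame twice, here is a sketch that builds the sign and the multiplier as two aligned Series and concatenates them in one pass. It relies on the regex guaranteeing that a matched token is all digits once the X is stripped, so a missing match can simply default to "1" (giving 100). Unlike the question's map, this treats every name without BEAR as bullish, so a name with neither keyword still picks up its multiplier. The third sample row is made up for illustration.

```python
import re
import pandas as pd

# Third row is a hypothetical name with no multiplier token.
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BULL FOO S"], columns=["name"])

# '-' for names containing BEAR (any case), '+' for everything else.
sign = df["name"].str.contains("bear", case=False).map({True: "-", False: "+"})

# Digits of the 'X<n>' / '<n>X' token; rows without one fall back to '1' -> '100'.
digits = (
    df["name"]
    .str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)
    .str.strip("Xx")
    .fillna("1")
)

# Element-wise string concatenation of aligned Series.
df["fundleverage"] = sign + digits + "00"
print(df)
```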


Author: P A N

Updated on June 05, 2022

Question (P A N):

I'm trying to repurpose these functions from using rsplit to using str.extract (regex) instead.

def bull_lev(x):
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "+" + spl + "00"
    return "+100"

def bear_lev(x):
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "-" + spl + "00"
    return "-100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x)
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")

I am using pandas for DataFrame handling:

import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])

Desired output:

name                    leverage
"BULL AXP UN X3 VON"    "+300"
"BEAR ESTOX 12x S"      "-1200"

Faulty regex attempt for "BULL":

def bull_lev(x):
    #spl = x.rsplit(None, 2)[-2].strip("Xx")
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")
    if spl.str.isdigit():
        return "+" + spl + "00"
    return "+100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x)
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")

Produces error:

Traceback (most recent call last):
  File "toolkit.py", line 128, in <module>
    df["leverage"] = df["name"].map(lambda x: bull_lev(x)
  File "/Python/Virtual/py2710/lib/python2.7/site-packages/pandas/core/series.py", line 2016, in map
    mapped = map_f(values, arg)
  File "pandas/src/inference.pyx", line 1061, in pandas.lib.map_infer (pandas/lib.c:58435)
  File "toolkit.py", line 129, in <lambda>
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
  File "toolkit.py", line 123, in bear_lev
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")

AttributeError: 'str' object has no attribute 'str'

I am assuming this is due to str.extract capturing a list while split works directly with the string?