Why do I get an AttributeError when using pandas apply?
Solution 1
Some things to note here:
- If you're using only two columns, calling apply over 4 columns is wasteful.
- Calling apply is wasteful in general: it is slow, uses a lot of memory, and offers no vectorisation benefits to you.
- In apply, you're dealing with scalars, so you do not use the .str accessor as you would on a pd.Series object. Since title is a plain Python string (which has no contains method), the pythonic check is "lip" in title.
- gender.isnull sounds completely wrong to the interpreter because gender is a scalar; it has no isnull attribute. For a scalar, pd.isnull(gender) does the job.
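If a row-wise function is still wanted, the points above can be sketched roughly as follows. This is only an illustration, not the recommended approach: the sample frame is rebuilt by hand from the question, and the lower-casing of the title is an assumption about what the asker intended with 'Lip'.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question (3 rows).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    # With apply(axis=1), row["title"] and row["gender"] are scalars,
    # so plain Python and pd.isnull are used instead of .str / .isnull().
    title, gender = row["title"], row["gender"]
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # otherwise keep whatever was there

# Only the two columns that the function actually reads are passed in.
imputed = df[["gender", "title"]].apply(impute_gender, axis=1)
```

The vectorised options below should still be preferred on anything but tiny frames.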
Option 1
np.where
import numpy as np

m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
This is not only fast, but simpler as well. If you're worried about case sensitivity, you can make your contains check case-insensitive:
import re

m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
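As an alternative to the re flag, str.contains also accepts case and na parameters. A minimal sketch, assuming titles may be missing or mixed-case (the extra rows here are hypothetical and not from the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a capitalised title, a missing title, and a non-matching title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["Lipbalm", "lipstick", np.nan, "shampoo"],
})

# case=False ignores case; na=False treats a missing title as "no match",
# which keeps the mask strictly boolean.
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)
df["gender"] = np.where(m, "women", df["gender"])
```

Only the first row is imputed here; rows whose title is missing or does not match keep their NaN.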
Option 2
Another alternative is using pd.Series.mask / pd.Series.where:
df['gender'] = df.gender.mask(m, 'women')
Or,
df['gender'] = df.gender.where(~m, 'women')
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
mask replaces the values wherever the condition m is True. where does the opposite: it keeps the values wherever its condition is True and replaces the rest, hence the ~m.
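To make the relationship between the two methods concrete, here is a small sketch on a hand-made Series (the values and the mask are hypothetical, chosen so that one missing entry is deliberately left alone):

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, "men", np.nan, np.nan])
m = pd.Series([True, False, True, False])

masked = gender.mask(m, "women")   # replace where m is True
kept = gender.where(~m, "women")   # keep where ~m is True, replace elsewhere

# Both calls produce the same Series; the last entry stays NaN
# because m is False there.
same = masked.equals(kept)
```

Which of the two to use is mostly a matter of whether the condition reads more naturally as "replace these" or "keep these".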
Solution 2
Or simply use loc, as an option 3 to @COLDSPEED's answer:
cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
Solution 3
If we are dealing with NaN values, fillna can be one of the methods :-)
df.gender=df.gender.fillna(df.title.str.contains('lip').replace(True,'women'))
df
Out[63]:
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
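One caveat worth checking with the fillna approach: replace(True, 'women') leaves False in place, so a row with a missing gender whose title does not contain 'lip' would be filled with the boolean False. A hedged variant that maps only True and leaves everything else as NaN (the 'shampoo' row is hypothetical, added to exercise that case):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "shampoo"],  # last row is hypothetical
})

# True -> 'women'; False is not in the mapping, so it becomes NaN
# and fillna then leaves that row untouched.
fill_values = df["title"].str.contains("lip").map({True: "women"})
df["gender"] = df["gender"].fillna(fill_values)
```

On the question's own three rows every title contains 'lip', so both forms give the same result there.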
Admin
Updated on July 23, 2022
Comments
-
Admin almost 2 years
How should I convert NaN values into a categorical value based on a condition? I am getting an error while trying to convert the NaN values.
category       gender  sub-category  title
health&beauty  NaN     makeup        lipbalm
health&beauty  women   makeup        lipstick
NaN            NaN     NaN           lipgloss
My DataFrame looks like this, and my function to convert NaN values in gender to a categorical value looks like:
def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)
If I run the code, I get this error:
----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')
Complete Dataset - https://github.com/lakshmipriya04/py-sample
-
Admin over 6 years
Thank you for the answer. When should I use apply function? And why do I get attribute error
-
cs95 over 6 years
@LPR who are you speaking to? I've addressed your problems in my answer. Also, as for when to use apply, the answer would be, when you can't use anything else.
-
cs95 over 6 years
I see you went out of the box to get this one. Nice.
-
BENY over 6 years
@cᴏʟᴅsᴘᴇᴇᴅ aha, It is hard to think outside the box :-)
-
Vaishali over 6 years
@Wen, Happy New Year. I answered and went back to enjoying the last day of holiday so missed the message :)