Convert Pandas series containing string to boolean
Solution 1
You can just use map:
In [7]: df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered', 'SomethingElse']})
In [8]: df
Out[8]:
Status
0 Delivered
1 Delivered
2 Undelivered
3 SomethingElse
In [9]: d = {'Delivered': True, 'Undelivered': False}
In [10]: df['Status'].map(d)
Out[10]:
0 True
1 True
2 False
3 NaN
Name: Status, dtype: object
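The result above has dtype object because it mixes booleans with NaN. As a minimal sketch, assuming a pandas version that provides the nullable 'boolean' extension dtype (1.0 or later), the mapped Series can be converted so that the unknown status shows up as a missing value in a proper boolean column. The names mapped and as_boolean are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# Look up each status in d; statuses that are not keys of d become NaN.
mapped = df['Status'].map(d)

# Convert to the nullable boolean dtype; the NaN entry becomes <NA>.
as_boolean = mapped.astype('boolean')
print(as_boolean)
```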
Solution 2
An example of the replace method, replacing values only in the specified column C2 and returning the result as a DataFrame.
import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})
C1 C2
0 X Y
1 Y Y
2 X X
3 Y X
df.replace({'C2': {'X': True, 'Y': False}})
C1 C2
0 X False
1 Y False
2 X True
3 Y True
Solution 3
You've got everything you need. You'll be happy to discover replace:
df.replace(d)
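To illustrate what this call does on a whole DataFrame, here is a small sketch. The second column, Previous, is hypothetical and not part of the question. The frame is built with dtype=object only to keep the example independent of the default string dtype of the installed pandas version.

```python
import pandas as pd

# 'Previous' is a made-up column, used only to show that replace
# acts on every column of the DataFrame.
df = pd.DataFrame({
    'Status': ['Delivered', 'Undelivered', 'SomethingElse'],
    'Previous': ['Undelivered', 'Undelivered', 'Delivered'],
}, dtype=object)
d = {'Delivered': True, 'Undelivered': False}

# replace returns a new DataFrame; values that are not keys of d are kept.
result = df.replace(d)
print(result)
```

Note that 'SomethingElse' is left in place here rather than turned into NaN.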
Solution 4
Expanding on the previous answers:
Map method explained:
- Pandas will look up each row's value in the dictionary d, replacing any found keys with the corresponding values from d.
- Values without keys in d will be set to NaN. This can be corrected with the fillna() method.
- Does not work on multiple columns directly, since map here is a method of pd.Series.
- Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
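The points above can be sketched as follows. Treating unknown statuses as False is an assumption made only for illustration (the question asks for NaN instead), and the Previous column is hypothetical. Applying map column by column through DataFrame.apply is one possible way around the single-column limitation.

```python
import pandas as pd

# 'Previous' is a made-up second column for the multi-column case.
df = pd.DataFrame({
    'Status': ['Delivered', 'Undelivered', 'SomethingElse'],
    'Previous': ['Undelivered', 'Undelivered', 'Delivered'],
})
d = {'Delivered': True, 'Undelivered': False}

# Unmapped values become NaN; here they are filled with False
# and the result is cast to a plain bool column.
filled = df['Status'].map(d).fillna(False).astype(bool)

# map is a Series method, so apply it to each column in turn.
mapped_all = df[['Status', 'Previous']].apply(lambda col: col.map(d))
print(filled)
print(mapped_all)
```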
Replace method explained:
- Pandas will look up each row's value in the dictionary d, and attempt to replace any found keys with the corresponding values from d.
- Values without keys in d will be retained.
- Works with single and multiple columns (pd.Series or pd.DataFrame objects).
- Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
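A short side-by-side sketch of the difference in how the two methods treat a value that is missing from d. The variable names via_map and via_replace are only illustrative, and dtype=object is set explicitly so the example does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

s = pd.Series(['Delivered', 'Undelivered', 'SomethingElse'],
              name='Status', dtype=object)
d = {'Delivered': True, 'Undelivered': False}

via_map = s.map(d)          # 'SomethingElse' is not a key of d, so it becomes NaN
via_replace = s.replace(d)  # 'SomethingElse' is not a key of d, so it is kept

print(via_map)
print(via_replace)
```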
Overall, the replace method is more robust and allows finer control over how data is mapped and how missing or NaN values are handled.
Author: working4coins
Updated on July 09, 2022
Comments
- working4coins almost 2 years
I have a DataFrame named df as

   Order Number       Status
1          1668  Undelivered
2         19771  Undelivered
3     100032108  Undelivered
4          2229    Delivered
5         00056  Undelivered

I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered), but if Status is neither 'Undelivered' nor 'Delivered' it should be considered as NotANumber or something like that.
I would like to use a dict
d = { 'Delivered': True, 'Undelivered': False }
so I could easily add other strings which could be considered as either True or False.
- joris almost 11 years
Ah, I only see it now that I posted my answer. Is there a difference with map in this case?
- joris almost 11 years
It seems that something else (not in the dict) is just left as-is with replace, but converted to NaN with map.
- Dan Allan almost 11 years
I think map is a better choice here, actually, because if a value isn't in d then the value is invalid and should be replaced with NaN.
- working4coins almost 11 years
replace seems to apply to a DataFrame, not to a Series.
- Dan Allan almost 11 years
It applies to both. My link was to the DataFrame documentation; here's one for Series. pandas.pydata.org/pandas-docs/dev/generated/…
- Donald Duck about 7 years
While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.
- 7H3 IN5ID3R over 6 years
I'm getting AttributeError: 'DataFrame' object has no attribute 'map'.
- joris over 6 years
map is a method on the Series, not DataFrame.
- 7H3 IN5ID3R over 6 years
Yeah, got it, sorry for that.