Convert Pandas series containing string to boolean


Solution 1

You can just use map:

In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
                                     'SomethingElse']})

In [8]: df
Out[8]:
          Status
0      Delivered
1      Delivered
2    Undelivered
3  SomethingElse

In [9]: d = {'Delivered': True, 'Undelivered': False}

In [10]: df['Status'].map(d)
Out[10]:
0     True
1     True
2    False
3      NaN
Name: Status, dtype: object
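
The result above has dtype object because NaN cannot be stored in a plain bool column. Here is a minimal sketch of converting the mapped result so the column is a real boolean type with a missing-value marker. It assumes a pandas version that provides the nullable "boolean" extension dtype (1.0 or later); the name status_bool is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered',
                              'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# map() leaves unmatched values as NaN; the nullable "boolean" dtype
# stores them as pd.NA instead of forcing the column to object.
status_bool = df['Status'].map(d).astype('boolean')
print(status_bool)
```

The unmatched 'SomethingElse' row stays missing, which is what the question asks for.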

Solution 2

Here is an example of using the replace method to replace values only in the specified column C2. The result is returned as a DataFrame.

import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})

  C1 C2
0  X  Y
1  Y  Y
2  X  X
3  Y  X

df.replace({'C2': {'X': True, 'Y': False}})

  C1     C2
0  X  False
1  Y  False
2  X   True
3  Y   True

Solution 3

You've got everything you need. You'll be happy to discover replace:

df.replace(d)

Solution 4

Expanding on the previous answers:

Map method explained:

  • Pandas will look up each row's value in the dictionary d and replace it with the corresponding value from d.
  • Values that are not keys in d will be set to NaN. This can be corrected afterwards with fillna().
  • Does not work on a whole DataFrame, since map is a method of pd.Series; apply it to one column at a time.
  • Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
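
If unknown statuses should count as a definite value rather than stay missing, the NaN values that map leaves behind can be filled. This is a sketch that assumes False is the desired default, which is an illustrative choice and not something the question specifies. Going through the nullable "boolean" dtype first (pandas 1.0 or later) keeps the result a clean boolean column.

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# Unknown statuses become <NA> after mapping.
mapped = df['Status'].map(d).astype('boolean')

# Assumed default: treat anything unknown as False, then use plain bool.
filled = mapped.fillna(False).astype(bool)
print(filled.tolist())
```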

Replace method explained:

  • Pandas will look up each row's value in the dictionary d and replace any value that matches a key with the corresponding value from d.
  • Values that are not keys in d will be retained unchanged.
  • Works with single and multiple columns (pd.Series or pd.DataFrame objects).
  • Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)

Overall, replace is the more flexible of the two: it works on both Series and DataFrames and leaves unmatched values (including NaN) untouched. map is the stricter choice when every value that is not in d should become NaN.
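
The difference between the two methods shows up only for values that are not keys in d. A small side-by-side sketch, using an illustrative Series s rather than the DataFrame from the question:

```python
import pandas as pd

d = {'Delivered': True, 'Undelivered': False}
s = pd.Series(['Delivered', 'Undelivered', 'SomethingElse'])

mapped = s.map(d)        # unknown value becomes NaN
replaced = s.replace(d)  # unknown value is kept as the original string

print(mapped.tolist())
print(replaced.tolist())
```

With map the unknown status ends up missing; with replace the string 'SomethingElse' survives alongside the booleans, so the column stays mixed.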

Author by

working4coins

Updated on July 09, 2022

Comments

  • working4coins
    working4coins almost 2 years

    I have a DataFrame named df as

      Order Number       Status
    1         1668  Undelivered
    2        19771  Undelivered
    3    100032108  Undelivered
    4         2229    Delivered
    5        00056  Undelivered
    

    I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered), but if Status is neither 'Undelivered' nor 'Delivered' it should be treated as NaN (not a number) or something similar.

    I would like to use a dict

    d = {
      'Delivered': True,
      'Undelivered': False
    }
    

    so that I could easily add other strings to be treated as either True or False.

  • joris
    joris almost 11 years
    Ah, I only see it now I posted my answer. Is there a difference with map in this case?
  • joris
    joris almost 11 years
    It seems that something else (not in the dict) is just left as-is with replace, but converted to NaN with map
  • Dan Allan
    Dan Allan almost 11 years
    I think map is a better choice here, actually, because if a value isn't in d then the value is invalid and should be replaced with NaN.
  • working4coins
    working4coins almost 11 years
    replace seems to apply to a DataFrame, not to a Series
  • Dan Allan
    Dan Allan almost 11 years
    It applies to both. My link was to the DataFrame documentation; here's one for Series. pandas.pydata.org/pandas-docs/dev/generated/…
  • Donald Duck
    Donald Duck about 7 years
    While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.
  • 7H3 IN5ID3R
    7H3 IN5ID3R over 6 years
    I'm getting AttributeError: 'DataFrame' object has no attribute 'map'.
  • joris
    joris over 6 years
    map is a method on the Series, not DataFrame.
  • 7H3 IN5ID3R
    7H3 IN5ID3R over 6 years
    yea got it, sorry for that.