How can I map True/False to 1/0 in a Pandas DataFrame?

Solution 1

A succinct way to convert a single column of boolean values to a column of integers 1 or 0:

df["somecolumn"] = df["somecolumn"].astype(int)
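
As a minimal sketch of the line above, assuming a small hypothetical DataFrame (only the column name somecolumn comes from the answer, the data is made up):

```python
import pandas as pd

# Hypothetical example data; only the column name comes from the answer above.
df = pd.DataFrame({"somecolumn": [True, False, True]})

# Cast the boolean column to integers: True becomes 1, False becomes 0.
df["somecolumn"] = df["somecolumn"].astype(int)

print(df["somecolumn"].tolist())
```

The assignment replaces the column, so the rest of the DataFrame is left untouched.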

Solution 2

Just multiply your DataFrame by the integer 1:

[1]: import pandas as pd
[2]: data = pd.DataFrame([[True, False, True], [False, False, True]])
[3]: print(data)
       0      1     2
0   True  False  True
1  False  False  True

[4]: print(data * 1)
   0  1  2
0  1  0  1
1  0  0  1
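
A rough sketch of how this behaves on a frame that also holds floats, assuming hypothetical column names flag and score (neither is from the answer):

```python
import pandas as pd

# Hypothetical mixed-type frame: one boolean column, one float column.
data = pd.DataFrame({"flag": [True, False, True], "score": [0.5, 1.5, 2.5]})

# Multiplying by the integer 1 promotes bools to ints and leaves floats as floats.
result = data * 1

print(result.dtypes)
```

Unlike calling astype(int) on the whole frame, this should not truncate the float column, which is the advantage pointed out in the comments.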

Solution 3

True is 1 in Python, and likewise False is 0*:

>>> True == 1
True
>>> False == 0
True

Because bool is a subclass of int, you can use them in any numeric operation as though they were numbers:

>>> issubclass(bool, int)
True
>>> True * 5
5

So to answer your question, no work necessary - you already have what you are looking for.

* Note that I use "is" as an English word, not the Python keyword is: True will not be the same object as any random 1.
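
A short sketch of where this does and does not carry over to pandas, assuming a NumPy-backed boolean Series (the variable names are made up for illustration). Reductions such as sum() count True as 1, but element-wise addition of boolean data stays boolean, as the comments point out:

```python
import numpy as np
import pandas as pd

flags = pd.Series([True, False, True])

# Reductions treat True as 1 and False as 0 without any conversion.
total = flags.sum()

# Element-wise addition of two boolean Series stays boolean (a logical OR),
# unlike plain Python, where True + True == 2.
added = flags + flags

# The same holds for NumPy boolean scalars.
numpy_sum = np.bool_(True) + np.bool_(True)

# Converting first gives ordinary integer arithmetic.
added_as_int = flags.astype(int) + flags.astype(int)

print(total, added.tolist(), numpy_sum, added_as_int.tolist())
```

So whether a conversion is needed depends on the calculation: counting and averaging work as-is, while element-wise arithmetic usually wants astype(int) first.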

Solution 4

This question specifically mentions a single column, so the currently accepted answer works. However, it doesn't generalize to multiple columns. For those interested in a general solution, use the following:

df.replace({False: 0, True: 1}, inplace=True)

This works for a DataFrame that contains columns of many different types, regardless of how many are boolean.
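
An alternative sketch for mixed-type frames, swapping in select_dtypes and astype instead of the replace call above. The column names are hypothetical, and this is one possible approach, not the answer's own method:

```python
import pandas as pd

# Hypothetical mixed-type frame for illustration.
df = pd.DataFrame(
    {
        "is_active": [True, False, True],
        "price": [1.5, 2.5, 3.5],
        "name": ["a", "b", "c"],
    }
)

# Find only the boolean columns, then cast just those to int.
bool_cols = df.select_dtypes(include="bool").columns
converted = df.astype({col: int for col in bool_cols})

print(converted.dtypes)
```

Because only the columns with bool dtype are cast, float and string columns keep their original values and types.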

Solution 5

You can also do this directly on a DataFrame:

In [103]: from pandas import DataFrame

In [104]: df = DataFrame(dict(A=True, B=False), index=range(3))

In [105]: df
Out[105]: 
      A      B
0  True  False
1  True  False
2  True  False

In [106]: df.dtypes
Out[106]: 
A    bool
B    bool
dtype: object

In [107]: df.astype(int)
Out[107]: 
   A  B
0  1  0
1  1  0
2  1  0

In [108]: df.astype(int).dtypes
Out[108]: 
A    int64
B    int64
dtype: object

Author by

Simon Righley

Updated on July 08, 2022

Comments

  • Simon Righley
    Simon Righley almost 2 years

    I have a column in a pandas DataFrame that holds boolean True/False values, but for further calculations I need a 1/0 representation. Is there a quick pandas/numpy way to do that?

    • Jon Clements
      Jon Clements almost 11 years
      What further calculations are required?
    • cs95
      cs95 almost 4 years
      To parrot @JonClements, why do you need to convert bool to int to use in calculation? bool works with arithmetic directly (since it is internally an int).
    • sql_knievel
      sql_knievel over 2 years
      @cs95 - Pandas uses numpy bools internally, and they can behave a little differently. In plain Python, True + True = 2, but in Pandas, numpy.bool_(True) + numpy.bool_(True) = True, which may not be the desired behavior on your particular calculation.
  • jorgeca
    jorgeca almost 11 years
    Just be careful with data types if doing floating point math: np.sin(True).dtype is float16 for me.
  • dwanderson
    dwanderson over 7 years
    I've got a dataframe with a boolean column, and I can call df.my_column.mean() just fine (as you imply), but when I try: df.groupby("some_other_column").agg({"my_column":"mean"}) I get DataError: No numeric types to aggregate, so it appears they are NOT always the same. Just FYI.
  • BallpointBen
    BallpointBen about 5 years
    In pandas version 0.24 (and maybe earlier) you can aggregate bool columns just fine.
  • Amadou Kone
    Amadou Kone about 5 years
    It looks like numpy also throws errors with boolean types: "TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead." Using @User's answer fixes this.
  • colorlace
    colorlace almost 5 years
    Another reason it's not the same: df.col1 + df.col2 + df.col3 doesn't work for bool columns as it does for int columns
  • DustByte
    DustByte over 4 years
    The corner case is if there are NaN values in somecolumn. Using astype(int) will then fail. Another approach, which converts True to 1.0 and False to 0.0 (floats) while preserving NaN-values is to do: df.somecolumn = df.somecolumn.replace({True: 1, False: 0})
  • Homunculus Reticulli
    Homunculus Reticulli about 4 years
    @DustByte Good catch!
  • AMC
    AMC about 4 years
    @DustByte Couldn't you just use astype(float) and get the same result?
  • AMC
    AMC about 4 years
    What are the advantages of this solution?
  • AMC
    AMC about 4 years
    This is identical to this solution, posted 3 years earlier.
  • Golden Lion
    Golden Lion over 3 years
    If the values are text, such as a lowercase "true" or "false" (SAS outputs its bools that way), astype(bool).astype(int) will not help, because any non-empty string casts to True. Map them explicitly instead, e.g. .map({"true": 1, "false": 0}).
  • Phillip Copley
    Phillip Copley over 3 years
    @AMC There are none, it's a hacky way to do it.
  • AMC
    AMC over 3 years
    Much simpler: df['type'] = df['type'].map({'REAL': 1, 'FAKE': 0}). In any case, I'm not sure it's too relevant to this question.
  • kaishu
    kaishu over 3 years
    Thanks for providing simpler solution. As I mentioned in answer, I was trying to find solution for slightly different question, and only similar questions like this were available. Hope my answer and your solution will help someone in future.
  • AMC
    AMC over 3 years
    There are other questions which already cover that, though, like stackoverflow.com/q/20250771.
  • Dmitriy Work
    Dmitriy Work about 3 years
    @AMC if your dataframe has float types beside booleans this method won't ruin them, df.astype(int) does. And since it's hacky it's probably a good idea to make intention clear with comment like # bool -> int.
  • Dmitriy Work
    Dmitriy Work about 3 years
    There is an advantage of using data * 1 against data + 0 with mixed types – it works on strings as well, where data + 0 throws an error. Equivalent performance-wise.
  • unaied
    unaied about 3 years
    how can this be applied to a number of columns?
  • Avv
    Avv almost 3 years
    Thank you. Should I do this for every column, or is there a command that works without specifying the column names?
  • qwr
    qwr over 2 years
    advantage: slightly shorter
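
A small sketch of the NaN corner case raised in the comments above, using the astype(float) route that AMC asks about. The data is hypothetical; the assumption is a column that mixes booleans with a missing value and is therefore stored with object dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value; it is stored with object dtype.
df = pd.DataFrame({"somecolumn": [True, False, np.nan]})

# Casting to float maps True -> 1.0 and False -> 0.0 and keeps NaN as NaN.
as_float = df["somecolumn"].astype(float)

print(as_float.tolist())
```

The result is a float column, so the values are 1.0 and 0.0 instead of 1 and 0.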