pandas: convert multiple columns to string

Solution 1

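import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

# Keep NaN as NaN, leave strings untouched, and turn floats into
# integer strings (e.g. 23.0 -> '23').
for col in df:
    df[col] = [np.nan if (not isinstance(val, str) and np.isnan(val)) else
               (val if isinstance(val, str) else str(int(val)))
               for val in df[col].tolist()]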

>>> df
     a    b    c
0   23  a42  142
1   51    3   12
2  NaN  NaN  NaN
3   24   a1  NaN

>>> df.values
array([['23', 'a42', '142'],
       ['51', '3', '12'],
       [nan, nan, nan],
       ['24', 'a1', nan]], dtype=object)
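
The same idea can also be written as a per-value helper applied column by column. This is only a sketch, not the answerer's code: `to_str_keep_nan` and `result` are hypothetical names, and unlike the loop above it assumes that non-integer floats should be kept as they are (1.5 becomes '1.5') instead of being truncated by int().

```python
import numpy as np
import pandas as pd


def to_str_keep_nan(value):
    """Hypothetical helper: stringify one cell, leaving missing values as NaN."""
    if isinstance(value, str):
        return value                      # already a string
    if pd.isna(value):
        return np.nan                     # preserve missing values
    if isinstance(value, float) and value.is_integer():
        return str(int(value))            # 23.0 -> '23'
    return str(value)                     # anything else, e.g. 1.5 -> '1.5'


df = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

# Apply the helper to every cell, one column at a time.
result = df.apply(lambda col: col.map(to_str_keep_nan))
print(result)
```

The helper keeps the three cases (string, missing, number) separate, which may be easier to adjust than the nested conditional expression.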

Solution 2

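This gives you the list of column names:

lst = list(df)

This converts all of those columns to string type. Note that, at least in pandas versions before 3.0, missing values become the literal string 'nan' instead of staying NaN, and floats keep their decimal part ('23.0'):

df[lst] = df[lst].astype(str)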
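
If the missing values should stay missing, as the question asks, one option is to record where they are before converting and mask those cells again afterwards. A minimal sketch on the sample frame from Solution 1; `missing` and `as_text` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

# Remember which cells were missing before the conversion.
missing = df.isna()

# Convert every column to strings, then turn the originally
# missing cells back into missing values.
as_text = df.astype(str).mask(missing)
print(as_text)
```

This keeps the decimal part of the floats ('23.0'), so it does not replace the integer conversion in Solution 1 if the '.0' suffix has to go.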

Solution 3

You can call .astype(str) on the whole DataFrame, or select only the columns of interest and convert those to strings, as follows.

In [41]: df1 = pd.DataFrame({
    ...:     'a': [23.0, 51.0, np.nan, 24.0],
    ...:     'b': ["a42", "3", np.nan, "a1"],
    ...:     'c': [142.0, 12.0, np.nan, np.nan]})

In [42]: df1
Out[42]: 
      a    b      c
0  23.0  a42  142.0
1  51.0    3   12.0
2   NaN  NaN    NaN
3  24.0   a1    NaN

### Shows current data type of the columns:
In [43]: df1.dtypes
Out[43]: 
a    float64
b     object
c    float64
dtype: object

### Applying .astype(str) to the whole DataFrame converts every column to strings (reported as dtype object)
In [45]: df1.astype(str).dtypes
Out[45]: 
a    object
b    object
c    object
dtype: object

### Or select the columns of interest and convert only those to strings
In [48]: df1[["a", "b", "c"]] = df1[["a","b", "c"]].astype(str)

In [49]: df1.dtypes ### Datatype update
Out[49]: 
a    object
b    object
c    object
dtype: object
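
On the `object` dtype in the output above: in the pandas versions this session appears to use, string columns hold ordinary Python `str` objects and are therefore reported as `object`. Newer releases may report a dedicated string dtype instead. A small check, as a sketch; `converted` and `first_value` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

converted = df1.astype(str)

# The column dtype describes the container; the cell itself is a Python str.
first_value = converted.loc[0, 'a']
print(converted.dtypes)
print(type(first_value).__name__, repr(first_value))
```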
Author: As3adTintin

I work as an Education Data Analyst

Updated on July 12, 2022

Comments

  • As3adTintin
    As3adTintin almost 2 years

    I have some columns ['a', 'b', 'c', etc.] (a and c are float64 while b is object)

    I would like to convert all columns to string and preserve nans.

    Tried using df[['a', 'b', 'c']] == df[['a', 'b', 'c']].astype(str) but that left blanks for the float64 columns.

    Currently I am going through one by one with the following:

    df['a'] = df['a'].apply(str)
    df['a'] = df['a'].replace('nan', np.nan)
    

    Is the best way to use .astype(str) and then replace '' with np.nan? Side question: is there a difference between .astype(str) and .apply(str)?

    Sample Input: (dtypes: a=float64, b=object, c=float64)

    a, b, c, etc.
    23, 'a42', 142, etc.
    51, '3', 12, etc.
    NaN, NaN, NaN, etc.
    24, 'a1', NaN, etc.
    

    Desired output: (dtypes: a=object, b=object, c=object)

    a, b, c, etc.
    '23', 'a42', '142', etc.
    '51', '3', '12', etc.
    NaN, NaN, NaN, etc.
    '24', 'a1', NaN, etc.
    
  • As3adTintin
    As3adTintin almost 8 years
    thanks! so that basically goes through each column, and leaves a np.nan if it is not a string and missing, and otherwise converts the value to string (if i am correct). great! do you know how to get rid of the .0s too?
  • Alexander
    Alexander almost 8 years
    The columns are converted to floats because of the np.nan. I'll add something to convert to ints.
  • Brainless
    Brainless almost 3 years
    Why does df1.astype(str).dtypes show only object types?