pandas: convert multiple columns to string
Solution 1
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

# Keep NaN as NaN, keep strings unchanged, and turn floats into integer strings.
for col in df:
    df[col] = [np.nan if (not isinstance(val, str) and np.isnan(val)) else
               (val if isinstance(val, str) else str(int(val)))
               for val in df[col].tolist()]
>>> df
a b c
0 23 a42 142
1 51 3 12
2 NaN NaN NaN
3 24 a1 NaN
>>> df.values
array([['23', 'a42', '142'],
['51', '3', '12'],
[nan, nan, nan],
['24', 'a1', nan]], dtype=object)
Solution 2
This gives you the list of column names
lst = list(df)
This converts all the columns to string type
df[lst] = df[lst].astype(str)
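Note that on the pandas versions these answers were written for, .astype(str) typically turns missing values into the literal string 'nan' rather than keeping them as NaN. A minimal sketch of one way to keep them missing, assuming the same sample frame as in Solution 1 (the names missing and as_str are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

# Remember where the missing values are before converting.
missing = df.isna()

# Convert everything to strings, then put NaN back wherever it was missing.
as_str = df.astype(str).mask(missing)
```

The floats still come out as '23.0' rather than '23' here; this sketch only addresses the missing values.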
Solution 3
You can call .astype(str) on the whole DataFrame, or select only the columns of interest and convert those to strings, as shown below.
In [41]: df1 = pd.DataFrame({
...: 'a': [23.0, 51.0, np.nan, 24.0],
...: 'b': ["a42", "3", np.nan, "a1"],
...: 'c': [142.0, 12.0, np.nan, np.nan]})
...:
In [42]: df1
Out[42]:
a b c
0 23.0 a42 142.0
1 51.0 3 12.0
2 NaN NaN NaN
3 24.0 a1 NaN
### Shows current data type of the columns:
In [43]: df1.dtypes
Out[43]:
a float64
b object
c float64
dtype: object
### Applying .astype() on each element of the dataframe converts the datatype to string
In [45]: df1.astype(str).dtypes
Out[45]:
a object
b object
c object
dtype: object
### Or, you could select the column of interest to convert it to strings
In [48]: df1[["a", "b", "c"]] = df1[["a","b", "c"]].astype(str)
In [49]: df1.dtypes ### Datatype update
Out[49]:
a object
b object
c object
dtype: object
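The dtypes above read object because, in the pandas versions shown, string columns are stored as generic Python objects rather than under a dedicated string dtype. A small sketch, assuming the same df1 as above, that checks the type of an individual element instead of the column dtype (converted and first_value are illustrative names, and newer pandas releases may report a different column dtype):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

converted = df1.astype(str)

# The column dtype describes the container; look at one element to see
# what is actually stored in it.
first_value = converted.loc[0, 'a']
is_python_str = isinstance(first_value, str)
```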
Comments
-
As3adTintin almost 2 years
I have some columns ['a', 'b', 'c', etc.] (a and c are float64 while b is object).
I would like to convert all columns to string and preserve nans.
Tried using
df[['a', 'b', 'c']] == df[['a', 'b', 'c']].astype(str)
but that left blanks for the float64 columns.
Currently I am going through one by one with the following:
df['a'] = df['a'].apply(str)
df['a'] = df['a'].replace('nan', np.nan)
Is the best way to use .astype(str) and then replace '' with np.nan? Side question: is there a difference between .astype(str) and .apply(str)?
Sample Input: (dtypes: a=float64, b=object, c=float64)
a, b, c, etc.
23, 'a42', 142, etc.
51, '3', 12, etc.
NaN, NaN, NaN, etc.
24, 'a1', NaN, etc.
Desired output: (dtypes: a=object, b=object, c=object)
a, b, c, etc.
'23', 'a42', '142', etc.
'51', 'a3', '12', etc.
NaN, NaN, NaN, etc.
'24', 'a1', NaN, etc.
-
As3adTintin almost 8 years
thanks! so that basically goes through each column, and leaves a np.nan if it is not a string and missing, and otherwise converts the value to string (if i am correct). great! do you know how to get rid of the .0s too?
-
Alexander almost 8 years
The columns are converted to floats because of the np.nan. I'll add something to convert to ints.
-
Brainless almost 3 years
Why does df1.astype(str).dtypes show only object types?