Pandas read_csv dtype specify all columns but one
Solution 1
You can first filter columns contains
value X
with boolean indexing
and then replace
:
cols = df.columns[df.columns.str.contains('X')]
df[cols] = df[cols].replace({'True': True, 'False': False})
Or if need filter column X
:
cols = df.columns[df.columns == 'X']
df[cols] = df[cols].replace({'True': True, 'False': False})
Sample:
import pandas as pd
df = pd.DataFrame({'A':['a1','a2','a3'],
'B':['b1','b2','b3'],
'C':['c1','c2','c3'],
'X':['True','False','True']})
print (df)
A B C X
0 a1 b1 c1 True
1 a2 b2 c2 False
2 a3 b3 c3 True
print (df.dtypes)
A object
B object
C object
X object
dtype: object
cols = df.columns[df.columns.str.contains('X')]
print (cols)
Index(['X'], dtype='object')
df[cols] = df[cols].replace({'True': True, 'False': False})
print (df.dtypes)
A object
B object
C object
X bool
dtype: object
print (df)
A B C X
0 a1 b1 c1 True
1 a2 b2 c2 False
2 a3 b3 c3 True
Solution 2
why not use bool()
data type. bool()
evaluates to true if a parameter is passed and the parameter is not False, None, '', or 0
if 'X' in data.columns:
try:
l = bool(data.columns['X'].replace('False', 0))
except:
l = None
data['X'] = data.apply(l, axis=1)
elaspog
Updated on July 20, 2022Comments
-
elaspog almost 2 years
I've a CSV file. Most of it's values I want to read as string, but I want to read a column as bool if the column with the given title exists..
Because the CSV file has a lots of columns, I don't want to specify on each column the datatype directly and give something like this:
data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool})
Is it possible to define the string type on each column but one and read an optional column as a bool at the same time?
My current solution is the following (but it's very unefficient and slow):
data = read_csv('sample.csv', dtype=str) # reads all column as string if 'X' in data.columns: l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None data['X'] = data.apply(l, axis=1)
UPDATE: Sample CSV:
A;B;C;X a1;b1;c1;True a2;b2;c2;False a3;b3;c3;True
Or the same can ba without the 'X' column (because the column is optional):
A;B;C a1;b1;c1 a2;b2;c2 a3;b3;c3
-
elaspog almost 8 yearsI think applying the lambda expression slows down processing time.
-
elaspog almost 8 yearsThank you! It's much faster than my solution!
-
TheLazyScripter almost 8 yearsYes most definitely. If I were going to approach it the way you did originally then I would use an if/else statement.
if data == 'False': l = False
elif data == 'True': l = True
else: l = None
but I would use on of the answers that were provided. Assuming you have a large dataset I would time both solutions and use the fastest. Good luck -
elaspog almost 8 yearsThat solution has a very big problem.. It sets the False values to True values.. Any string with not zero length is parsed as True..
-
elaspog almost 8 yearsI wouldn't need it by default, but my data is very special (it's mixes alphanumeric numbers). That's one of the reasons why I have this limitation on some columns to keep special numeric numbers in string format.