Engines in Python Pandas read_csv
The pd.read_csv documentation notes specific differences between the 'c' (default) and 'python' engines. The names indicate the language in which each parser is written. Specifically, the docs note:
Where possible pandas uses the C parser (specified as engine='c'), but may fall back to Python if C-unsupported options are specified.
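As a rough illustration of that fallback, the sketch below reads a two-character separator in three ways. It assumes a reasonably recent pandas release, and the '::' separator and tiny inline data are invented for the example. With no engine given, pandas switches to the Python engine and emits a ParserWarning. With engine='c' requested explicitly, the same call raises a ValueError. With engine='python' it parses without complaint.

```python
import io
import warnings

import pandas as pd

# Made-up data using a two-character separator, which the C engine cannot handle.
data = "a::b\n1::2\n"

# No engine given: pandas falls back to the Python engine and warns about it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(io.StringIO(data), sep="::")

fell_back = any(issubclass(w.category, pd.errors.ParserWarning) for w in caught)

# Requesting the C engine explicitly turns the fallback into an error.
try:
    pd.read_csv(io.StringIO(data), sep="::", engine="c")
    c_rejected = False
except ValueError:
    c_rejected = True

# Naming the Python engine up front avoids both the warning and the error.
df_py = pd.read_csv(io.StringIO(data), sep="::", engine="python")
print(df_py)
```

In practice, passing engine='python' yourself whenever you rely on a Python-only option keeps the behaviour explicit and the logs quiet.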
Here are the main differences you should note (as of v0.23.4):
- 'c' is faster, while 'python' is currently more feature-complete.
- 'python' supports skipfooter, while 'c' does not.
- 'python' supports a flexible sep other than a single character (including regex), while 'c' does not.
- 'python' supports sep=None with delim_whitespace=False, which means it can auto-detect a delimiter, while 'c' does not.
- 'c' supports float_precision, while 'python' does not (or does not need it).
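A minimal sketch of the Python-only options listed above, plus the C-only float_precision, assuming a recent pandas release. The inline CSV strings are invented for illustration. With sep=None, the Python engine hands delimiter detection to the standard library's csv.Sniffer, so auto-detection works best on simple, regular input like this.

```python
import io

import pandas as pd

# skipfooter: drop a trailing summary line (Python engine only).
footer_text = "a;b\n1;2\n3;4\nTOTAL;7\n"
df_footer = pd.read_csv(io.StringIO(footer_text), sep=";", skipfooter=1, engine="python")

# A regex separator: semicolons with optional surrounding whitespace.
spaced_text = "a ; b\n1 ; 2\n3 ; 4\n"
df_regex = pd.read_csv(io.StringIO(spaced_text), sep=r"\s*;\s*", engine="python")

# sep=None: let the Python engine sniff the delimiter (a tab here).
tab_text = "a\tb\n1\t2\n3\t4\n"
df_sniffed = pd.read_csv(io.StringIO(tab_text), sep=None, engine="python")

# float_precision is a C-engine option; "round_trip" is one of its documented values.
df_float = pd.read_csv(io.StringIO("x\n0.1\n"), engine="c", float_precision="round_trip")

print(df_footer, df_regex, df_sniffed, df_float, sep="\n\n")
```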
Version notes:
- dtype supported in 'python' v0.20.0+.
- delim_whitespace supported in 'python' v0.18.1+.
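To illustrate the version notes, here is a sketch that passes dtype to the Python engine on whitespace-separated data, assuming pandas v0.20.0 or later. It uses sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True. The column names and values are made up.

```python
import io

import pandas as pd

# Made-up whitespace-aligned data.
text = "id   score\n1    7\n2    9\n"

# dtype is honoured by the Python engine; sep=r"\s+" splits on runs of whitespace.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",
    engine="python",
    dtype={"score": "float64"},
)
print(df.dtypes)
```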
Note that the above may change as features are developed. Check IO Tools (Text, CSV, HDF5, …) in the pandas documentation if you see unexpected behaviour in later versions.
PUNEET AGARWAL
Updated on October 01, 2022
Comments
- PUNEET AGARWAL (over 1 year ago): In the documentation for the pd.read_csv() method in pandas, the description of the "sep" parameter mentions engines such as the C engine and the Python engine. The documentation link is: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
  What are these engines? What is the role of each engine? Is there any analogy that can help me understand these engines better?
- seralouk (almost 5 years ago): For a 1.2 GB CSV file, engine='python' is much faster than 'c'. Why is that?
- jpp (almost 5 years ago): @serafeim, without your CSV file, it's difficult to tell. Perhaps there is specific content or a combination of arguments where engine='python' is more efficient. Generally, though, 'c' is more efficient while 'python' is more feature-complete.
- seralouk (almost 5 years ago): Here is the file: filebin.net/fkyil2m5yhvr1dbh. Any tip would be great. 'c' takes forever whereas 'python' is faster.
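As jpp says, the answer depends on the file, so one way to investigate is to time both engines on the same input. The sketch below is only a starting point under stated assumptions: time_engine is a hypothetical helper, the data is random numeric CSV generated in memory rather than seralouk's file, and timings will vary with pandas version, file content and options, so it does not explain the 1.2 GB case. To test a real file, adapt the helper to pass the file path to read_csv instead of an io.StringIO buffer.

```python
import io
import time

import numpy as np
import pandas as pd


def time_engine(csv_text: str, engine: str, repeats: int = 3) -> float:
    """Hypothetical helper: best wall-clock time (seconds) over `repeats` parses."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pd.read_csv(io.StringIO(csv_text), engine=engine)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic numeric data standing in for a real file (an assumption: real files
# with quoting, wide text columns or extra options may behave very differently).
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((20_000, 5)), columns=list("abcde"))
csv_text = frame.to_csv(index=False)

timings = {engine: time_engine(csv_text, engine) for engine in ("c", "python")}
for engine, seconds in timings.items():
    print(f"{engine}: {seconds:.3f}s")
```

Taking the best of several runs reduces noise from caching and other processes; if 'python' still wins on your real file, the options you pass (rather than the engine alone) are a likely place to look.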