Engines in Python Pandas read_csv

The pd.read_csv documentation notes specific differences between 'c' (default) and 'python' engines. The names indicate the language in which the parsers are written. Specifically, the docs note:

Where possible pandas uses the C parser (specified as engine='c'), but may fall back to Python if C-unsupported options are specified.
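The fallback described in that quote can be observed directly. The following is a minimal sketch, assuming a pandas version in which an unsupported option triggers a ParserWarning when no engine is given and a ValueError when engine='c' is requested explicitly; the sample data and variable names are made up for illustration.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

# Hypothetical sample data: fields separated by ';' with stray spaces.
data = "a ; b\n1 ; 2\n3 ; 4\n"

# No engine given: a regex separator is a C-unsupported option,
# so pandas warns and falls back to the Python parser.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(io.StringIO(data), sep=r"\s*;\s*")

fell_back = any(issubclass(w.category, ParserWarning) for w in caught)

# engine='c' given explicitly: pandas raises instead of falling back.
try:
    pd.read_csv(io.StringIO(data), sep=r"\s*;\s*", engine="c")
    c_rejected = False
except ValueError:
    c_rejected = True

print(fell_back, c_rejected)
```

Passing engine='python' explicitly avoids the warning when you already know you need a Python-only option.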

Here are the main differences you should note (as of v0.23.4):

  • 'c' is faster, while 'python' is currently more feature-complete.
  • 'python' supports skipfooter, while 'c' does not.
  • 'python' supports a sep longer than a single character (including regular expressions), while 'c' does not.
  • 'python' supports sep=None with delim_whitespace=False, which means it can auto-detect a delimiter, while 'c' does not.
  • 'c' supports float_precision, while 'python' does not (the option selects the C engine's float converter, so the Python parser has no need for it).
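The differences listed above can be tried out with a few small in-memory files. This is only a sketch: the sample strings and variable names are invented for illustration, and behaviour may differ in versions other than the one you have installed.

```python
import io

import pandas as pd

# skipfooter drops trailing lines (here a hypothetical "TOTAL" row);
# it needs the Python parser.
raw = "x,y\n1,2\n3,4\nTOTAL,6\n"
body = pd.read_csv(io.StringIO(raw), skipfooter=1, engine="python")

# sep=None asks the Python parser to detect the delimiter itself.
semi = "x;y\n1;2\n3;4\n"
sniffed = pd.read_csv(io.StringIO(semi), sep=None, engine="python")

# float_precision chooses the float converter used by the C parser.
precise = pd.read_csv(
    io.StringIO("v\n0.1\n"), float_precision="round_trip", engine="c"
)

print(body)
print(sniffed)
print(precise)
```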

Version notes:

  • dtype supported in 'python' v0.20.0+.
  • delim_whitespace supported in 'python' v0.18.1+.
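As a rough illustration of these two options under the Python parser, the sketch below reads whitespace-separated data while forcing one column to stay a string. It uses sep=r"\s+", which the documentation describes as equivalent to delim_whitespace=True; the latter is, to my knowledge, deprecated in recent pandas releases, so the regex form is the safer assumption. The sample data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated data with zero-padded ids.
text = "id  score\n007  1.5\n042  2.0\n"

# dtype keeps the ids as strings so the leading zeros survive.
df = pd.read_csv(
    io.StringIO(text), sep=r"\s+", dtype={"id": str}, engine="python"
)

print(df)
```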

Note the above may change as features are developed. You should check IO Tools (Text, CSV, HDF5, …) if you see unexpected behaviour in later versions.

Author: PUNEET AGARWAL

Updated on October 01, 2022

Comments

  • PUNEET AGARWAL about 2 months

    The pandas documentation for pd.read_csv() mentions two engines, the C engine and the Python engine, in its description of the "sep" parameter.

    The documentation link is: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

    What are these engines? What is the role of each engine? Is there any analogy which can help understand these engines better?

  • seralouk over 3 years
    For a 1.2 GB csv file, engine='python' is much faster than c. Why is that?
  • jpp over 3 years
    @serafeim, without your CSV file, it's difficult to tell. Perhaps there is specific content or a combination of arguments for which engine='python' is more efficient. Generally, though, 'c' is more efficient while 'python' is more feature-complete.
  • seralouk over 3 years
    Here is the file: filebin.net/fkyil2m5yhvr1dbh. Any tips would be great. 'c' takes forever, whereas 'python' is faster.