Pandas: How to read specific rows from a CSV file

18,481

Solution 1

Read the entire CSV, then filter the resulting DataFrame with a boolean mask:

import pandas as pd

my_df = pd.read_csv("example.csv")
my_df = my_df[my_df['hits'] > 20]  # keep only rows where hits > 20

If you run into memory issues while reading, set the chunksize parameter of read_csv to read the file in chunks and filter each chunk as it arrives.
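
A minimal sketch of the chunked approach, assuming the file is comma-separated with name and hits columns. The in-memory csv_text stands in for example.csv so the snippet runs on its own, and the chunk size of 2 is only for illustration:

```python
import io

import pandas as pd

# Stand-in for example.csv (assumed layout: comma-separated, header row).
csv_text = "name,hits\nA,34\nB,30\nC,25\nD,20\n"

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
filtered_chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Keep only the matching rows of each chunk; the rest is discarded.
    filtered_chunks.append(chunk[chunk["hits"] > 20])

# Stitch the surviving rows back into a single DataFrame.
my_df = pd.concat(filtered_chunks, ignore_index=True)
print(my_df)
```

Every row is still parsed, but only the filtered rows are kept in memory at any one time. For a real file, pass the path instead of the StringIO object and pick a much larger chunksize.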

Solution 2

Read the entire CSV, then use the query() method to select the required rows:

required_df = my_df.query("hits > 20")

or, equivalently, with .loc:

required_df = my_df.loc[my_df['hits'] > 20]
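
As a quick check that the filtering styles in both solutions select the same rows, here is a small sketch. It assumes the same name and hits columns, and uses an in-memory stand-in for example.csv:

```python
import io

import pandas as pd

# Stand-in for example.csv (assumed layout: comma-separated, header row).
csv_text = "name,hits\nA,34\nB,30\nC,25\nD,20\n"
my_df = pd.read_csv(io.StringIO(csv_text))

by_mask = my_df[my_df["hits"] > 20]       # boolean mask (Solution 1)
by_query = my_df.query("hits > 20")       # query() expression
by_loc = my_df.loc[my_df["hits"] > 20]    # .loc with the same mask

# All three should hold the same rows with the same index.
same_result = by_mask.equals(by_query) and by_mask.equals(by_loc)
print(same_result)
```

Which one to use is mostly a matter of readability. query() tends to read better for longer conditions, while the mask forms avoid parsing a string expression.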

Author by

kev

Passionate about Software Engineering, Cloud Computing, Big Data Processing, and Machine Learning. Master's Degree in Computer Engineering from Boston University.

Updated on July 02, 2022

Comments

  • kev
    kev almost 2 years

    I have a csv file example.csv like-

        name  |  hits
       ---------------
         A    |  34
         B    |  30
         C    |  25
         D    |  20
    

    Using pandas in Python, how do I only read the rows with hits > 20? Looking for something like-

    my_df = pd.read_csv('example.csv', where col('hits') > 20)

  • kev
    kev about 5 years
    So you're saying that we always need to read the entire CSV file?
  • Sociopath
    Sociopath about 5 years
    Yes. read_csv cannot filter rows by a column's value while reading; the filter has to be applied after the data is loaded.