How to read a CSV file without using external libraries (such as Numpy, Pandas)?

26,474

Solution 1

You will most likely need a library to read a CSV file. You could open and parse the data yourself, but that would be tedious and time-consuming. Luckily, Python ships with a standard csv module, so there is nothing to pip install. You can read your file like this:

import csv

with open('file.csv', 'r', newline='') as file:
    my_reader = csv.reader(file, delimiter=',')
    for row in my_reader:
        print(row)

This shows that each row is read in as a list of strings, which you can then process by index. The documentation at https://docs.python.org/3/library/csv.html describes other ways to read the data, one of which (csv.DictReader) yields a dictionary per row instead of a list.
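As a minimal sketch of processing rows by index: the header row can be consumed with next() before the loop, and the numeric columns converted explicitly, since the csv module returns every field as a string. The in-memory io.StringIO sample below is only a stand-in for the opened file, and the variable names are illustrative.

```python
import csv
import io

# Stand-in for open('file.csv', newline=''); two rows taken from the sample data.
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
)

reader = csv.reader(sample)
header = next(reader)  # consume the header row so it is not treated as data

products = []
for row in reader:
    # every field arrives as a string, so convert the numeric columns explicitly
    products.append((int(row[0]), row[1], int(row[2]), int(row[3])))

print(header)
print(products)
```

Converting inside the loop keeps the example simple; a file with missing or non-numeric values would need error handling around int().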

Update

You linked the GitHub repository for your project, so I took this snippet from it:

product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
17668,Unsweetened Chocolate Almond Breeze Almond Milk,91,16
28985,Michigan Organic Kale,83,4
32665,Organic Ezekiel 49 Bread Cinnamon Raisin,112,3
33120,Organic Egg Whites,86,16
45918,Coconut Butter,19,13
46667,Organic Ginger Root,83,4
46842,Plain Pre-Sliced Bagels,93,3

I saved it as file.csv and ran the code above on it. Result:

['product_id', 'product_name', 'aisle_id', 'department_id']
['9327', 'Garlic Powder', '104', '13']
['17461', 'Air Chilled Organic Boneless Skinless Chicken Breasts', '35', '12']
['17668', 'Unsweetened Chocolate Almond Breeze Almond Milk', '91', '16']
['28985', 'Michigan Organic Kale', '83', '4']
['32665', 'Organic Ezekiel 49 Bread Cinnamon Raisin', '112', '3']
['33120', 'Organic Egg Whites', '86', '16']
['45918', 'Coconut Butter', '19', '13']
['46667', 'Organic Ginger Root', '83', '4']
['46842', 'Plain Pre-Sliced Bagels', '93', '3']

This does what you asked in your question. I am not going to do your project for you, but you should be able to work it out from here.

Solution 2

I was recently asked a very similar but more complicated question about building a data structure without pandas, and this is the only relevant question I have found so far. Framed in terms of this question, the task was: use the product ids as the keys of a dictionary, and use lists of (aisle id, department id) tuples as the values (in Python). The dictionary stands in for the required dataframe. Of course I could not do it in 15 minutes (it took closer to 2 hours). It is hard for me to think outside of numpy and pandas.

I ended up with the following solution, which also answers the original question. It is probably not ideal, but it got me what I needed.
Hopefully it helps here too.

import csv

# Assumes data.csv columns are ordered: product_id, aisle_id, department_id, product_name
with open('data.csv', 'r', newline='') as file:
    items = list(csv.reader(file))  # every row of the CSV, header included

aisle_dept_id = []  # (aisle id, department id) tuples
mydict = {}  # product id as key, list holding the tuple above as value

product_id, aisle_id, department_id, product_name = [], [], [], []

for row in items[1:]:  # skip the header row
    product_id.append(row[0])
    aisle_id.append(row[1])
    department_id.append(row[2])
    product_name.append(row[3])

for item1, item2 in zip(aisle_id, department_id):
    aisle_dept_id.append((item1, item2))
for item1, item2 in zip(product_id, aisle_dept_id):
    # note: a repeated product id overwrites its earlier entry
    mydict.update({item1: [item2]})

With the output,

mydict:
{'9327': [('104', '13')],
 '17461': [('35', '12')],
 '17668': [('91', '16')],
 '28985': [('83', '4')],
 '32665': [('112', '3')],
 '33120': [('86', '16')],
 '45918': [('19', '13')],
 '46667': [('83', '4')],
 '46842': [('93', '3')]}
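A more compact way to build the same structure, offered only as a sketch: csv.DictReader looks columns up by header name, so the column order no longer matters, and dict.setdefault appends to the list instead of overwriting it when a product id repeats (a deliberate difference from the loop above). The function name build_product_index and the in-memory sample are hypothetical, and the header names are assumed to match the sample file shown in Solution 1.

```python
import csv
import io

def build_product_index(csvfile):
    """Map each product_id to a list of (aisle_id, department_id) tuples."""
    index = {}
    for row in csv.DictReader(csvfile):
        pair = (row["aisle_id"], row["department_id"])
        # setdefault keeps earlier tuples if a product_id appears more than once
        index.setdefault(row["product_id"], []).append(pair)
    return index

# Stand-in for open('data.csv', newline=''); header names are an assumption.
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
)

mydict = build_product_index(sample)
print(mydict)
```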

Solution 3

When a production environment is limited by memory, being able to read and manage data without importing additional libraries can be helpful.

The built-in csv module is enough for that.

import csv

There are at least two ways to do that: using csv.reader() or using csv.DictReader().

csv.reader() allows you to access CSV data using indexes and is ideal for simple CSV files (Source).

csv.DictReader(), on the other hand, is friendlier and easier to use, especially when working with large CSV files (Source).

Here's how to do it with csv.reader():

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam

Here's how to do it with csv.DictReader()

>>> import csv
>>> with open('names.csv', newline='') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...
Eric Idle
John Cleese

>>> print(row)
{'first_name': 'John', 'last_name': 'Cleese'}
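Because both readers yield one row at a time, a file can be summarised without ever holding all of its rows in memory, which is relevant to the memory constraint mentioned at the top of this answer. The sketch below counts rows per value of one column. The function name count_by_column and the in-memory sample are hypothetical, and the column names are assumed to match the product file from the earlier answers.

```python
import csv
import io
from collections import Counter

def count_by_column(csvfile, column):
    """Count rows per distinct value of `column`, reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(csvfile):
        counts[row[column]] += 1  # only the running totals are kept, not the rows
    return counts

# Stand-in for an opened file such as open('file.csv', newline='').
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
    "45918,Coconut Butter,19,13\n"
)

dept_counts = count_by_column(sample, "department_id")
print(dept_counts)
```

Memory use here grows with the number of distinct values in the column rather than with the number of rows.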

For another example, check Real Python's page here.

Author: Mosali HarshaVardhan Reddy

Updated on July 09, 2022

Comments

  • Mosali HarshaVardhan Reddy
    Mosali HarshaVardhan Reddy almost 2 years

    This is a question that usually appears in interviews.

    I know how to read csv files using Pandas.

    However I am struggling to find a way to read files without using external libraries.

    Does Python come with any module that would help read csv files?

  • Mosali HarshaVardhan Reddy
    Mosali HarshaVardhan Reddy about 5 years
    What if I am supposed to use only Input and Output Libraries. Can I use an import CSV library?
  • Reedinationer
    Reedinationer about 5 years
    @MosaliHarshaVardhanReddy What do you mean by "Input and Output Libraries"? csv comes with a csv.reader() and csv.writer() method. Does this make it qualify as an "Input and Output Library"?
  • Mosali HarshaVardhan Reddy
    Mosali HarshaVardhan Reddy about 5 years
    Instead of using the CSV reader. I may have to use the file.reader("file.csv") and convert it into a DataFrame
  • Reedinationer
    Reedinationer about 5 years
    I am confused. You want a DataFrame, but you refuse to use numpy. I don't think you can have it both ways...DataFrames are numpy specific as far as I'm aware.
  • Reedinationer
    Reedinationer about 5 years
    @MosaliHarshaVardhanReddy So you are saying it is a requirement for you to parse the data yourself? And you are not allowed to use even standard Python library modules? I guess to make a dataframe the best you could do is make a list of lists
  • Mosali HarshaVardhan Reddy
    Mosali HarshaVardhan Reddy about 5 years
    Yeah, I need to make a list of lists and then map them with the corresponding values. I am able to make some progress. I will post a GitHub link after I do my analysis in the comment. Thanks for the help.
  • Reedinationer
    Reedinationer about 5 years
    @MosaliHarshaVardhanReddy I would truly urge you to use the csv module unless specified otherwise (which in your post you say only numpy and pandas are excluded). Then you can either make an sql database using sqlite3 or make a list of lists or a list of dictionaries to represent your data for analysis. I see no reason you should not be able to import anything at all. If that is the case though you're in for a helluva hard project that will be tedious and time consuming and neglect the best part of python: not having to reinvent the wheel with each program
  • Reedinationer
    Reedinationer about 5 years
    @MosaliHarshaVardhanReddy Nice job. Good luck on your interview then!