How to read a CSV file from a URL with Python?
Solution 1
You need to replace open with urllib.urlopen or urllib2.urlopen, e.g.:
import csv
import urllib2
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)
for row in cr:
    print row
This reads the following CSV data (each row is printed as a list of fields):
Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
...
The original question is tagged "python-2.x", but for a Python 3 implementation (which requires only minor changes) see below.
Solution 2
With pandas, you can read a CSV file directly from a URL:
import pandas as pd
data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd')
This loads the data into a DataFrame, a tabular structure that is easy to process.
Solution 3
You could do it with the requests module as well:
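import csv
import requests
url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
# On Python 3, iter_lines() yields bytes; use r.text.splitlines() there instead
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')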
Solution 4
When downloading a large file, streaming the response may be more efficient:
import requests
from contextlib import closing
import csv
url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
    for row in reader:
        # Handle each row here...
        print row
By setting stream=True in the GET request, when we pass r.iter_lines() to csv.reader(), we are passing a generator to csv.reader(). By doing so, we enable csv.reader() to lazily iterate over each line in the response with for row in reader.
This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.
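The snippet above is Python 2. On Python 3, r.iter_lines() yields bytes, which csv.reader() rejects, so the lines need decoding first. Below is a minimal sketch of one way to do that lazily with codecs.iterdecode(), as suggested in the comments. The helper names rows_from_byte_lines and stream_csv are made up for illustration, and UTF-8 is an assumption about the file's encoding.

```python
import codecs
import csv
from contextlib import closing


def rows_from_byte_lines(byte_lines, encoding="utf-8"):
    """Lazily decode an iterable of byte lines and parse them as CSV rows."""
    # codecs.iterdecode decodes one chunk at a time, so nothing is buffered up front.
    text_lines = codecs.iterdecode(byte_lines, encoding)
    return csv.reader(text_lines, delimiter=",", quotechar='"')


def stream_csv(url, encoding="utf-8"):
    """Yield rows from a remote CSV without loading the whole body into memory."""
    import requests  # third-party; imported here so the helper above works without it

    with closing(requests.get(url, stream=True)) as r:
        r.raise_for_status()
        for row in rows_from_byte_lines(r.iter_lines(), encoding):
            yield row
```

Looping over stream_csv(url) then handles one row at a time, in the same way as the for row in reader loop above.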
Solution 5
This question is tagged python-2.x so it didn't seem right to tamper with the original question, or the accepted answer. However, Python 2 is now unsupported, and this question still has good google juice for "python csv urllib", so here's an updated Python 3 solution.
It's now necessary to decode urlopen's response (which is in bytes) into text using the file's encoding, so the accepted answer has to be modified slightly:
import csv, urllib.request
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
lines = [l.decode('utf-8') for l in response.readlines()]
cr = csv.reader(lines)
for row in cr:
    print(row)
Note the extra line beginning with lines =, the fact that urlopen is now in the urllib.request module, and print of course requires parentheses.
It's hardly advertised, but yes, csv.reader can read from a list of strings.
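As a possible variation on the code above, the decoding can be done while the response is read, without first building the full list of lines, by wrapping the response in io.TextIOWrapper. This is only a sketch: the helper name read_csv_rows is hypothetical, UTF-8 is assumed, and an in-memory io.BytesIO stands in for the HTTP response so the example runs without network access.

```python
import csv
import io


def read_csv_rows(binary_stream, encoding="utf-8"):
    """Parse CSV rows from any binary file-like object, decoding as it is read."""
    # newline="" lets the csv module handle line endings itself, as its docs recommend.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# In-memory stand-in for an HTTP response body (sample rows from the medals file).
sample = io.BytesIO(b"Year,City,Sport\r\n1924,Chamonix,Skating\r\n")
rows = read_csv_rows(sample)
print(rows)
```

With a real URL, the same helper should accept the object returned by urllib.request.urlopen(url), since that is also a binary file-like object.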
And since someone else mentioned pandas, here's a one-liner to display the CSV in a console-friendly output:
python3 -c 'import pandas
df = pandas.read_csv("http://winterolympicsmedals.com/medals.csv")
print(df.to_string())'
(Yes, it's three lines, but you can copy-paste it as one command. ;)
mongotop
Updated on July 08, 2022
Comments
-
mongotop almost 2 years
When I curl an API call link such as http://example.com/passkey=wedsmdjsjmdd
curl 'http://example.com/passkey=wedsmdjsjmdd'
I get the employee output data in CSV format, like:
"Steve","421","0","421","2","","","","","","","","","421","0","421","2"
How can I parse this using Python?
I tried:
import csv
cr = csv.reader(open('http://example.com/passkey=wedsmdjsjmdd',"rb"))
for row in cr:
    print row
but it didn't work and I got an error:
http://example.com/passkey=wedsmdjsjmdd No such file or directory:
Thanks!
-
Joran Beasley almost 11 years
Can you pass that to csv_reader? I guess so ... it's pretty "file-like", but I've never done it or even thought to do that.
-
Joran Beasley almost 11 years
lol I dunno that I was right, I was just asking ... hadn't ever seen that done before
-
eandersson almost 11 years
I just assumed that it worked, to be honest. Which is crazy as I have used this hundreds of times. :D
-
Dave Challis almost 11 years
I think urllib2.urlopen returns a file-like object, so you can probably just remove the .read(), and pass response to the csv.reader.
-
eandersson almost 11 years
It does, but at least for me I don't get the expected output. I think it's a formatting issue.
-
mongotop almost 11 years
When I try to output the result with
print cr
I get this: <_csv.reader object at 0x8e3db54>
-
brbcoding almost 11 years
@mongotop that means it is working... That shows you where the object is in memory. Looks like it only reads a line at a time, so maybe cr.next() inside a loop is what you are looking for. (haven't used csv reader myself...)
-
eandersson almost 11 years
Like @brbcoding said. I updated my example demonstrating how to display the result.
-
mongotop almost 11 years
I got this output:
['<addinfourl at 163944620 whose fp = <socket._fileobject object at 0x9beca6c>>']
-
mongotop almost 11 years
No, I wasn't, but when I did, I got an output, but it was empty:
['<pre> Method Not Allowed</pre></p><br/> '] ['<br/> '] ['</body>'] ['</html>'] ************************************
-
eandersson almost 11 years
You did not include the address you are trying to download the data from. It looks like your web server won't allow the request. Try the csv I included in my example. And as an alternative to urllib2 you could try requests as well: docs.python-requests.org/en/latest
-
mongotop almost 11 years
First of all, thanks a lot for putting up a live example! That is very helpful. I tried to add csv and got this error:
response = urllib2.urlopen(NewUrlCall+'.csv',"rb").read()
File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.6/urllib2.py", line 395, in open
response = meth(req, response)
File "/usr/lib/python2.6/urllib2.py", line 508, in http_response http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Method Not Allowed
-
mongotop almost 11 years
Please check the chat for more info.
-
mongotop about 8 years
Works like a charm! Thank you for submitting your answer!
-
mongotop over 7 years
This is one great solution! Thank you @The Aelfinn!
-
Irvin H. about 7 years
Great solution, but I also had to import codecs and wrap r.iter_lines() within codecs.iterdecode() like so: codecs.iterdecode(r.iter_lines(), 'utf-8') ... in order to solve byte vs str issues, unicode decoding problems and universal newline problems.
-
Harikrishna over 6 years
One question. The reader variable is a _csv.reader object. When I iterate through this object to print the contents, I get the following error. Error: iterator should return strings, not bytes (did you open the file in text mode?). How do I read the contents of the csv reader object and, say, load them into a pandas dataframe?
-
Michal Skop about 6 years
@Harikrishna this is probably a Python 3 problem, and this case is answered here: stackoverflow.com/questions/18897029/…
-
Jawairia almost 6 years
This is one of the simplest approaches I have come across so far!
-
JeffHeaton about 5 years
So long as your CSV file fits into memory, this is okay.
-
Agustin Barrachina about 4 years
Didn't work for me, maybe I ran out of memory.
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 33, saw 2
-
mongotop almost 4 years
Thank you @ThedudeAbides for providing an updated solution!
-
Dinero over 3 years
Is there any way to use this with a retry? I often get a 500 error, and when I call read_csv again it works. This happens a lot when I am reading from Google Sheets.
-
Save over 3 years
I was looking for a solution like this, with requests.
-
TheDudeAbides about 3 years
Just want to add that import pandas alone will be an order of magnitude slower than any other solution on this page. So don't go pip install pandas JUST because you see that you can do a cool one-liner with it; it also brings in numpy as a dependency, and it's all downhill from there. Same goes for import requests, although not to such a degree.
-
mit over 2 years
I like this solution a lot
-
MiKK about 2 years
With Python 3.8: Exception has occurred: AttributeError module 'pandas' has no attribute 'describe'