Use python requests to download CSV
Solution 1
This should help:
import csv
import requests
CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'
with requests.Session() as s:
    download = s.get(CSV_URL)
    decoded_content = download.content.decode('utf-8')
    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    my_list = list(cr)
    for row in my_list:
        print(row)
Output sample:
['street', 'city', 'zip', 'state', 'beds', 'baths', 'sq__ft', 'type', 'sale_date', 'price', 'latitude', 'longitude']
['3526 HIGH ST', 'SACRAMENTO', '95838', 'CA', '2', '1', '836', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '59222', '38.631913', '-121.434879']
['51 OMAHA CT', 'SACRAMENTO', '95823', 'CA', '3', '1', '1167', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68212', '38.478902', '-121.431028']
['2796 BRANCH ST', 'SACRAMENTO', '95815', 'CA', '2', '1', '796', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68880', '38.618305', '-121.443839']
['2805 JANETTE WAY', 'SACRAMENTO', '95815', 'CA', '2', '1', '852', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '69307', '38.616835', '-121.439146']
[...]
Related question with answer: https://stackoverflow.com/a/33079644/295246
Edit: Other answers are useful if you need to download large files (i.e. with stream=True).
Solution 2
To simplify these answers, and increase performance when downloading a large file, the below may work a bit more efficiently.
import requests
from contextlib import closing
import csv
from codecs import iterdecode
url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
    # Decode the byte lines to str first, then hand them to csv.reader
    reader = csv.reader(iterdecode(r.iter_lines(), 'utf-8'),
                        delimiter=',',
                        quotechar='"')
    for row in reader:
        print(row)
By setting stream=True in the GET request, when we pass r.iter_lines() to csv.reader(), we are passing a generator to csv.reader(). By doing so, we enable csv.reader() to lazily iterate over each line in the response with for row in reader.
This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.
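To see the lazy behaviour without touching the network, here is a minimal sketch in which a generator stands in for r.iter_lines(). The fake_iter_lines helper, the pulled list and the sample rows are made up for illustration; only codecs.iterdecode and csv.reader are real library calls.

```python
import codecs
import csv

pulled = []  # records which raw lines have been requested so far


def fake_iter_lines():
    """Hypothetical stand-in for r.iter_lines(): yields one bytes line at a time."""
    for raw in (b'street,city', b'3526 HIGH ST,SACRAMENTO', b'51 OMAHA CT,SACRAMENTO'):
        pulled.append(raw)
        yield raw


# Decode bytes to str on the fly, then let csv.reader pull lines on demand.
reader = csv.reader(codecs.iterdecode(fake_iter_lines(), 'utf-8'))

header = next(reader)                 # consumes only the first line
lines_after_header = len(pulled)
first_row = next(reader)              # consumes only the second line
lines_after_first_row = len(pulled)

print(header, lines_after_header)
print(first_row, lines_after_first_row)
```

Each call to next(reader) asks the generator for just one more line, so a large response body never has to sit in memory all at once.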
Solution 3
I like the answers from The Aelfinn and aheld. I can improve them only by shortening a bit more, removing superfluous pieces, using a real data source, making it 2.x & 3.x-compatible, and maintaining the high level of memory efficiency seen elsewhere:
import csv
import requests
CSV_URL = 'http://web.cs.wpi.edu/~cs1004/a16/Resources/SacramentoRealEstateTransactions.csv'
with requests.get(CSV_URL, stream=True) as r:
    lines = (line.decode('utf-8') for line in r.iter_lines())
    for row in csv.reader(lines):
        print(row)
Too bad 3.x is less flexible CSV-wise: the iterator must emit Unicode strings (while requests yields bytes), whereas the 2.x-only version, for row in csv.reader(r.iter_lines()):, is more Pythonic (shorter and easier to read). Anyhow, note that the 2.x/3.x solution above won't handle the situation described by the OP, where a NEWLINE is found unquoted in the data read.
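A related limitation, newlines inside quoted fields, is easy to reproduce offline. The sketch below uses a made-up two-row CSV (the sample string and variable names are assumptions for illustration). It compares feeding csv.reader pre-split lines, which is roughly what iterating line by line gives you, with feeding it a text buffer that keeps line endings.

```python
import csv
import io

# Made-up sample: the second record has a newline inside a quoted field.
sample = 'id,notes\r\n1,"line one\nline two"\r\n'

# Pre-split lines carry no line endings, so csv.reader cannot put the
# embedded newline back into the quoted field.
rows_from_split_lines = list(csv.reader(sample.splitlines()))

# A text buffer created with newline='' passes line endings through
# untouched, so the quoted field keeps its newline.
rows_from_buffer = list(csv.reader(io.StringIO(sample, newline='')))

print(rows_from_split_lines[1])  # the two halves of the note are glued together
print(rows_from_buffer[1])       # the note keeps its line break
```

If embedded newlines matter, buffering the decoded body in an io.StringIO trades some memory for a faithful parse.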
For the part of the OP's question regarding downloading (vs. processing) the actual CSV file, here's another script that does that, 2.x & 3.x-compatible, minimal, readable, and memory-efficient:
import os
import requests
CSV_URL = 'http://web.cs.wpi.edu/~cs1004/a16/Resources/SacramentoRealEstateTransactions.csv'
with open(os.path.split(CSV_URL)[1], 'wb') as f, \
        requests.get(CSV_URL, stream=True) as r:
    for line in r.iter_lines():
        f.write(line+'\n'.encode())
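If the goal is a byte-for-byte copy of the file, one alternative is to write raw chunks rather than lines, so no line endings need to be added back. This is only a sketch and not part of the answer above: save_response is a hypothetical helper and the 8192-byte chunk size is an arbitrary choice. It relies solely on the response object's iter_content() method, which requests provides.

```python
def save_response(response, path, chunk_size=8192):
    """Write a streamed response body to `path` exactly as received.

    `response` is expected to behave like a requests.Response obtained
    with stream=True; only its iter_content() method is used.
    Returns the number of bytes written.
    """
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written
```

It could be used as: with requests.get(CSV_URL, stream=True) as r: save_response(r, 'out.csv'), ideally after calling r.raise_for_status(). Because the bytes are written untouched, newlines inside quoted fields survive the download.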
Solution 4
You can also use the DictReader to iterate over dictionaries of {'columnname': 'value', ...}:
import csv
import requests
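response = requests.get('http://example.test/foo.csv')
# On Python 3 csv needs str lines, so decode each bytes line (UTF-8 assumed)
lines = (line.decode('utf-8') for line in response.iter_lines())
reader = csv.DictReader(lines)
for record in reader:
    print(record)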
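To show what the records look like, here is a small offline sketch. The sample lines are illustrative stand-ins for already-decoded response lines, with values modelled on the Sacramento data. Note that every value comes back as a string, so numeric columns need an explicit conversion.

```python
import csv

# Illustrative stand-in for the str lines a decoded response would yield.
sample_lines = [
    'street,city,price',
    '3526 HIGH ST,SACRAMENTO,59222',
    '51 OMAHA CT,SACRAMENTO,68212',
]

# The first line supplies the keys; every value comes back as a string.
records = [dict(row) for row in csv.DictReader(sample_lines)]

# Convert a column explicitly when a number is needed.
total_price = sum(int(record['price']) for record in records)

print(records[0])
print(total_price)
```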
Solution 5
I use this code (I use Python 3):
import csv
import io
import requests
url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"
r = requests.get(url)
r.encoding = 'utf-8' # useful if encoding is not sent (or not sent properly) by the server
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)
viviwill
Updated on July 05, 2022
Comments
-
viviwill almost 2 years
Here is my code:
import csv
import requests
with requests.Session() as s:
    s.post(url, data=payload)
    download = s.get('url that directly download a csv report')
This gives me access to the CSV file. I tried different methods to deal with the download:
This gives the CSV file as one string:
print download.content
This prints the first row and then raises an error: _csv.Error: new-line character seen in unquoted field
cr = csv.reader(download, dialect=csv.excel_tab)
for row in cr:
    print row
This prints one letter per row and doesn't print the whole thing:
cr = csv.reader(download.content, dialect=csv.excel_tab)
for row in cr:
    print row
My question is: what's the most efficient way to read a CSV file in this situation, and how do I download it?
thanks
-
Irvin H. about 7 years
I had to also import codecs and wrap the r.iter_lines() within codecs.iterdecode() like so: codecs.iterdecode(r.iterlines(), 'utf-8') ... in order to solve byte vs str issues, unicode decoding problems and universal new line problems.
-
linqu about 6 years
Thanks @IrvinH., I ran into the same problem. By the way, it should be r.iter_lines(); you missed the underscore.
-
JeffHeaton almost 5 years
Is it necessary to read the entire thing into memory? This seems non-scalable.
-
JeffHeaton almost 5 years
On Python 3.7 this results in: error: iterator should return strings, not bytes (did you open the file in text mode?)
-
JeffHeaton almost 5 years
Best answer! Works great with latest version of Python.
-
wescpy almost 4 years
To support the widest audience, it should work with all currently-deployed versions of Python, not just the latest... thx though! :-) (min version is 2.6)
-
Addison Klinke over 3 years
For downloading, I think most users will want f.write(line + '\n'.encode()). Currently your example writes one enormous line, which won't be easily loaded back into memory by a CSV reader.
-
wescpy over 3 years
Thx for spotting that. I normally try to take advantage of the "free" \n that comes w/text files but neglected to recall that respectable libraries drop those for the purpose of data-scrubbing, requiring us to add them back when creating our own files w/such data.
-
Yunnosch over 2 years
While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
-
West about 2 years
I notice this doesn't always maintain the structure of the CSV. I'm downloading a CSV with a Notes column, and the notes have newlines in them. Using this solution, all newlines are ignored, and it's not ideal.