Download a spreadsheet from Google Docs using Python
Solution 1
In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:
#!/usr/bin/python
import re, urllib, urllib2
class Spreadsheet(object):
def __init__(self, key):
super(Spreadsheet, self).__init__()
self.key = key
class Client(object):
def __init__(self, email, password):
super(Client, self).__init__()
self.email = email
self.password = password
def _get_auth_token(self, email, password, source, service):
url = "https://www.google.com/accounts/ClientLogin"
params = {
"Email": email, "Passwd": password,
"service": service,
"accountType": "HOSTED_OR_GOOGLE",
"source": source
}
req = urllib2.Request(url, urllib.urlencode(params))
return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]
def get_auth_token(self):
source = type(self).__name__
return self._get_auth_token(self.email, self.password, source, service="wise")
def download(self, spreadsheet, gid=0, format="csv"):
url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
headers = {
"Authorization": "GoogleLogin auth=" + self.get_auth_token(),
"GData-Version": "3.0"
}
req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
return urllib2.urlopen(req)
if __name__ == "__main__":
import getpass
import csv
email = "" # (your email here)
password = getpass.getpass()
spreadsheet_id = "" # (spreadsheet id here)
# Create client and spreadsheet objects
gs = Client(email, password)
ss = Spreadsheet(spreadsheet_id)
# Request a file-like object containing the spreadsheet's contents
csv_file = gs.download(ss)
# Parse as CSV and print the rows
for row in csv.reader(csv_file):
print ", ".join(row)
Solution 2
The https://github.com/burnash/gspread library is a newer, simpler way to interact with Google Spreadsheets, rather than the old answers to this that suggest the gdata
library which is not only too low-level, but is also overly-complicated.
You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey
Here's an example of how to use it:
import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)
docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
filename = docid + '-worksheet' + str(i) + '.csv'
with open(filename, 'wb') as f:
writer = csv.writer(f)
writer.writerows(worksheet.get_all_values())
Solution 3
You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.
Get a separate login token for the spreadsheets service and substitue that for the export. Adding this to the get_spreadsheet
code worked for me:
import gdata.spreadsheet.service
def get_spreadsheet(key, gid=0):
# ...
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.email = gd_client.email
spreadsheets_client.password = gd_client.password
spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
spreadsheets_client.ProgrammaticLogin()
# ...
entry = gd_client.GetDocumentListEntry(uri)
docs_auth_token = gd_client.GetClientLoginToken()
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)
gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token
Notice I also used Export
, as Download
seems to give only PDF files.
Solution 4
(Jul 2016) Rephrasing with current terminology: "How do I download a Google Sheet in CSV or XLSX format from Google Drive using Python?". (Google Docs now only refers to the cloud-based word processor/text editor which doesn't provide access to Google Sheets spreadsheets.)
First, all other answers are pretty much outdated or will be, either because they use GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.
Modern Google API access occurs using API keys (for accessing public data), OAuth2 client IDs (for accessing data owned by users), or service accounts (for accessing data owned by applications/in the cloud) primarily with the Google Cloud client libraries for GCP APIs and Google APIs Client Libraries for non-GCP APIs. For this task, it would be the latter for Python.
To make it happen your code needs authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.
Note that there is also a newer Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based request like exporting where the Drive API is the correct one to use.
I wrote a blog post that demos exporting a Google Sheet as CSV from Drive. The core part of the script:
# setup
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
# query for file to export
files = DRIVE.files().list(
q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE), orderBy='modifiedTime desc,name').execute().get('files', [])
# export 1st match (if found)
if files:
fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
if data:
with open(fn, 'wb') as f:
f.write(data)
print('DONE')
To learn more about using Google Sheets with Python, see my answer for a similar question. You can also download a Sheet in XLSX and other formats supported by Drive.
If you're completely new to Google APIs, then you need to take a further step back and review these videos first:
- How to use Google APIs & create API projects -- the UI has changed but the concepts are still the same
- Walkthrough of authorization boilerplate code (Python) -- you can use any supported language to access Google APIs; if you don't do Python, use it as pseudocode to help get you started
- Listing your files in Google Drive and code deep dive post
If you already have experience with G Suite APIs and want to see more videos on using both APIs:
Solution 5
This no longer works as of gdata 2.0.1.4:
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
Instead, you have to do:
gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))
![a paid nerd](https://i.stack.imgur.com/oviKE.png?s=256&g=1)
a paid nerd
I'm a professional programmer. I also like riding bicycles. I haven't seen the ResEdit clown in a long time -- or in as anachronistic an environment ever. -- @ruffin 8bde9408bf59382c3194ec7c1bf6afbd837875d7
Updated on December 09, 2020Comments
-
a paid nerd over 3 years
Can you produce a Python example of how to download a Google Docs spreadsheet given its key and worksheet ID (
gid
)? I can't.I've scoured versions 1, 2 and 3 of the API. I'm having no luck, I can't figure out their compilcated ATOM-like feeds API, the
gdata.docs.service.DocsService._DownloadFile
private method says that I'm unauthorized, and I don't want to write an entire Google Login authentication system myself. I'm about to stab myself in the face due to frustration.I have a few spreadsheets and I want to access them like so:
username = '[email protected]' password = getpass.getpass() def get_spreadsheet(key, gid=0): ... (help!) ... for row in get_spreadsheet('5a3c7f7dcee4b4f'): cell1, cell2, cell3 = row ...
Please save my face.
Update 1: I've tried the following, but no combination of
Download()
orExport()
seems to work. (Docs forDocsService
here)import gdata.docs.service import getpass import os import tempfile import csv def get_csv(file_path): return csv.reader(file(file_path).readlines()) def get_spreadsheet(key, gid=0): gd_client = gdata.docs.service.DocsService() gd_client.email = '[email protected]' gd_client.password = getpass.getpass() gd_client.ssl = False gd_client.source = "My Fancy Spreadsheet Downloader" gd_client.ProgrammaticLogin() file_path = tempfile.mktemp(suffix='.csv') uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key try: entry = gd_client.GetDocumentListEntry(uri) # XXXX - The following dies with RequestError "Unauthorized" gd_client.Download(entry, file_path) return get_csv(file_path) finally: try: os.remove(file_path) except OSError: pass
-
michael almost 13 yearscan you go the other way? log in with the gd_client and then pass the spreadsheet client a token?
-
MrColes about 12 yearsawesome—a nice reusable example!
-
Evan Plaice over 11 yearsThis should be the chosen solution. Removing the gdata dependency is a huge plus and you even packaged it in a way that it could be used as both a console app and a library. I added exception handling around the get_auth_token request but only so I could output a meaningful message when an incorrect password is encountered.
-
Spacedman about 11 yearsIs there a reason to loop over the
csv_file
object? Why not justprint csv_file.read()
- csv in, csv out! -
Cameron Spickert about 11 years@Spacedman the
csv.reader
bit is included mainly as an example of how to parse the spreadsheet once it's downloaded. -
aculich almost 11 yearsA better solution is now available since this answer was originally accepted.
-
a paid nerd almost 11 yearsJust checked it out. Looks good and I'll mark it as the default. Awesome!
-
Ben Sullins over 10 yearsJust used this also, great stuff! ps - you can change the method to use the doc name instead of the key which is nice.
-
d12frosted about 9 yearsAnd this just loads all worksheets. Is it possible to load worksheet by gid, not index or name?
-
Francis Davey about 9 yearsThe ClientLogin method has been withdrawn now hasn't it? So any solution would have to be OAth2 - is there any way of avoiding that complexity?
-
Moses Liao GZ over 8 yearsI use this method and I have this problem UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
-
grasshopper about 8 yearsdevelopers.google.com/identity/protocols/… as Francis said, this doesn't work anymore.
-
Robert Bruce almost 8 yearsThis example will NO LONGER WORK, as the "ClientLogin" interface has been disabled. developers.google.com/identity/protocols/AuthForInstalledApps
-
Richard de Ree almost 8 yearsThe email auth doesn't work anymore. So this solution don't work. You need to have OAuth 2.0 google.com/…
-
Richard de Ree almost 8 yearsThis solution is outdated and won't work, like allmost all answers here you need to use oauth2.0
-
aculich almost 8 yearsthanks @Richard I updated this example to use an oauth2
-
Dmitry Dubovitsky over 5 yearsfor Python2 and non English symbol, as @MosesLiaoGZ noted before, you can use stackoverflow.com/a/51003728/1072782
-
Manthan_Admane almost 4 yearsHow did you get that link with "out:csv" attached to it?
-
Manthan_Admane almost 4 yearsDid you add it manually like answered in- stackoverflow.com/questions/24255472/…