Getting Google Spreadsheet CSV into A Pandas Dataframe

python pandas google-sheets google-drive-api google-apps

52,289

Solution 1

You can use read_csv() on a StringIO object:

from io import BytesIO

import requests
import pandas as pd

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
    
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

    Inventory_exp  Price_exp  Credit_exp  
0               2          3           3  
10              4          4           3  
12              2          2           3  
38              3          3           2  
44              4          4           4

Solution 2

Seems to work for me without the StringIO:

test = pd.read_csv('https://docs.google.com/spreadsheets/d/' + 
                   '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                   '/export?gid=0&format=csv',
                   # Set first column as rownames in data frame
                   index_col=0,
                   # Parse column values to datetime
                   parse_dates=['Quradate']
                  )
test.head(5)  # Same result as @TomAugspurger

BTW, including the ?gid= enables importing different sheets, find the gid in the URL.

Solution 3

Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'

First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

Then we pass it to pd.read_csv, which can take a URL.

df = pd.read_csv(csv_export_url)

This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.

Solution 4

My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!

gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())

Solution 5

I have been using the following utils and it worked so far:

def load_from_gspreadsheet(sheet_name, key):
    url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
        key=key, sheet_name=sheet_name.replace(' ', '%20'))

    log.info('Loading google spreadsheet from {}'.format(url))

    df = pd.read_csv(url)
    return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)

You must specify the sheet_name and the key. The key is the string you get from the url in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/.

You can change the value of headers if you have more than one row for the column names but I am not sure if it still work with multi-headers.

It may brake if Google will change their APIs.

Also please bear in mind that your spreadsheet must be public, everyone with the link can read it.

View more solutions

52,289

Author by

dartdog

Open Software professional Developing Open SoftWare Co, oswco

Updated on December 03, 2021

Comments

dartdog over 2 years
I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)
```
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
```
The data ends up looking like: (1st row headers)
```
',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n
```
The native pandas code that brings in the disk resident file looks like:
```
df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])
```
A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.

Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:
```
https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id
```
see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...
moldovean about 10 years

I was looking for weeks how to import a spreadsheet into pandas. never heard of requests or StringIO libraries. Thank you!!
dartdog almost 10 years

Note the new URL format in the bottom of the original question above it is needed for the new Google spreadsheet version
ezcodr almost 10 years

To clarify "got moved around in python3 if you're using that": from io import StringIO
nealmcb almost 9 years

Thanks! But I had to use this form of google url for csv output: stackoverflow.com/a/23702001/507544
Max Ghenis about 8 years

How can one specify the sheet (i.e. #gid=x in URL)? Adding it to the URL itself after key= didn't work.
dartdog about 8 years

good points also are you using P3? Have not had a chance to recheck also nice not needing requests...
Max Ghenis about 8 years

Yes I'm using Python 3.
dartdog about 8 years

Great solution, just verified it, much nicer!
dartdog about 8 years

see solution below! from @Max Ghenis
getup8 over 7 years

Maybe just add comments as to what index_col and parse_dates do? Also, maybe this is obvious, but I think this only works if the Spreadsheet is public; I believe if it's not, you'll have to use the API.
dartdog over 6 years

Nice..The interface keeps getting cleaner!
MrValdez about 5 years

Since support for Python 2 will be removed, I updated the code to something that's forward-compatible with Python 3. Basically, used BytesIO instead of StringIO. But Max's answer below mentioned that you don't actually need the IO buffers.
RAbraham about 5 years

just to clarify, gs would be gs = gspread.authorize(credentials)
Dylan Hogg almost 4 years

Great solution. Works when a sheet is shared as "Anyone on the Internet with this link can view". Note that index_col and parse_dates arguments are optional.
diegodsp almost 4 years

This code returns a HTML page for download the csv, not the csv file from gsheet.
Md. Sabbir Ahmed over 3 years

Great Solution. I needed to remove parse_dates=['Quradate']
rsc05 about 3 years

I am getting ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
Shaida Muhammad over 2 years

underrated but simple answer.
Sergey Belousov about 2 years

This worked for me, thanks! But this reads only the first sheet. How could I read all the sheets?