Import data into Google Colaboratory


Solution 1

An official example notebook demonstrating local file upload/download and integration with Drive and Sheets is available here: https://colab.research.google.com/notebooks/io.ipynb

The simplest way to share files is to mount your Google Drive.

To do this, run the following in a code cell:

from google.colab import drive
drive.mount('/content/drive')

It will ask you to visit a link to allow "Google Drive File Stream" to access your Drive. After that, a long alphanumeric auth code is shown, which you need to enter in your Colab notebook.

Afterward, your Drive files will be mounted and you can browse them with the file browser in the side panel.
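Once mounted, the Drive contents behave like an ordinary directory tree. A minimal sketch of browsing it programmatically (the `list_drive` helper is hypothetical, and the `/content/drive/MyDrive` path is an assumption; outside Colab, or before `drive.mount` runs, the path simply won't exist):

```python
from pathlib import Path

# Hypothetical helper: list entries under the Drive mount point, if present.
# Outside of Colab (or before drive.mount has run) the path does not exist.
def list_drive(root='/content/drive/MyDrive'):
    mount = Path(root)
    if not mount.exists():
        return []  # Drive is not mounted here
    return sorted(p.name for p in mount.iterdir())

print(list_drive())  # inside Colab: the top-level files/folders of your Drive
```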


Here's a full example notebook

Solution 2

Upload

from google.colab import files
files.upload()

Download

files.download('filename')

List directory

import os
os.listdir()

Solution 3

Step 1: Mount your Google Drive in Colaboratory.

from google.colab import drive
drive.mount('/content/gdrive')

Step 2: Now you will see your Google Drive files in the left pane (file explorer). Right-click the file you need to import and select 'Copy path'. Then import it as usual in pandas, using the copied path.

import pandas as pd
df = pd.read_csv('gdrive/My Drive/data.csv')

Done!
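If pandas is unavailable, the same read can be sketched with the standard library's `csv` module. This is a self-contained stand-in (the column names and values are invented; in Colab you would pass the copied Drive path to `open()` instead of using `StringIO`):

```python
import csv
import io

# Stand-in for a file living at 'gdrive/My Drive/data.csv'.
sample = io.StringIO("name,score\nada,90\ngrace,95\n")

# DictReader maps each row to a dict keyed by the header row.
rows = list(csv.DictReader(sample))
print(rows[0]['name'])  # ada
```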

Solution 4

A simple way to import data from your Google Drive. This saves people time (I don't know why Google doesn't list it step by step explicitly).

INSTALL AND AUTHENTICATE PYDRIVE

     !pip install -U -q PyDrive  # you will need to install this for every Colab session

     from pydrive.auth import GoogleAuth
     from pydrive.drive import GoogleDrive
     from google.colab import auth
     from oauth2client.client import GoogleCredentials

     # 1. Authenticate and create the PyDrive client.
     auth.authenticate_user()
     gauth = GoogleAuth()
     gauth.credentials = GoogleCredentials.get_application_default()
     drive = GoogleDrive(gauth)

UPLOADING

If you need to upload data from your local drive:

    from google.colab import files

    uploaded = files.upload()

    for fn in uploaded.keys():
       print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

Execute the cell and it will display a 'Choose Files' button. Find the file to upload and click Open.

After uploading, it will display:

    sample_file.json(text/plain) - 11733 bytes, last modified: x/xx/2018 - 100% done
    User uploaded file "sample_file.json" with length 11733 bytes

CREATE FILE FOR NOTEBOOK

If your data file is already in your gdrive, you can skip to this step.

Now the file is in your Google Drive. Find it in your Google Drive, right-click, and click 'Get shareable link'. You will get a window with:

    https://drive.google.com/open?id=29PGh8XCts3mlMP6zRphvnIcbv27boawn

Copy - '29PGh8XCts3mlMP6zRphvnIcbv27boawn' - that is the file ID.
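Copying the ID by hand is error-prone. A small helper (hypothetical, standard library only) can parse it out of either common Drive link format:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical helper: extract the file ID from a Drive shareable link.
def drive_file_id(link):
    parsed = urlparse(link)
    query = parse_qs(parsed.query)
    if 'id' in query:                 # format: .../open?id=<ID>
        return query['id'][0]
    parts = parsed.path.split('/')
    if 'd' in parts:                  # format: .../file/d/<ID>/view
        return parts[parts.index('d') + 1]
    return None

print(drive_file_id('https://drive.google.com/open?id=29PGh8XCts3mlMP6zRphvnIcbv27boawn'))
# 29PGh8XCts3mlMP6zRphvnIcbv27boawn
```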

In your notebook:

    json_import = drive.CreateFile({'id':'29PGh8XCts3mlMP6zRphvnIcbv27boawn'})

    # 'sample.json' is the file name that will be accessible in the notebook
    json_import.GetContentFile('sample.json')

IMPORT DATA INTO NOTEBOOK

To import the data you uploaded into the notebook (a json file in this example - how you load will depend on file/data type - .txt,.csv etc. ):

    import json
    sample_uploaded_data = json.load(open('sample.json'))

Now you can print to see the data is there:

    print(sample_uploaded_data)
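The load step can be exercised end-to-end without Drive at all. This sketch writes a throwaway JSON file first and then loads it back the same way the notebook step does (the file contents here are invented):

```python
import json
import os
import tempfile

# Self-contained stand-in for GetContentFile: write a small JSON file,
# then load it back exactly as the notebook step would.
sample = {'name': 'sample_file', 'bytes': 11733}

fd, path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w') as f:
    json.dump(sample, f)

with open(path) as f:
    sample_uploaded_data = json.load(f)
os.remove(path)

print(sample_uploaded_data)  # {'name': 'sample_file', 'bytes': 11733}
```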

Solution 5

The simplest way I've found is:

  1. Make a repository on GitHub with your dataset.
  2. Clone your repository with ! git clone --recursive [GITHUB LINK REPO]
  3. Find where your data is (!ls command).
  4. Open the file with pandas as you would in a normal Jupyter notebook.


Author: Grae ("Another internet denizen.")

Updated on December 18, 2021

Comments

  • Grae
    Grae over 2 years

    What are the common ways to import private data into Google Colaboratory notebooks? Is it possible to import a non-public Google sheet? You can't read from system files. The introductory docs link to a guide on using BigQuery, but that seems a bit... much.

  • Bob Smith
    Bob Smith over 6 years
    A sheets example is now included in a bundled example notebook that also includes recipes for Drive and Google Cloud Storage: colab.research.google.com/notebook#fileId=/v2/external/…
  • 5agado
    5agado about 6 years
It is worth pointing out that the UPLOADING suggestion, via google.colab.files.upload(), doesn't seem to work in Firefox or Safari, only in Chrome. See here
  • Grae
    Grae almost 6 years
    It's important to note that while secret gists are difficult to discover they are not private, so anyone using this approach should be careful.
  • RodrikTheReader
    RodrikTheReader almost 6 years
    Are the uploaded files stored on user's google drive or the server to which the notebook is connected?
  • fabda01
    fabda01 over 5 years
    Can I import a specific folder in my Drive? I'm sharing this colab with someone else, and I don't want to give access to all my google drive which contains sensitive information
  • Bob Smith
    Bob Smith over 5 years
    Files in your Drive won't be shared if you share the notebook. The user will still need to mount their own drive, which is separate. You can share the files with that user if needed, but all of that is controlled by normal Drive ACLs. Sharing a Colab notebook shares only the notebook, not the Drive files referenced in that notebook.
  • Asclepius
    Asclepius over 5 years
    Aren't these files ephemeral?
  • Swapnil B.
    Swapnil B. over 5 years
    my mount is successful but I can't see the files listing in the left side under files. Any suggestions?
  • Bob Smith
    Bob Smith over 5 years
    Did you hit 'Refresh' in the file browser? Did you mount under /content?
  • user25004
    user25004 over 5 years
    Any argument for upload?
  • Parseltongue
    Parseltongue over 5 years
    In this case what is "drive_dir_ID?"
  • Jean-Christophe
    Jean-Christophe over 5 years
    As mentioned in the git repo, drive_dir_ID is the corresponding Google Drive ID of the requested directory. For more info, please check github.com/ruelj2/Google_drive. There is also a clear exemple of usage.
  • saurabheights
    saurabheights over 5 years
Do not train on data in a mounted Google Drive. First copy the data to the local disk and then train on it; it will be nearly 10 times faster. For faster copying, make sure the data files are a few big archives rather than many small files. For example, do not use 100,000 image files; use 100 archives of 1,000 images each. This way uploading to Google Drive is faster, and so is copying from Google Drive to Colab.
  • Vivek Solanki
    Vivek Solanki about 5 years
    Make sure you have uploaded directly to root directory and not in 'sample_data ' directory. Also, you can remove "content" and just write file name like: pd.read_csv('Forbes2015.csv');
  • Vivek Solanki
    Vivek Solanki about 5 years
    If still doesn't work, can you tell me the error message?
  • Elroch
    Elroch over 4 years
    Wins on clarity and brevity and has equal effectiveness. I see no advantage to the much more involved ways to do this.
  • Vivek Solanki
    Vivek Solanki about 4 years
    @flashliquid Not necessary. It works even without '/'. You can test it on colab.
  • Fernando Wittmann
    Fernando Wittmann about 4 years
    this answer should be at the top. The question is about importing data, not mounting google drive.
