Extract a Google Drive zip from a Google Colab notebook


Solution 1

You can simply run:

!unzip file_location
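
For example, if the archive is already in the Colab filesystem (copied there or visible through a mount), pass its path; by default the contents are extracted into the current working directory, which is /content in a fresh Colab runtime (the path below is only an illustration):

!unzip /content/DataSet.zip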

Solution 2

To unzip a file to a directory:

!unzip path_to_file.zip -d path_to_directory
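
For instance, to extract an archive stored on a mounted Drive into a local working directory (paths are illustrative; quote the Drive path because "My Drive" contains a space, and unzip creates the target directory if it does not already exist):

!unzip "/content/drive/My Drive/DataSet.zip" -d /content/dataset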

Solution 3

To extract a Google Drive zip from a Google Colab notebook:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
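
The same extraction can be written with a context manager, which closes the archive automatically even if extraction fails (same path as above):

# Open the archive from Drive and extract everything into local /tmp storage
with zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r') as zip_ref:
    zip_ref.extractall("/tmp")

Note that /tmp is the runtime's local disk, so the extracted files disappear when the Colab runtime is recycled.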

Solution 4

The Colab team has an example notebook on working with external data that can help you out.

Still, in short: if you are dealing with a zip file (in my case, mostly thousands of images) and you want to store the contents in a folder within Drive, do this:

!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"

The -u flag extracts only files that are new or newer than what is already on disk, which matters if you suddenly lose the connection or the runtime shuts off partway through: rerunning the cell picks up where it left off instead of re-extracting everything.

The -d flag sets the directory the files are extracted into, creating it if it does not exist.

Of course, before doing this you need to mount your Drive:

from google.colab import drive 
drive.mount('/content/drive')
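
Once Drive is mounted and the unzip cell above has finished, a quick count of the output folder (using the illustrative path from the command above) confirms that the files actually landed there:

import os
print(len(os.listdir("/content/drive/My Drive/folder/NewFolder")))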

I hope this helps! Cheers!!

Solution 5

First, install unzip on Colab (it is usually already available, so this step is often a no-op):

!apt install unzip

Then use unzip to extract your files into a destination directory:

!unzip source.zip -d destination_folder
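
If you prefer to stay in Python, the standard library's shutil.unpack_archive does the same job (the names below are placeholders matching the command above):

import shutil

# Extract everything from source.zip into destination_folder
shutil.unpack_archive('source.zip', 'destination_folder')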

Comments

  • Laxmikant
    Laxmikant over 2 years

    I already have a zip of a dataset (2K images) on Google Drive. I have to use it in an ML training algorithm. The code below fetches the content as a string:

    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    from google.colab import auth
    from oauth2client.client import GoogleCredentials
    import io
    import zipfile
    # Authenticate and create the PyDrive client.
    # This only needs to be done once per notebook.
    auth.authenticate_user()
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    drive = GoogleDrive(gauth)
    
    # Download a file based on its file ID.
    #
    # A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
    file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
    downloaded = drive.CreateFile({'id': file_id})
    #print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))
    

    But I have to extract it and store it in a separate directory, since that makes the dataset easier to process (and to understand).

    I tried to extract it further, but I am getting a "Not a zipfile" error (a working sketch of the fix appears at the end, after the comments):

    dataset = io.BytesIO(downloaded.encode('cp862'))
    zip_ref = zipfile.ZipFile(dataset, "r")
    zip_ref.extractall()
    zip_ref.close()
    

    Google Drive Dataset

    Note: the dataset link is just for reference; I have already downloaded this zip to my Google Drive, and I'm referring to the file in my Drive only.

  • Rafael_Espericueta
    Rafael_Espericueta almost 5 years
    It seemed to work, but perhaps too fast. Then unzip didn't work: "Archive: flowers.zip. End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive. unzip: cannot find zipfile directory in one of flowers or flowers.zip, and cannot find flowers.ZIP, period."
  • Vincent Jia
    Vincent Jia over 4 years
    For the code part, please use the language-specific format (it's Python in this context).
  • abdul
    abdul over 4 years
    Have added, Thanks
  • Farzan
    Farzan about 4 years
    For unzip, isn't it better I/O-wise to first copy the zip file to local Colab storage and then do the unzip operation?
  • Asghar Nazir
    Asghar Nazir over 2 years
    @zeeshan how can I specify the output directory?
  • Zeeshan Ali
    Zeeshan Ali over 2 years
    Specify it next to the file_location.
  • Rajdeep Borgohain
    Rajdeep Borgohain about 2 years
    It doesn't work for large datasets. I have tried it on a 60GB zip file, but it didn't work out!
  • Md Hishamur Rahman
    Md Hishamur Rahman about 2 years
    It's not a problem with shutil; Colab RAM does not have the capacity to hold your 60 GB zip file in the first place.
  • Rajdeep Borgohain
    Rajdeep Borgohain about 2 years
    I have 35 GB of RAM, and I think that is sufficient. I don't think it loads the whole file into memory; I'm not sure, but on my local system with 8 GB I have extracted more than 50 GB without any issue.
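
A minimal sketch of the fix referenced in the question above, assuming the PyDrive client has already been authenticated exactly as in that snippet (drive and file_id come from there). The key change is to download with GetContentFile, which writes the file's raw bytes to local disk, rather than pulling the binary archive through a text decode/encode round trip; the local filename and target directory below are only illustrative:

import zipfile

# Download the zip as raw bytes into local Colab storage
# (drive and file_id are the PyDrive objects defined in the question;
#  'DataSet.zip' is just the local filename to write to)
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('DataSet.zip')

# Extract into a separate directory for easier processing
# ('/content/dataset' is an illustrative target path)
with zipfile.ZipFile('DataSet.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/dataset')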