Extract a Google Drive zip from a Google Colab notebook
Solution 1
You can simply use:
!unzip file_location
Solution 2
To unzip a file to a directory:
!unzip path_to_file.zip -d path_to_directory
Solution 3
To extract a Google Drive zip from a Google Colab notebook:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
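The same extraction reads a bit more safely with a `with` block, which closes the archive even if extraction fails partway through. A minimal self-contained sketch (a locally created sample zip stands in for the Drive path used above, so the snippet runs outside Colab too):

```python
import os
import zipfile

# Create a small sample archive standing in for the Drive zip
# ("/content/drive/My Drive/ML/DataSet.zip" in the answer above).
with zipfile.ZipFile("DataSet.zip", "w") as zf:
    zf.writestr("ML/sample.txt", "sample contents")

# The context manager closes the archive automatically, even on error.
with zipfile.ZipFile("DataSet.zip", "r") as zip_ref:
    zip_ref.extractall("out")

print(os.path.exists("out/ML/sample.txt"))  # True
```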
Solution 4
The Colab research team has a notebook to help you out.
Still, in short: if you are dealing with a zip file (for me it is usually thousands of images that I want to store in a folder on Drive), do this:
!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"
-u
extracts only files that are new or necessary; this matters if you suddenly lose the connection or the hardware switches off mid-extraction.
-d
creates the target directory, and the extracted files are stored there.
Of course, before doing this you need to mount your Drive:
from google.colab import drive
drive.mount('/content/drive')
I hope this helps! Cheers!!
Solution 5
First, install unzip on colab:
!apt install unzip
then use unzip to extract your files:
!unzip source.zip -d destination_directory
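When the archive lives on the mounted Drive, extraction can be slow because every read goes over the network; copying the zip to local Colab storage first is often faster. A sketch under that assumption, with illustrative paths (a local "drive_mount" directory stands in for /content/drive/My Drive):

```python
import os
import shutil
import zipfile

# Illustrative stand-in for the mounted Drive folder.
os.makedirs("drive_mount", exist_ok=True)
with zipfile.ZipFile("drive_mount/DataSet.zip", "w") as zf:
    zf.writestr("data.txt", "hello")

# Copy the archive to fast local storage before extracting.
local_zip = shutil.copy("drive_mount/DataSet.zip", "DataSet_local.zip")
with zipfile.ZipFile(local_zip) as zf:
    zf.extractall("local_out")

print(open("local_out/data.txt").read())  # hello
```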
Laxmikant
Updated on November 28, 2021

Comments
-
Laxmikant over 2 years
I already have a zip of a dataset (2K images) on Google Drive that I have to use in an ML training algorithm. The code below fetches the content as a string:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K'  # -- Updated file ID for my zip
downloaded = drive.CreateFile({'id': file_id})
# print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))
But I have to extract and store it in a separate directory as it would be easier for processing (as well as for understanding) of the dataset.
I tried to extract it further, but I am getting a "Not a zipfile" error:
dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()
Note: Dataset is just for reference, I have already downloaded this zip to my google drive, and I'm referring to file in my drive only.
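One common fix for the "Not a zipfile" error above: round-tripping the download through a text encoding can mangle binary data, so write the raw bytes straight to disk (in PyDrive, downloaded.GetContentFile("DataSet.zip") does this) and open that file with zipfile. A hedged sketch, using a locally built archive in place of the Drive download so it runs anywhere:

```python
import os
import zipfile

# Stand-in for the Drive download: in Colab/PyDrive you would instead call
# downloaded.GetContentFile("drive_download.zip") to save the raw bytes.
with zipfile.ZipFile("drive_download.zip", "w") as zf:
    zf.writestr("images/img_0001.jpg", b"\x89fake image bytes")

# Extract from the on-disk file; no text decode/encode round trip involved.
with zipfile.ZipFile("drive_download.zip", "r") as zf:
    zf.extractall("dataset")

print(sorted(os.listdir("dataset/images")))  # ['img_0001.jpg']
```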
-
Rafael_Espericueta almost 5 yearsIt seemed to work, but was too fast perhaps. But then unzip didn't work. Archive: flowers.zip End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive. unzip: cannot find zipfile directory in one of flowers or flowers.zip, and cannot find flowers.ZIP, period.
-
Vincent Jia over 4 yearsFor the code part, please use the language-specific format (it's Python in this context).
-
abdul over 4 yearsHave added, Thanks
-
Farzan about 4 yearsfor unzip isn't it I/O-wise better to first copy the zip file to local Colab storage then do the unzip operation?
-
Asghar Nazir over 2 years@zeeshan how can I specify the output directory?
-
Zeeshan Ali over 2 yearsSpecify it next to the file_location.
-
Rajdeep Borgohain about 2 yearsIt doesn't work for large datasets. I have tried it on a 60GB zip file, but it didn't work out!
-
Md Hishamur Rahman about 2 yearsIt's not a problem with shutil; Colab RAM does not have the capacity to hold your 60 GB zip file in the first place.
-
Rajdeep Borgohain about 2 yearsI got 35GB ram, and I think that is sufficient. I think it doesn't load the whole file to the memory. I don't know but in my local system with 8GB I have extracted more than 50GB without any issue.
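On the large-file point in this thread: when zipfile opens an archive by path it seeks within the file on disk, so the whole zip does not have to fit in RAM, and each member can be streamed out in chunks. A sketch of member-by-member extraction (a tiny archive is built here for demonstration; a 60 GB zip streams the same way):

```python
import os
import shutil
import zipfile

# Build a small archive to demonstrate the streaming pattern.
with zipfile.ZipFile("big.zip", "w") as zf:
    for i in range(3):
        zf.writestr(f"part_{i}.bin", b"x" * 1024)

os.makedirs("big_out", exist_ok=True)
with zipfile.ZipFile("big.zip") as zf:
    for name in zf.namelist():
        # Stream each member to disk in chunks instead of reading it whole;
        # this also gives a natural place for progress reporting.
        with zf.open(name) as src, open(os.path.join("big_out", name), "wb") as dst:
            shutil.copyfileobj(src, dst)
        print("extracted", name)
```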