Extract a Google Drive zip from a Google Colab notebook
Solution 1
You can simply use:
!unzip file_location
Solution 2
To unzip a file to a directory:
!unzip path_to_file.zip -d path_to_directory
Solution 3
To extract a Google Drive zip from a Google Colab notebook:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
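The same extraction reads a bit more safely with a `with` block, which closes the archive even if extraction fails partway through. A minimal self-contained sketch (a locally created sample zip stands in for the Drive path used above, so the snippet runs outside Colab too):

```python
import os
import zipfile

# Create a small sample archive standing in for the Drive zip
# ("/content/drive/My Drive/ML/DataSet.zip" in the answer above).
with zipfile.ZipFile("DataSet.zip", "w") as zf:
    zf.writestr("ML/sample.txt", "sample contents")

# The context manager closes the archive automatically, even on error.
with zipfile.ZipFile("DataSet.zip", "r") as zip_ref:
    zip_ref.extractall("out")

print(os.path.exists("out/ML/sample.txt"))  # True
```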
Solution 4
The Colab research team has a notebook to help you out.
Still, in short: if you are dealing with a zip file (for me it is usually thousands of images that I want to store in a folder on Drive), do this:
!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"
-u
extracts only files that are new or necessary; this matters if you suddenly lose the connection or the hardware switches off mid-extraction.
-d
creates the target directory, and the extracted files are stored there.
Of course, before doing this you need to mount your Drive:
from google.colab import drive
drive.mount('/content/drive')
I hope this helps! Cheers!!
Solution 5
First, install unzip on colab:
!apt install unzip
then use unzip to extract your files:
!unzip source.zip -d destination_directory
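When the archive lives on the mounted Drive, extraction can be slow because every read goes over the network; copying the zip to local Colab storage first is often faster. A sketch under that assumption, with illustrative paths (a local "drive_mount" directory stands in for /content/drive/My Drive):

```python
import os
import shutil
import zipfile

# Illustrative stand-in for the mounted Drive folder.
os.makedirs("drive_mount", exist_ok=True)
with zipfile.ZipFile("drive_mount/DataSet.zip", "w") as zf:
    zf.writestr("data.txt", "hello")

# Copy the archive to fast local storage before extracting.
local_zip = shutil.copy("drive_mount/DataSet.zip", "DataSet_local.zip")
with zipfile.ZipFile(local_zip) as zf:
    zf.extractall("local_out")

print(open("local_out/data.txt").read())  # hello
```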
Laxmikant
Updated on November 28, 2021

Comments
-
Laxmikant over 2 years
I already have a zip of a dataset (2K images) on Google Drive that I have to use in an ML training algorithm. The code below fetches the content as a string:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K'  # -- Updated file ID for my zip
downloaded = drive.CreateFile({'id': file_id})
# print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))
But I have to extract and store it in a separate directory as it would be easier for processing (as well as for understanding) of the dataset.
I tried to extract it further, but I am getting a "Not a zipfile" error:
dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()
Note: Dataset is just for reference, I have already downloaded this zip to my google drive, and I'm referring to file in my drive only.
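One common fix for the "Not a zipfile" error above: round-tripping the download through a text encoding can mangle binary data, so write the raw bytes straight to disk (in PyDrive, downloaded.GetContentFile("DataSet.zip") does this) and open that file with zipfile. A hedged sketch, using a locally built archive in place of the Drive download so it runs anywhere:

```python
import os
import zipfile

# Stand-in for the Drive download: in Colab/PyDrive you would instead call
# downloaded.GetContentFile("drive_download.zip") to save the raw bytes.
with zipfile.ZipFile("drive_download.zip", "w") as zf:
    zf.writestr("images/img_0001.jpg", b"\x89fake image bytes")

# Extract from the on-disk file; no text decode/encode round trip involved.
with zipfile.ZipFile("drive_download.zip", "r") as zf:
    zf.extractall("dataset")

print(sorted(os.listdir("dataset/images")))  # ['img_0001.jpg']
```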
-
Rafael_Espericueta almost 5 yearsIt seemed to work, but was too fast perhaps. But then unzip didn't work. Archive: flowers.zip End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive. unzip: cannot find zipfile directory in one of flowers or flowers.zip, and cannot find flowers.ZIP, period.
-
Vincent Jia over 4 yearsFor the code part, please use the language-specific format (it's Python in this context).
-
abdul over 4 yearsHave added, Thanks
-
Farzan about 4 yearsfor unzip isn't it I/O-wise better to first copy the zip file to local Colab storage then do the unzip operation?
-
Asghar Nazir over 2 years@zeeshan how can I specify the output directory?
-
Zeeshan Ali over 2 yearsSpecify it next to the file_location.
-
Rajdeep Borgohain about 2 yearsIt doesn't work for large datasets. I have tried it on a 60GB zip file, but it didn't work out!
-
Md Hishamur Rahman about 2 yearsIt's not a problem with shutil; Colab RAM does not have the capacity to hold your 60 GB zip file in the first place.
-
Rajdeep Borgohain about 2 yearsI got 35GB ram, and I think that is sufficient. I think it doesn't load the whole file to the memory. I don't know but in my local system with 8GB I have extracted more than 50GB without any issue.
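On the large-file point in this thread: when zipfile opens an archive by path it seeks within the file on disk, so the whole zip does not have to fit in RAM, and each member can be streamed out in chunks. A sketch of member-by-member extraction (a tiny archive is built here for demonstration; a 60 GB zip streams the same way):

```python
import os
import shutil
import zipfile

# Build a small archive to demonstrate the streaming pattern.
with zipfile.ZipFile("big.zip", "w") as zf:
    for i in range(3):
        zf.writestr(f"part_{i}.bin", b"x" * 1024)

os.makedirs("big_out", exist_ok=True)
with zipfile.ZipFile("big.zip") as zf:
    for name in zf.namelist():
        # Stream each member to disk in chunks instead of reading it whole;
        # this also gives a natural place for progress reporting.
        with zf.open(name) as src, open(os.path.join("big_out", name), "wb") as dst:
            shutil.copyfileobj(src, dst)
        print("extracted", name)
```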