Extract files from zip without keeping the structure using python ZipFile?

54,976

Solution 1

This opens file handles for the members of the zip archive, takes the base name of each member, and copies its contents to a target file. That is how ZipFile.extract works internally, minus the handling of subdirectories.

import os
import shutil
import zipfile

my_dir = r"D:\Download"
my_zip = r"D:\Download\my_file.zip"

with zipfile.ZipFile(my_zip) as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
    
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
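
Since flattening puts every member into one folder, two members with the same base name will overwrite each other, as the question notes. The following is a minimal sketch of the same copy loop wrapped in a function that picks a free name instead of overwriting. The names `extract_flat` and `unique_path` and the `_1`, `_2` suffix scheme are assumptions made for illustration, not part of the original answer.

```python
import os
import shutil
import zipfile


def unique_path(directory, filename):
    """Return a path inside `directory` that does not exist yet (hypothetical helper)."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        # Append _1, _2, ... before the extension until the name is free.
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


def extract_flat(zip_path, target_dir):
    """Copy every file member into target_dir, ignoring subdirectories."""
    written = []
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zip_file:
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            if not filename:  # directory entries end with "/"
                continue
            target_path = unique_path(target_dir, filename)
            with zip_file.open(member) as source, open(target_path, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(target_path)
    return written
```

Calling `extract_flat(my_zip, my_dir)` returns the list of paths that were written, so renamed duplicates are easy to spot.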

Solution 2

It is possible to iterate over the ZipFile.infolist(). On the returned ZipInfo objects you can then manipulate the filename to remove the directory part and finally extract it to a specified directory.

import os
import zipfile

my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"

with zipfile.ZipFile(my_zip) as zip_file:
    for zip_info in zip_file.infolist():
        # skip directory entries, whose names end with "/"
        if zip_info.filename[-1] == '/':
            continue
        # keep only the base name so extract() writes straight into my_dir
        zip_info.filename = os.path.basename(zip_info.filename)
        zip_file.extract(zip_info, my_dir)
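
To try this approach without touching a real download folder, here is a small self-contained sketch that builds a throwaway archive mirroring the layout from the question and flattens it. The function name `extract_flat_with_infolist` and the temporary paths are assumptions for the demo. It uses `ZipInfo.is_dir()` (available since Python 3.6) in place of the trailing-slash check.

```python
import os
import tempfile
import zipfile


def extract_flat_with_infolist(zip_path, target_dir):
    """Extract all file members directly into target_dir (hypothetical helper)."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for zip_info in archive.infolist():
            if zip_info.is_dir():
                continue
            # Only the output name changes; the member itself is still
            # identified by the ZipInfo object passed to extract().
            zip_info.filename = os.path.basename(zip_info.filename)
            extracted.append(archive.extract(zip_info, target_dir))
    return extracted


# Demo on a temporary archive with nested members.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "my_file.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("my_zip/file1.txt", "1")
        archive.writestr("my_zip/dir1/file2.txt", "2")
        archive.writestr("my_zip/dir1/dir2/file3.txt", "3")
    out_dir = os.path.join(tmp, "out")
    extract_flat_with_infolist(zip_path, out_dir)
    flat_listing = sorted(os.listdir(out_dir))
```

After the demo runs, `flat_listing` holds the names found in the output folder, with no subfolders created.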

Solution 3

Just extract to bytes in memory, compute the filename, and write the file yourself instead of letting the library do it. In short, use the "read()" method instead of "extract()":

Python 3.6+ update (2020): the same code as the original answer, but using pathlib.Path, which eases file-path manipulation and other operations (like "write_bytes").

from pathlib import Path
import zipfile

my_dir = Path("D:\\Download\\")
my_zip = my_dir / "my_file.zip"

with zipfile.ZipFile(my_zip, 'r') as zip_file:
    for name in zip_file.namelist():
        # skip directory entries, which end with "/"
        if name.endswith("/"):
            continue
        # read() takes only the member name and returns its bytes
        data = zip_file.read(name)
        myfile_path = my_dir / Path(name).name
        myfile_path.write_bytes(data)

Original code in answer without pathlib:

import zipfile
import os

my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"

zip_file = zipfile.ZipFile(my_zip, 'r')
for files in zip_file.namelist():
    # I am almost sure zip represents the directory separator
    # char as "/" regardless of OS, but I don't have DOS or Windows here to test it
    filename = files.split("/")[-1]
    # directory entries end with "/" and leave an empty name; skip them
    if not filename:
        continue
    # read() takes only the member name (its optional second argument is a password)
    data = zip_file.read(files)
    myfile_path = os.path.join(my_dir, filename)
    myfile = open(myfile_path, "wb")
    myfile.write(data)
    myfile.close()
zip_file.close()

Solution 4

A similar concept to the solution of Gerhard Götz, but adapted for extracting single files instead of the entire zip:

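import os
from zipfile import ZipFile

# zipPath, path_in_zip and destination are placeholders to be set by the caller
with ZipFile(zipPath, 'r') as zipObj:
    zipInfo = zipObj.getinfo(path_in_zip)
    zipInfo.filename = os.path.basename(destination)
    zipObj.extract(zipInfo, os.path.dirname(os.path.realpath(destination)))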
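
The snippet above leaves its three variables undefined, so here is a hedged sketch of the same idea as a function. The name `extract_single` and its parameters are assumptions for illustration. It relies on `ZipFile.extract()` returning the path it created and on `getinfo()` raising `KeyError` when the member does not exist.

```python
import os
from zipfile import ZipFile


def extract_single(zip_path, path_in_zip, destination):
    """Extract one member to `destination`, ignoring its path inside the archive.

    Hypothetical helper: `destination` is the full output path, including
    the file name the extracted member should get.
    """
    destination = os.path.realpath(destination)
    with ZipFile(zip_path, "r") as zip_obj:
        zip_info = zip_obj.getinfo(path_in_zip)  # KeyError if the member is missing
        # Rename the entry in memory so extract() writes it under the new name.
        zip_info.filename = os.path.basename(destination)
        return zip_obj.extract(zip_info, os.path.dirname(destination))
```

For example, `extract_single(my_zip, "my_zip/dir1/file2.txt", r"D:\Download\renamed.txt")` would write only that one member, under the new name, without creating `my_zip/dir1` on disk.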
Author: Thammas

Updated on April 11, 2021

Comments

  • Thammas, about 3 years ago

    I am trying to extract all files from a .zip containing subfolders into a single folder. I want all the files from the subfolders extracted into only one folder, without keeping the original structure. At the moment, I extract everything, move the files to one folder, then remove the leftover subfolders. Files with the same name are overwritten.

    Is it possible to do it before writing files?

    Here is a structure for example:

    my_zip/file1.txt
    my_zip/dir1/file2.txt
    my_zip/dir1/dir2/file3.txt
    my_zip/dir3/file4.txt
    

    At the end I wish for this:

    my_dir/file1.txt
    my_dir/file2.txt
    my_dir/file3.txt
    my_dir/file4.txt
    

    What can I add to this code?

    import zipfile
    my_dir = "D:\\Download\\"
    my_zip = "D:\\Download\\my_file.zip"
    
    zip_file = zipfile.ZipFile(my_zip, 'r')
    for files in zip_file.namelist():
        zip_file.extract(files, my_dir)
    zip_file.close()
    

    If I rename the file paths from zip_file.namelist(), I get this error:

    KeyError: "There is no item named 'file2.txt' in the archive"
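
The error comes from the lookup step: when extract() is given a string, it looks that exact name up among the stored member names, and a shortened name such as 'file2.txt' is not one of them. A minimal sketch that reproduces this on an in-memory archive (the member name and contents are made up for the demo):

```python
import io
import os
import zipfile

# Build a small archive in memory with one nested member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("my_zip/dir1/file2.txt", "content")

with zipfile.ZipFile(buffer) as zf:
    full_name = zf.namelist()[0]
    short_name = os.path.basename(full_name)

    try:
        # Same lookup that extract() performs for a string argument.
        zf.getinfo(short_name)
        lookup_failed = False
    except KeyError:
        lookup_failed = True

    # Reading by the full stored name works; the short name should only
    # be used when building the output path.
    data = zf.read(full_name)
```

So the full name has to be kept for reading the member, and only the name of the file written to disk should be shortened.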