Is there a way to convert a zip to a tar without extracting it to the filesystem?

27,930

Solution 1

This is now available as an installable command from PyPI; see the end of this post.


I don't know of any "standard" utility that does so, but when I needed this functionality I wrote the following Python script to go from ZIP to Bzip2 compressed tar archives without extracting anything to disk first:

#! /usr/bin/env python

"""zip2tar """

import sys
import os
from zipfile import ZipFile
import tarfile
import time

def main(ifn, ofn):
    with ZipFile(ifn) as zipf:
        with tarfile.open(ofn, 'w:bz2') as tarf:
            for zip_info in zipf.infolist():
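                tar_info = tarfile.TarInfo(name=zip_info.filename)
                tar_info.size = zip_info.file_size
                # time.mktime() needs a 9-item tuple, not a list. Weekday and
                # yearday are ignored; -1 lets mktime work out DST itself.
                tar_info.mtime = time.mktime(zip_info.date_time +
                                             (-1, -1, -1))
                # Stream the member straight from the ZIP into the tar.
                with zipf.open(zip_info) as member:
                    tarf.addfile(tarinfo=tar_info, fileobj=member)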

input_file_name = sys.argv[1]
output_file_name = os.path.splitext(input_file_name)[0] + '.tar.bz2'

main(input_file_name, output_file_name)

Save it as zip2tar and make it executable, or save it as zip2tar.py and run python zip2tar.py. Pass the ZIP filename as the argument to the script; the output filename for xyz.zip will be xyz.tar.bz2.

The Bzip2-compressed output is normally much smaller than the ZIP file, because ZIP compresses each file separately and cannot exploit redundancy across files. The trade-off is that if part of the Bzip2 file is damaged, there is less chance of recovering the files that come after the damage.

If you don't want the output compressed, remove :bz2 and .bz2 from the code.
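
As a hedged illustration of that last point, here is a minimal sketch of the same idea with the compression made a parameter. It is not the code of the ruamel.zip2tar package; the function name zip_to_tar and its parameters are made up for this example. It also assumes that the upper 16 bits of external_attr hold Unix permission bits, which is only true for ZIP files written on Unix-like systems, and falls back to default modes otherwise.

```python
import tarfile
import time
from zipfile import ZipFile


def zip_to_tar(zip_source, tar_target, compression="bz2"):
    """Copy every ZIP member into a tar archive without extracting to disk.

    zip_source: a path or a seekable binary file object (hypothetical name).
    tar_target: a writable binary file object (hypothetical name).
    compression: "bz2", "gz", "xz", or "" for an uncompressed tar.
    """
    with ZipFile(zip_source) as zipf, \
            tarfile.open(fileobj=tar_target, mode="w:" + compression) as tarf:
        for zip_info in zipf.infolist():
            tar_info = tarfile.TarInfo(name=zip_info.filename.rstrip("/"))
            tar_info.mtime = time.mktime(zip_info.date_time + (0, 0, -1))
            # Assumption: Unix permission bits live in the high 16 bits.
            unix_mode = (zip_info.external_attr >> 16) & 0o7777
            if zip_info.is_dir():
                tar_info.type = tarfile.DIRTYPE
                tar_info.mode = unix_mode or 0o755
                tarf.addfile(tar_info)
            else:
                tar_info.size = zip_info.file_size
                tar_info.mode = unix_mode or 0o644
                # Decompressed data is streamed from the ZIP into the tar.
                with zipf.open(zip_info) as member:
                    tarf.addfile(tar_info, member)
```

It could be called as follows: with open("xyz.tar", "wb") as out: zip_to_tar("xyz.zip", out, compression="").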


If you have pip installed in a python3 environment, you can do:

pip3 install ruamel.zip2tar

to get a zip2tar command-line utility that does the above (disclaimer: I am the author of that package).

Solution 2

The tar command deals with file systems. Its input is a list of files that it then reads from a file system (including a lot of metadata). You would need to present the zip file as a file system for the tar command to read it.

AVFS (A Virtual File System) lets any program look inside archived or compressed files through a standard file system interface, using FUSE.

There's some detailed information in the avfs-fuse readme, and some distributions have packages for it.

Once you have AVFS installed, you can run:

mountavfs
cd ~/.avfs/path/to/somefile.zip#
tar -cvf /path/whatever.tar .

AVFS will fill in any information for the file system that is missing from the zip, like file ownership, that tar will pick up.

Solution 3

Linux has a great set of tools that work on stdin and stdout and can be connected with pipes.

unzip -p ./fzs-2015-03-18.zip | bzip2 > fzs-2015-03-18.bz

Check whether a temporary file has been created:

ps -ef | grep unzip
auser      44260    6666  3 11:18 pts/2    00:00:02 unzip -p ./fzs-2015-03-18.zip
auser      44434   44370  0 11:19 pts/1    00:00:00 grep --color=auto unzip


lsof -p 44260
COMMAND   PID  USER   FD   TYPE DEVICE  SIZE/OFF    NODE NAME
unzip   44260 auser  cwd    DIR  259,6      4096 3015712 /home/auser/Documents/shares/logs
unzip   44260 auser  rtd    DIR  259,5      4096       2 /
unzip   44260 auser  txt    REG  259,5    178072  680357 /usr/bin/unzip
unzip   44260 auser  mem    REG  259,5   3040368  744942 /usr/lib/locale/locale-archive
unzip   44260 auser  mem    REG  259,5   2146832  666811 /usr/lib/libc-2.31.so
unzip   44260 auser  mem    REG  259,5     74440  751069 /usr/lib/libbz2.so.1.0.8
unzip   44260 auser  mem    REG  259,5    203056  665072 /usr/lib/ld-2.31.so
unzip   44260 auser    0u   CHR  136,2       0t0       5 /dev/pts/2
unzip   44260 auser    1w  FIFO   0,13       0t0  436437 pipe
unzip   44260 auser    2u   CHR  136,2       0t0       5 /dev/pts/2
unzip   44260 auser    3r   REG  259,6 513348882 3015900 /home/auser/Documents/shares/logs/fzs-2015-03-18.zip



ps -ef | grep bzip2
auser      44262    6666 99 11:18 pts/2    00:06:42 bzip2
auser      45111   44370  0 11:25 pts/1    00:00:00 grep --color=auto bzip2

lsof -p 44262
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
bzip2   44262 auser  cwd    DIR  259,6     4096 3015712 /home/auser/Documents/shares/logs
bzip2   44262 auser  rtd    DIR  259,5     4096       2 /
bzip2   44262 auser  txt    REG  259,5    38744  655763 /usr/bin/bzip2
bzip2   44262 auser  mem    REG  259,5  2146832  666811 /usr/lib/libc-2.31.so
bzip2   44262 auser  mem    REG  259,5    74440  751069 /usr/lib/libbz2.so.1.0.8
bzip2   44262 auser  mem    REG  259,5   203056  665072 /usr/lib/ld-2.31.so
bzip2   44262 auser    0r  FIFO   0,13      0t0  436437 pipe
bzip2   44262 auser    1w   REG  259,6 97325056 3015902 /home/auser/Documents/shares/logs/fzs-2015-03-18.bz
bzip2   44262 auser    2u   CHR  136,2      0t0       5 /dev/pts/2

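No temporary file shows up: the two processes are connected only by the pipe (node 436437), which is stdout of unzip and stdin of bzip2.

Super simple.

You can replace bzip2 with gzip or any other utility that accepts piped input on stdin.

This zip file contains thousands of text files. Note that unzip -p writes only the file contents, concatenated, with no file names or other metadata, so the output is a single Bzip2-compressed stream rather than a tar archive.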
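
To keep the pipe-based workflow but put a real tar stream on stdout, something has to write tar headers between the files. A minimal sketch in Python, assuming Python 3.6 or later; the function name stream_zip_as_tar and the script name used below are made up for this example. The ZIP itself still has to be a seekable file, because its directory is stored at the end of the archive, so only the output side is streamed.

```python
import sys
import tarfile
import time
from zipfile import ZipFile


def stream_zip_as_tar(zip_source, out_stream):
    """Write the members of a ZIP as an uncompressed tar stream."""
    # Mode "w|" writes forward only, so out_stream may be a pipe.
    with ZipFile(zip_source) as zipf, \
            tarfile.open(fileobj=out_stream, mode="w|") as tarf:
        for zip_info in zipf.infolist():
            if zip_info.is_dir():
                continue  # this sketch skips directory entries
            tar_info = tarfile.TarInfo(name=zip_info.filename)
            tar_info.size = zip_info.file_size
            tar_info.mtime = time.mktime(zip_info.date_time + (0, 0, -1))
            with zipf.open(zip_info) as member:
                tarf.addfile(tar_info, member)


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.stdout.buffer is the binary view of stdout.
    stream_zip_as_tar(sys.argv[1], sys.stdout.buffer)
```

Saved under a hypothetical name such as zip2tarstream.py, it fits into the same kind of pipeline: python zip2tarstream.py xyz.zip | bzip2 > xyz.tar.bz2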

Solution 4

Here’s a small snippet that converts a ZIP archive to a matching TAR.GZ archive on the fly.

Convert ZIP archive to TAR archive on the fly

# File: zip2tar.py
#
# Convert ZIP archive to TAR.GZ archive.
#
# Written by Fredrik Lundh, March 2005.

# helpers (tweak as necessary)

def getuser():
    # return user name and user id
    return "anonymous", 1000

def getmode(name, data):
    # return mode ("b" or "t") for the given file.
    # you can do this either by inspecting the name, or
    # the actual data (e.g. by looking for non-ascii, non-
    # line-feed data).
    return "t" # assume everything's text, for now

#
# main

import tarfile
import zipfile

import glob, io, os, sys, time

now = time.time()

user = getuser()

def fixup(infile):

    file, ext = os.path.splitext(infile)

    outfile = file + ".tar.gz"
    dirname = os.path.basename(file)

    print(outfile)

    zip = zipfile.ZipFile(infile, "r")

    # USTAR is the strict POSIX tar format
    tar = tarfile.open(outfile, "w:gz", format=tarfile.USTAR_FORMAT)

    for name in zip.namelist():

        # skip directory entries
        if name.endswith("/"):
            continue

        # zip.read() returns the decompressed member as bytes
        data = zip.read(name)
        if getmode(name, data) == "t":
            data = data.replace(b"\r\n", b"\n")

        tarinfo = tarfile.TarInfo()
        tarinfo.name = name
        tarinfo.size = len(data)
        tarinfo.mtime = now
        tarinfo.uname = tarinfo.gname = user[0]
        tarinfo.uid = tarinfo.gid = user[1]
        tar.addfile(tarinfo, io.BytesIO(data))

    tar.close()
    zip.close()

# convert all ZIP files in the current directory
for file in glob.glob("*.zip"):
    fixup(file)

Source
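
The getmode helper in the script treats every file as text, which means the line-ending conversion is also applied to binary files and would corrupt them. A minimal sketch of a replacement that inspects the data instead, assuming Python 3 (where the member data is bytes). The heuristic is an illustration only and not part of the original script: data containing NUL bytes or invalid UTF-8 in its first 8 KiB is treated as binary.

```python
import codecs


def getmode(name, data):
    """Return "b" if data looks binary, "t" if it looks like text."""
    sample = data[:8192]
    if b"\0" in sample:
        return "b"
    try:
        # final=False tolerates a multi-byte character cut off at the
        # end of the sample, but still rejects invalid byte sequences.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return "b"
    return "t"
```

Text in other encodings, such as Latin-1, would be classified as binary by this check and therefore be copied unchanged, which is the safer failure mode.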

Author: user253751

Updated on September 18, 2022

Comments

  • user253751
    user253751 over 1 year

    Is there a way to convert a zip archive to a tar archive without extracting to a temporary directory first? (and without writing my own implementation of tar or unzip)

    • Admin
      Admin almost 10 years
      Do you count mounting the zip archive as extracting it to the filesystem? If yes, then you can do it without extracting anything by using libarchive, but that involves coding.
    • Admin
      Admin almost 10 years
      I think the OP is looking for something like this: superuser.com/questions/325504/… Is it the kind of thing you are hoping to achieve?
  • Celada
    Celada almost 10 years
    Nice one. It looks like the script does not make any attempt to copy metadata such as file modification time and permissions across the archive format change, but I think you could add that quite easily.
  • user1106106
    user1106106 almost 9 years
    Exactly what I was looking for. I expected one utility like this to be available from standard unix packages. What is the license of this? I would like to propose it to be included in some packages (e.g., Debian's devutils), perhaps after some generalizations.
  • user1106106
    user1106106 almost 9 years
    Another comment: the reference to time lacks an import.
  • Anthon
    Anthon almost 9 years
    @rbrito I will post this on PyPI, any distro can pick it up from there. Just like some do with my ruamel.yaml package. Thanks for the time comment, I updated the answer.
  • Anthon
    Anthon almost 9 years
    @rbrito I uploaded this to bitbucket, and it is now installable from PyPI
  • user1106106
    user1106106 almost 9 years
    @Anthon, Thanks. I just read the code and saw that it needs some pep8 love (and some flexibility options (like not storing dates)). Do you want them?
  • Anthon
    Anthon almost 9 years
    @rbrito The pep8 I can do, I normally run that before doing a commit. How would you go about not storing dates, use the current date? Or the start of the era?
  • user1106106
    user1106106 almost 9 years
    @Anthon, simply omit the assignment to tar_info.mtime and we use the start of the era (I like it when all the fields are set to 0, as that may give a super micro-optimization in compression rate and also serves as "anonymization").
  • user1106106
    user1106106 almost 9 years
    @Anthon, a really recommended video "beyond pep-8" (with an emphasis on the "line length" part of the video) for beautiful Python code: youtu.be/wf-BqAjZb8M
  • Anthon
    Anthon almost 9 years
    I will have a look, it is at least pep8 compatible now and zip2tar now has --no-datetime
  • user1106106
    user1106106 almost 9 years
    @Anthon, just sent you some improvements to your bitbucket account. Let's continue this talk there. I hope you like the changes. :)
  • Oskar Skog
    Oskar Skog almost 4 years
    How could this possibly work? I assume there are many files in the zip, so unless unzip -p converts the zip to an uncompressed tar, this won't work?