How to create full compressed tar file using Python?

140,054

Solution 1

To build a .tar.gz (aka .tgz) for an entire directory tree:

import tarfile
import os.path

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.

Solution 2

import tarfile
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
    tar.add(name)
tar.close()

If you want to create a tar.bz2 compressed file, just replace file extension name with ".tar.bz2" and "w:gz" with "w:bz2".

Solution 3

You call tarfile.open with mode='w:gz', meaning "Open for gzip compressed writing."

You'll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn't affect compression abilities.

BTW, you usually get better compression with a mode of 'w:bz2', just like tar can usually compress even better with bzip2 than it can compress with gzip.

Solution 4

Previous answers advise using the tarfile Python module for creating a .tar.gz file in Python. That's obviously a good and Python-style solution, but it has serious drawback in speed of the archiving. This question mentions that tarfile is approximately two times slower than the tar utility in Linux. According to my experience this estimation is pretty correct.

So for faster archiving you can use the tar command using subprocess module:

subprocess.call(['tar', '-czf', output_filename, file_to_archive])

Solution 5

In addition to @Aleksandr Tukallo's answer, you could also obtain the output and error message (if occurs). Compressing a folder using tar is explained pretty well on the following answer.

import traceback
import subprocess

try:
    cmd = ['tar', 'czfj', output_filename, file_to_archive]
    output = subprocess.check_output(cmd).decode("utf-8").strip() 
    print(output)          
except Exception:       
    print(f"E: {traceback.format_exc()}")       
Share:
140,054
shahjapan
Author by

shahjapan

Passion: Computer Science, Programming, R & D, Troubleshooting Programming: Python, C#, Shell scripting Frontend Technologies: Bootstrap, Javascript, JQuery, SCSS Cloud Technologies: GCP, AWS Virtualisation: Docker, Docker-Compose, k8s RDBMS: MySQL, PostgreSQL, SQLite Operating Systems: Linux, Windows, OSX Testing: Unit Tests, pytest, Jenkins, selenium Hobbies: Playing Chess, Swimming & roaming -- Japan Shah

Updated on November 09, 2021

Comments

  • shahjapan
    shahjapan over 2 years

    How can I create a .tar.gz file with compression in Python?

  • Ignacio Vazquez-Abrams
    Ignacio Vazquez-Abrams over 14 years
    Just a quick note that the filename for bzip2-compressed tarballs should end with ".tar.bz2".
  • Brōtsyorfuzthrāx
    Brōtsyorfuzthrāx over 8 years
    Just as a note to readers, if you leave out arcname=os.path.basename(source_dir) then it'll give you the entire path structure of source_dir in the tar file (in most situations, that's probably inconvenient).
  • Jonathan H
    Jonathan H about 7 years
    You should really use with tarfile.open( .. in Python, instead of calling open and close manually. This is also the case when opening regular files.
  • Jonathan H
    Jonathan H about 7 years
    A second note; using arcname=os.path.basename(source_dir) still means that the archive contains a folder which contains the contents of source_dir. If you want the root of the archive to contain the contents themselves, and not contents within a folder, use arcname=os.path.sep instead.
  • The Godfather
    The Godfather about 5 years
    @Sheljohn unfortunately, this is not fully correct, because if one uses os.path.sep, then the archive will contain service "." or "/" folder which is not a problem usually, but sometimes it can be an issue if you later process this archive programmatically. It seems the only real clean way is to do os.walk and add files individually
  • edthrn
    edthrn over 4 years
    To get rid of all the directory structure, just use arcname='.'. No need to use os.walk.
  • THAVASI.T
    THAVASI.T over 3 years
    import tarfile package
  • thach.nv92
    thach.nv92 about 3 years
    @CNBorn I just want to compress to sample.gz. import tarfile tar = tarfile.open("sample.gz", "r:gz") for name in ["file1", "file2", "file3"]: tar.add(name) tar.close() It's Ok?
  • Alex Reinking
    Alex Reinking almost 3 years
    You should consider expanding this answer to include detail about what was wrong with the other answer and explain why this snippet works.
  • jrp
    jrp almost 3 years
    If I generate this tarfile on Linux, will this open successfully on other platforms say, Windows & Mac?
  • Max Truxa
    Max Truxa about 2 years
    The "perfect answer" is vulnerable to shell injections. Please read the security considerations from the docs. Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape (Unix shells only).
  • Yitzchak
    Yitzchak about 2 years
    Thanks @MaxTruxa for the important information..