How to create full compressed tar file using Python?
Solution 1
To build a .tar.gz
(aka .tgz
) for an entire directory tree:
import tarfile
import os.path
def make_tarfile(output_filename, source_dir):
with tarfile.open(output_filename, "w:gz") as tar:
tar.add(source_dir, arcname=os.path.basename(source_dir))
This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir
.
Solution 2
import tarfile
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
tar.add(name)
tar.close()
If you want to create a tar.bz2 compressed file, just replace file extension name with ".tar.bz2" and "w:gz" with "w:bz2".
Solution 3
You call tarfile.open with mode='w:gz'
, meaning "Open for gzip compressed writing."
You'll probably want to end the filename (the name
argument to open
) with .tar.gz
, but that doesn't affect compression abilities.
BTW, you usually get better compression with a mode of 'w:bz2'
, just like tar
can usually compress even better with bzip2
than it can compress with gzip
.
Solution 4
Previous answers advise using the tarfile
Python module for creating a .tar.gz
file in Python. That's obviously a good and Python-style solution, but it has serious drawback in speed of the archiving. This question mentions that tarfile
is approximately two times slower than the tar
utility in Linux. According to my experience this estimation is pretty correct.
So for faster archiving you can use the tar
command using subprocess
module:
subprocess.call(['tar', '-czf', output_filename, file_to_archive])
Solution 5
In addition to @Aleksandr Tukallo's answer, you could also obtain the output and error message (if occurs). Compressing a folder using tar
is explained pretty well on the following answer.
import traceback
import subprocess
try:
cmd = ['tar', 'czfj', output_filename, file_to_archive]
output = subprocess.check_output(cmd).decode("utf-8").strip()
print(output)
except Exception:
print(f"E: {traceback.format_exc()}")
shahjapan
Passion: Computer Science, Programming, R & D, Troubleshooting Programming: Python, C#, Shell scripting Frontend Technologies: Bootstrap, Javascript, JQuery, SCSS Cloud Technologies: GCP, AWS Virtualisation: Docker, Docker-Compose, k8s RDBMS: MySQL, PostgreSQL, SQLite Operating Systems: Linux, Windows, OSX Testing: Unit Tests, pytest, Jenkins, selenium Hobbies: Playing Chess, Swimming & roaming -- Japan Shah
Updated on November 09, 2021Comments
-
shahjapan over 2 years
How can I create a .tar.gz file with compression in Python?
-
Ignacio Vazquez-Abrams over 14 yearsJust a quick note that the filename for bzip2-compressed tarballs should end with ".tar.bz2".
-
Brōtsyorfuzthrāx over 8 yearsJust as a note to readers, if you leave out
arcname=os.path.basename(source_dir)
then it'll give you the entire path structure ofsource_dir
in the tar file (in most situations, that's probably inconvenient). -
Jonathan H about 7 yearsYou should really use
with tarfile.open( ..
in Python, instead of callingopen
andclose
manually. This is also the case when opening regular files. -
Jonathan H about 7 yearsA second note; using
arcname=os.path.basename(source_dir)
still means that the archive contains a folder which contains the contents ofsource_dir
. If you want the root of the archive to contain the contents themselves, and not contents within a folder, usearcname=os.path.sep
instead. -
The Godfather about 5 years@Sheljohn unfortunately, this is not fully correct, because if one uses
os.path.sep
, then the archive will contain service "." or "/" folder which is not a problem usually, but sometimes it can be an issue if you later process this archive programmatically. It seems the only real clean way is to doos.walk
and add files individually -
edthrn over 4 yearsTo get rid of all the directory structure, just use
arcname='.'
. No need to useos.walk
. -
THAVASI.T over 3 yearsimport tarfile package
-
thach.nv92 about 3 years@CNBorn I just want to compress to sample.gz. import tarfile tar = tarfile.open("sample.gz", "r:gz") for name in ["file1", "file2", "file3"]: tar.add(name) tar.close() It's Ok?
-
Alex Reinking almost 3 yearsYou should consider expanding this answer to include detail about what was wrong with the other answer and explain why this snippet works.
-
jrp almost 3 yearsIf I generate this tarfile on Linux, will this open successfully on other platforms say, Windows & Mac?
-
Max Truxa about 2 yearsThe "perfect answer" is vulnerable to shell injections. Please read the security considerations from the docs. Never pass unescaped strings to
subprocess.run
,subprocess.call
, etc. ifshell=True
. Useshlex.quote
to escape (Unix shells only). -
Yitzchak about 2 yearsThanks @MaxTruxa for the important information..