Why would I tar a single file?
Solution 1
Advantages of using .tar.gz instead of .gz are that
- tar stores more meta-data (UNIX permissions etc.) than gzip.
- the setup can more easily be expanded to store multiple files.
- .tar.gz files are very common; only-gzipped files may puzzle some users (cf. MelBurslan's comment).
The overhead of using tar is also very small.
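A rough way to check the overhead claim is to build the archive in memory and compare sizes. The sketch below uses Python's standard tarfile and gzip modules rather than the command-line tools; the file name db.dump and the generated contents are made up for illustration, and the exact numbers will vary with the data and the tar implementation.

```python
import gzip
import io
import tarfile


def tar_single_file(name: str, data: bytes) -> bytes:
    """Return an uncompressed tar archive holding one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical dump contents: repetitive text that compresses well.
data = b"".join(b"INSERT INTO t VALUES (%d);\n" % i for i in range(50_000))
tar_bytes = tar_single_file("db.dump", data)

# Overhead before compression: header block, padding and end-of-archive blocks.
raw_overhead = len(tar_bytes) - len(data)

# Sizes after compression, with and without the tar wrapper.
gz_only = len(gzip.compress(data))
tar_gz = len(gzip.compress(tar_bytes))

print(f"file: {len(data)} bytes, tar adds {raw_overhead} bytes")
print(f".gz: {gz_only} bytes, .tar.gz: {tar_gz} bytes")
```

The padding that tar adds is mostly zero bytes, which is why it nearly disappears once the archive is compressed.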
If it is not really needed, I still do not recommend tarring a single file. There are many useful tools which can access compressed single files directly (such as zcat, zgrep etc. - equivalents also exist for bzip2 and xz).
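The same convenience carries over to scripts. As a sketch (not a replacement for zcat or zgrep), Python's standard gzip, bz2 and lzma modules can each open a compressed single file as a text stream, so it can be searched without unpacking it first. The helper names open_compressed and grep_compressed and the sample file are hypothetical.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Map file suffixes to the standard-library opener for that format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_compressed(path):
    """Open a .gz/.bz2/.xz file as decompressed text, chosen by suffix."""
    suffix = os.path.splitext(path)[1]
    return OPENERS[suffix](path, mode="rt", encoding="utf-8")


def grep_compressed(path, needle):
    """Return the lines containing `needle`, roughly what zgrep would print."""
    with open_compressed(path) as lines:
        return [line.rstrip("\n") for line in lines if needle in line]


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "notes.txt.gz")
    with gzip.open(sample, "wt", encoding="utf-8") as f:
        f.write("alpha\nbeta\nalphabet\n")
    matches = grep_compressed(sample, "alpha")
    print(matches)
```

None of this works on a tarball without first locating the member inside the archive.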
Solution 2
You are actually asking only half of the question. The other half is, "Why would I compress a tar file with gzip?". And the answer is not just that gzip makes the file smaller (in most cases):
tar:
- stores filename and other metadata: mode, owner ID, group ID, filesize, modification time
- stores a checksum (for the header only)
gzip:
- can store the original filename, but that is optional
- has a CRC-32 checksum over the original data
- it compresses the file
With only tar you cannot be sure your data was not corrupted. With only gzip you cannot restore the user/group ID and possibly not the modification time or the original filename.
The combination is more powerful than either command/format alone, because they complement each other's features.
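The integrity point can be illustrated with a small experiment. The following is a sketch using Python's standard tarfile and gzip modules, with a made-up file name and payload: flipping one byte of file data inside a plain tar goes unnoticed because only the header is checksummed, while damaging a gzip stream makes decompression fail.

```python
import gzip
import io
import tarfile
import zlib

# Made-up file contents.
payload = b"".join(b"row %d\n" % i for i in range(5000))


def make_tar(name: str, data: bytes) -> bytearray:
    """Build an uncompressed single-file tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return bytearray(buf.getvalue())


# 1) Plain tar: corrupt one byte of file data.
#    The data starts right after the 512-byte header block.
damaged_tar = make_tar("db.dump", payload)
damaged_tar[512 + 100] ^= 0xFF
with tarfile.open(fileobj=io.BytesIO(bytes(damaged_tar)), mode="r:") as tar:
    extracted = tar.extractfile("db.dump").read()  # no error is raised here
tar_data_intact = extracted == payload

# 2) gzip: corrupt one byte in the middle of the compressed stream.
damaged_gz = bytearray(gzip.compress(payload))
damaged_gz[len(damaged_gz) // 2] ^= 0xFF
try:
    gzip.decompress(bytes(damaged_gz))
    gzip_noticed = False
except (OSError, EOFError, zlib.error):
    gzip_noticed = True

print("tar data intact after silent read:", tar_data_intact)
print("gzip noticed corruption:", gzip_noticed)
```

Wrapping the tar in gzip therefore gives the file data an end-to-end check that tar alone does not provide.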
Solution 3
There is quite a big advantage to using only-gzipped text files - the contents can be directly accessed with command-line tools like less, zgrep, zcat.
Solution 4
I would say it's likely that the people just don't realise they can use gzip/bzip2/xz without tar. Possibly because they come from a DOS/Windows background where it is normal for compression and archiving to be integrated in a single format (ZIP, RAR, etc).
While there may be slight advantages to using tar in some situations due to the storage of metadata or the ability to add extra files, there are also disadvantages. With a plain gzip/bzip2/xz file you can decompress it and pipe the decompressed data straight to another tool (such as your database) without ever having to store the decompressed data as a file on disk. With a tarball this is harder.
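To make the streaming argument concrete, here is a sketch in Python (standard library only) where a hypothetical consume function stands in for the database client. A plain gzip stream can be fed to it chunk by chunk; a .tar.gz can be streamed too, but the reader first has to walk the archive to find the member. The names consume, stream_gz and stream_tar_gz are invented for this example.

```python
import gzip
import io
import tarfile

CHUNK = 64 * 1024


def consume(chunks) -> int:
    """Stand-in for the real consumer (e.g. a database import); counts bytes."""
    return sum(len(chunk) for chunk in chunks)


def stream_gz(fileobj):
    """Yield decompressed chunks from a plain gzip stream."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while chunk := gz.read(CHUNK):
            yield chunk


def stream_tar_gz(fileobj, member_name: str):
    """Yield chunks of one member from a .tar.gz read as a forward-only stream."""
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.name == member_name and member.isfile():
                reader = tar.extractfile(member)
                while chunk := reader.read(CHUNK):
                    yield chunk
                return
    raise FileNotFoundError(member_name)


# Made-up dump contents, packed both ways in memory.
data = b"".join(b"row %d\n" % i for i in range(10_000))
gz_bytes = gzip.compress(data)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="db.dump")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

from_gz = consume(stream_gz(io.BytesIO(gz_bytes)))
from_tar = consume(stream_tar_gz(io.BytesIO(tar_buf.getvalue()), "db.dump"))
print(from_gz, from_tar, len(data))
```

Both paths avoid writing the decompressed data to disk; the tar variant just needs to know which member to pull out.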
Solution 5
There is an important difference that could make using tar important under some circumstances: Besides the "metadata" that @jofel mentioned in his answer, tar records the filename in the archive. When you extract it, you get the original filename regardless of what the archive is called.
In your case the tar archive and the file it contains have the related names db.dump.tar.gz and db.dump, but suppose you rename the tar file to 20-Apr-16.dump.tgz, or whatever. Untar this with tar xvfz, and you get db.dump. For comparison, gunzip 20-Apr-16.dump.gz and you've got 20-Apr-16.dump. (Edit: as pointed out in the comments, gzip also makes a record of the filename, but it's not normally used when decompressing.) A tar archive can also contain a relative pathname that puts the extracted file in a subdirectory.
Your use case will dictate whether this kind of filename persistence is needed, or even wanted, or is actually undesirable. But certainly, regardless of compression, a tar archive travels differently from a regular file.
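The renaming scenario can be reproduced without touching real data. This sketch uses Python's standard tarfile and gzip modules; the file names follow the example above, and gzip_header_name is a hypothetical helper that reads the optional FNAME field defined by the gzip format (RFC 1952).

```python
import gzip
import io
import os
import tarfile
import tempfile

data = b"-- pretend this is a database dump\n"


def gzip_header_name(blob: bytes):
    """Return the original file name stored in a gzip header, or None."""
    flags = blob[3]
    pos = 10  # fixed-size part of the header
    if flags & 0x04:  # FEXTRA set: skip the extra field
        extra_len = int.from_bytes(blob[pos:pos + 2], "little")
        pos += 2 + extra_len
    if not flags & 0x08:  # FNAME flag not set
        return None
    end = blob.index(b"\x00", pos)  # the name is zero-terminated
    return blob[pos:end].decode("latin-1")


with tempfile.TemporaryDirectory() as tmp:
    # tar: the member name lives inside the archive and survives a rename.
    original = os.path.join(tmp, "db.dump.tar.gz")
    with tarfile.open(original, "w:gz") as tar:
        info = tarfile.TarInfo(name="db.dump")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    renamed = os.path.join(tmp, "20-Apr-16.dump.tgz")
    os.rename(original, renamed)
    with tarfile.open(renamed, "r:gz") as tar:
        tar_names = tar.getnames()

# gzip: the name is an optional header field.
buf = io.BytesIO()
with gzip.GzipFile(filename="db.dump", fileobj=buf, mode="wb") as gz:
    gz.write(data)
stored_name = gzip_header_name(buf.getvalue())

# Data compressed from a stream has no name to record.
anonymous_name = gzip_header_name(gzip.compress(data))

print(tar_names, stored_name, anonymous_name)
```

The tar member name is always present, whereas the gzip name depends on how the file was compressed and on whether the recipient asks for it to be restored.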
gardenhead
Updated on September 18, 2022
Comments
-
gardenhead over 1 year
At my company, we download a local development database snapshot as a db.dump.tar.gz file. The compression makes sense, but the tarball only contains a single file (db.dump). Is there any point to archiving a single file, or is .tar.gz just such a common idiom? Why not just .gz?
-
plugwash about 8 years
All tarring a single file will do is add a few metadata blocks to the start and end of the file. The actual file data passes through tar to the compressor untouched. So for a large file the size difference between plain compression and tarring will be negligible.
-
Pharap about 8 years
In the past when trying various compression methods I found .tar.gz to be superior to most other common methods. I recall it was superior to just .tar but cannot remember if it was better than just .gz. Ironically, Windows' .cab format was the best of the methods I tried, which was very unexpected.
-
gardenhead about 8 years
@Pharap tar is not a compression algorithm, it's an archiving format.
-
Pharap about 8 years
@gardenhead Well that would explain why it didn't work very well.
-
-
gardenhead about 8 years
I didn't consider the meta-data aspect. Very good point.
-
bgStack15 about 8 years
If I see a .gz, my first instinct is to tar -zxf foo.gz. Remembering that gzip is even a command takes a few more seconds.
-
Brandon about 8 years
@bgStack15 FWIW you don't need the z (or the - for that matter); most modern tars will automatically detect that the file needs to be decompressed.
-
hyde about 8 years
With GNU tar, it takes just the -O switch to output to stdout, so I wouldn't say it is much harder!
-
underscore_d about 8 years
Interesting point, but the question is about a database snapshot, unlikely to be a text file, and not only-gzipped.
-
underscore_d about 8 years
The first paragraph seems plausible enough for files using the tgz extension. However, the OP's case uses tar.gz - and if these hypothetical ex-Win/DOS users are anything like I was, the first thing they say when looking at such a file is: 'Why does it have 2 extensions?'. Then they google it and quickly get the answer, which specifically explains that tar and compression are distinct. ;-)
-
psusi about 8 years
gzip also records the original filename.
-
Miles about 8 years
Yup. The name is optional in the gzip header (obviously there won't be one if you compressed the streaming output of a command) and most tools won't restore it by default (for instance, you have to use gzip --name explicitly when decompressing), but you don't have to use tar to get filename persistence.
-
gardenhead about 8 years
Thanks for clarifying that! When I was reading the tar wikipedia page, I misunderstood the description to mean that the checksum was for the whole file.
-
alexis about 8 years
Thanks for pointing this out, I hadn't known that. Still, since that's not the default behavior, the point stands: Distributing a file in tar format preserves the original filename (and possibly the relative path), without intervention of the recipient. Distributing a (g)zipped file doesn't.
-
Ross Ridge about 8 years
By default gzip will store the original file name and time stamp. You can use the -N option when decompressing to restore them.
-
YoloTats.com about 8 years
@RossRidge thanks, I removed again the text about the original file name.
-
Dewi Morgan about 8 years
This feels to me like the correct answer. I'd also add a few more reasons, which you might want to edit in if you agree. 1) There's no additional cost to the admin for .tgz over .tar or .gz alone: they're all just one command. 2) Admins back up, copy, relocate, move a LOT of files, for a lot of different reasons; DB backups are just one of these. They can use the same workflow, tools and commands whether backing up one or multiple files; so why special-case using the syntax of the gzip command, for the case where there is one file?
-
CodesInChaos about 8 years
What is the advantage of these tools over simply piping the output of a decompressor into the plain tools?