Compress directory to tar.gz with Commons Compress
Solution 1
I haven't figured out what exactly was going wrong but a scouring of google caches I found a working example. Sorry for the tumbleweed!
public void CreateTarGZ()
throws FileNotFoundException, IOException
{
try {
System.out.println(new File(".").getAbsolutePath());
dirPath = "parent/childDirToCompress/";
tarGzPath = "archive.tar.gz";
fOut = new FileOutputStream(new File(tarGzPath));
bOut = new BufferedOutputStream(fOut);
gzOut = new GzipCompressorOutputStream(bOut);
tOut = new TarArchiveOutputStream(gzOut);
addFileToTarGz(tOut, dirPath, "");
} finally {
tOut.finish();
tOut.close();
gzOut.close();
bOut.close();
fOut.close();
}
}
private void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base)
throws IOException
{
File f = new File(path);
System.out.println(f.exists());
String entryName = base + f.getName();
TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);
tOut.putArchiveEntry(tarEntry);
if (f.isFile()) {
IOUtils.copy(new FileInputStream(f), tOut);
tOut.closeArchiveEntry();
} else {
tOut.closeArchiveEntry();
File[] children = f.listFiles();
if (children != null) {
for (File child : children) {
System.out.println(child.getName());
addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
}
}
}
}
Solution 2
I followed this solution and it worked until I was processing a larger set of files and it randomly crashes after processing 15000 - 16000 files. the following line is leaking file handlers:
IOUtils.copy(new FileInputStream(f), tOut);
and the code crashed with a "Too many open files" error at the OS level The following minor change fix the problem:
FileInputStream in = new FileInputStream(f);
IOUtils.copy(in, tOut);
in.close();
Solution 3
I ended up doing the following:
public URL createTarGzip() throws IOException {
Path inputDirectoryPath = ...
File outputFile = new File("/path/to/filename.tar.gz");
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {
tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
List<File> files = new ArrayList<>(FileUtils.listFiles(
inputDirectoryPath,
new RegexFileFilter("^(.*?)"),
DirectoryFileFilter.DIRECTORY
));
for (int i = 0; i < files.size(); i++) {
File currentFile = files.get(i);
String relativeFilePath = new File(inputDirectoryPath.toUri()).toURI().relativize(
new File(currentFile.getAbsolutePath()).toURI()).getPath();
TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
tarEntry.setSize(currentFile.length());
tarArchiveOutputStream.putArchiveEntry(tarEntry);
tarArchiveOutputStream.write(IOUtils.toByteArray(new FileInputStream(currentFile)));
tarArchiveOutputStream.closeArchiveEntry();
}
tarArchiveOutputStream.close();
return outputFile.toURI().toURL();
}
}
This takes care of the some of the edge cases that come up in the other solutions.
Solution 4
I had to make some adjustments to @merrick solution to get it to work related to the path. Perhaps with the latest maven dependencies. The currently accepted solution didn't work for me.
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.RegexFileFilter;
public class TAR {
public static void CreateTarGZ(String inputDirectoryPath, String outputPath) throws IOException {
File inputFile = new File(inputDirectoryPath);
File outputFile = new File(outputPath);
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {
tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
List<File> files = new ArrayList<>(FileUtils.listFiles(
inputFile,
new RegexFileFilter("^(.*?)"),
DirectoryFileFilter.DIRECTORY
));
for (int i = 0; i < files.size(); i++) {
File currentFile = files.get(i);
String relativeFilePath = inputFile.toURI().relativize(
new File(currentFile.getAbsolutePath()).toURI()).getPath();
TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
tarEntry.setSize(currentFile.length());
tarArchiveOutputStream.putArchiveEntry(tarEntry);
tarArchiveOutputStream.write(IOUtils.toByteArray(new FileInputStream(currentFile)));
tarArchiveOutputStream.closeArchiveEntry();
}
tarArchiveOutputStream.close();
}
}
}
Maven
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.18</version>
</dependency>
Solution 5
Check below for Apache commons-compress
and File walker examples.
This example tar.gz
a directory.
public static void createTarGzipFolder(Path source) throws IOException {
if (!Files.isDirectory(source)) {
throw new IOException("Please provide a directory.");
}
// get folder name as zip file name
String tarFileName = source.getFileName().toString() + ".tar.gz";
try (OutputStream fOut = Files.newOutputStream(Paths.get(tarFileName));
BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {
Files.walkFileTree(source, new SimpleFileVisitor<>() {
@Override
public FileVisitResult visitFile(Path file,
BasicFileAttributes attributes) {
// only copy files, no symbolic links
if (attributes.isSymbolicLink()) {
return FileVisitResult.CONTINUE;
}
// get filename
Path targetFile = source.relativize(file);
try {
TarArchiveEntry tarEntry = new TarArchiveEntry(
file.toFile(), targetFile.toString());
tOut.putArchiveEntry(tarEntry);
Files.copy(file, tOut);
tOut.closeArchiveEntry();
System.out.printf("file : %s%n", file);
} catch (IOException e) {
System.err.printf("Unable to tar.gz : %s%n%s%n", file, e);
}
return FileVisitResult.CONTINUE;
}
@Override
public FileVisitResult visitFileFailed(Path file, IOException exc) {
System.err.printf("Unable to tar.gz : %s%n%s%n", file, exc);
return FileVisitResult.CONTINUE;
}
});
tOut.finish();
}
}
This example extracts a tar.gz
, and checks zip slip attack.
public static void decompressTarGzipFile(Path source, Path target)
throws IOException {
if (Files.notExists(source)) {
throw new IOException("File doesn't exists!");
}
try (InputStream fi = Files.newInputStream(source);
BufferedInputStream bi = new BufferedInputStream(fi);
GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {
ArchiveEntry entry;
while ((entry = ti.getNextEntry()) != null) {
Path newPath = zipSlipProtect(entry, target);
if (entry.isDirectory()) {
Files.createDirectories(newPath);
} else {
// check parent folder again
Path parent = newPath.getParent();
if (parent != null) {
if (Files.notExists(parent)) {
Files.createDirectories(parent);
}
}
// copy TarArchiveInputStream to Path newPath
Files.copy(ti, newPath, StandardCopyOption.REPLACE_EXISTING);
}
}
}
}
private static Path zipSlipProtect(ArchiveEntry entry, Path targetDir)
throws IOException {
Path targetDirResolved = targetDir.resolve(entry.getName());
Path normalizePath = targetDirResolved.normalize();
if (!normalizePath.startsWith(targetDir)) {
throw new IOException("Bad entry: " + entry.getName());
}
return normalizePath;
}
References
Comments
-
awfulHack over 3 years
I'm running into a problem using the commons compress library to create a tar.gz of a directory. I have a directory structure that is as follows.
parent/ child/ file1.raw fileN.raw
I'm using the following code to do the compression. It runs fine without exceptions. However, when I try to decompress that tar.gz, I get a single file with the name "childDirToCompress". Its the correct size so the files have clearly been appended to each other in the tarring process. The desired output would be a directory. I can't figure out what I'm doing wrong. Can any wise commons compresser set me upon the correct path?
CreateTarGZ() throws CompressorException, FileNotFoundException, ArchiveException, IOException { File f = new File("parent"); File f2 = new File("parent/childDirToCompress"); File outFile = new File(f2.getAbsolutePath() + ".tar.gz"); if(!outFile.exists()){ outFile.createNewFile(); } FileOutputStream fos = new FileOutputStream(outFile); TarArchiveOutputStream taos = new TarArchiveOutputStream(new GZIPOutputStream(new BufferedOutputStream(fos))); taos.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_STAR); taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU); addFilesToCompression(taos, f2, "."); taos.close(); } private static void addFilesToCompression(TarArchiveOutputStream taos, File file, String dir) throws IOException{ taos.putArchiveEntry(new TarArchiveEntry(file, dir)); if (file.isFile()) { BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file)); IOUtils.copy(bis, taos); taos.closeArchiveEntry(); bis.close(); } else if(file.isDirectory()) { taos.closeArchiveEntry(); for (File childFile : file.listFiles()) { addFilesToCompression(taos, childFile, file.getName()); } } }
-
drigoangelo about 10 yearsFor future reference, youd don't need to close all concatenated streams. Closing the outermost stream (in this case,
tOut
) will do. -
krackoder over 7 yearsThis code is giving me an error
This archives contains unclosed entries
. Any idea what could be the reason for that? -
ldmtwo over 7 yearsCan you share the imports or lib names? I have these libraries (in Maven format), but FileUtils.listFiles() is not found with that erasure. Thanks. org.apache.commons:commons-collections4:4.1, org.apache.commons:commons-compress:1.12, commons-io:commons-io:2.4
-
ldmtwo over 7 yearsAnd the imports: import java.io.*; import java.net.URL; import java.nio.file.Path; import java.util.*; import org.apache.commons.compress.archivers.tar.*; import org.apache.commons.compress.compressors.gzip.*; import org.apache.commons.io.IOUtils; import org.apache.commons.io.filefilter.*;
-
Jire almost 7 yearsIf anybody is interested in a Kotlin solution that uses this fix and works perfectly, here you go: gist.github.com/Jire/8caa8603ef4871c79bef387f667a509f
-
stuart about 6 yearshad the same problem @ldmtwo, just change "inputDirectoryPath" to an actual file (and you can just make the reference a collection and not a new arraylist)
-
conteh over 5 years@awfulHack I'm also getting 'This archive contains unclosed entries'. Although not on directories that contain another directory before containing files. It's occurring on a directory that has files only and no subdirectories.
-
conteh over 5 years@awfulHack This worked for me. This should be the solution. The current solution gives an error in some cases.
-
Ethan Shepherd about 5 yearsIf you are getting the unclosed entries error when running this code, make sure to check your imports. I was missing one (IOUtils) and the finally block was obscuring the issue.
-
Logic almost 5 yearsI am getting
java.io.FileNotFoundException: XXX/XXX/XXX (Too many open files)