unzip specific extension only

13,594

Solution 1

Something along the lines of:

#!/bin/bash
cd ~/basedir/files || exit 1
for file in *.zip ; do
    # Strip the .zip extension to get the target folder name.
    newfile="${file%.zip}"
    echo ":${newfile}:"
    mkdir tmp
    rm -rf "${newfile}"
    mkdir "${newfile}"
    cp "${newfile}.zip" tmp
    cd tmp
    unzip "${newfile}.zip"
    # Copy only the image files, flattening any directory structure.
    find . -type f -name '*.jpg' -exec cp {} "../${newfile}" ';'
    find . -type f -name '*.png' -exec cp {} "../${newfile}" ';'
    find . -type f -name '*.gif' -exec cp {} "../${newfile}" ';'
    cd ..
    rm -rf tmp
done

This is tested and will handle spaces in filenames (both the zip files and the extracted files). You may have collisions if the zip file has the same file name in different directories (you can't avoid this if you're going to flatten the directory structure).
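To see in advance whether flattening would cause such collisions, the member names of an archive can be checked for duplicate base names. This is only a sketch: the function name `list_name_collisions` is made up for illustration, and it assumes the member paths arrive one per line on standard input.

```shell
#!/bin/bash
# Hypothetical helper: reads archive member paths on stdin (one per line)
# and prints every base name that occurs more than once.
list_name_collisions() {
    # Split on "/" and keep the last component; directory entries end in
    # a slash, so their last component is empty and they are skipped.
    awk -F/ '$NF != "" { print $NF }' | sort | uniq -d
}
```

With Info-ZIP installed, something like `unzip -Z1 archive1.zip | list_name_collisions` should print the names that would overwrite each other (`-Z1` lists member names only).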

Solution 2

In Short

You can do this with a one-liner find + unzip.

find . -name "*.zip" -type f -exec unzip -jd "images/{}" "{}" "*.jpg" "*.png" "*.gif" \;

In Detail

unzip allows you to specify the files you want:

unzip archive.zip "*.jpg" "*.png" "*.gif"

And -d sets a target directory:

unzip -d images/ archive.zip "*.jpg" "*.png" "*.gif"

Combine that with a find, and you can extract all the images in all zips:

find . -name "*.zip" -type f -exec unzip -d images/ {} "*.jpg" "*.png" "*.gif" \;

Using unzip -j to junk the extraction of the zip's internal directory structure, we can do it all in one command. This gives you the flat image list separated by zip name that you desire as a one-liner.

find . -name "*.zip" -type f -exec unzip -jd "images/{}" "{}" "*.jpg" "*.png" "*.gif" \;

A limitation is that unzip -d won't create more than one new level of directories, so just mkdir images first. Enjoy.
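Note that the one-liner names each folder after the full zip file name, extension included. If the folders should be called files/archive1/ and so on, as in the question, a small loop is one way to get there. A minimal sketch, assuming Info-ZIP unzip and zips sitting directly in one directory; `archive_dir_name` and `extract_images` are names invented here:

```shell
#!/bin/bash
# Hypothetical helper: "files/archive1.zip" -> "files/archive1".
archive_dir_name() {
    printf '%s\n' "${1%.zip}"
}

# Extract only the images from every zip directly inside the given directory.
extract_images() {
    local dir="$1" zip target
    for zip in "$dir"/*.zip; do
        [ -f "$zip" ] || continue        # the glob matched nothing
        target="$(archive_dir_name "$zip")"
        mkdir -p "$target"
        # -j drops the zip's internal paths, -o overwrites without prompting.
        unzip -q -j -o "$zip" '*.jpg' '*.png' '*.gif' -d "$target" ||
            echo "warning: unzip reported a problem with $zip" >&2
    done
}
```

Calling `extract_images files` would then process every archive in files/. The quoting keeps zip names with spaces intact, and unzip exits non-zero for an archive that contains no matching images, which is why that case only produces a warning.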

Solution 3

7zip can do this, and has a Linux version.

mkdir files/archive1
7z e -ofiles/archive1/ files/archive1.zip '*.jpg' '*.png' '*.gif'

(Just tested it, it works.)
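To run this over every archive rather than one at a time, the call can be wrapped in a loop. A rough sketch: the function name `extract_images_7z` is hypothetical, and the `-r` switch is an assumption added here so that the patterns also match images inside subfolders of the archive.

```shell
#!/bin/bash
# Hypothetical wrapper around the 7z call, one archive at a time.
extract_images_7z() {
    local dir="$1" zip
    for zip in "$dir"/*.zip; do
        [ -f "$zip" ] || continue            # the glob matched nothing
        mkdir -p "${zip%.zip}"
        # -o takes the output directory with no space after the switch;
        # -y answers prompts (such as overwrite) with yes;
        # -r is assumed here so the patterns also match inside subfolders.
        7z e -y -o"${zip%.zip}" "$zip" '*.jpg' '*.png' '*.gif' -r
    done
}
```

`extract_images_7z files` would then fill files/archive1/, files/archive2/ and so on.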

Solution 4

You can write a program using a zip library. If you use Mono, you can use DotNetZip.

The code would look like this:

foreach (var archive in listOfZips)
{
    using (var zip = ZipFile.Read(archive))
    {
        foreach (ZipEntry e in zip)
        {
            if (IsImageFile(e.FileName))
            {
                e.FileName = System.IO.Path.Combine(archive.Replace(".zip",""), 
                                  System.IO.Path.GetFileName(e.FileName));
                e.Extract("files");
            }
        }
    }
}

Solution 5

Perl's Archive-Zip is a good library for zipping/unzipping.

Author: vache

Updated on July 27, 2022

Comments

  • vache
    vache almost 2 years

    I have a directory with zip archives containing .jpg, .png, .gif images. I want to unzip each archive taking the images only and putting them in a folder with the name of the archive.

    So:

    files/archive1.zip
    files/archive2.zip
    files/archive3.zip
    files/archive4.zip
    

    Open archive1.zip - take sunflower.jpg, rose_sun.gif. Make a folder files/archive1/ and add the images to that folder, so files/archive1/sunflower.jpg, files/archive1/rose_sun.gif. Do this to each archive.

    I really don't know how this can be done, all suggestions are welcome. I have over 600 archives and an automatic solution would be a lifesaver, preferably a Linux solution.

  • MiffTheFox
    MiffTheFox almost 15 years
    This would be an excellent solution, except that the temp directory ends up wasting I/O and system resources. You should add wildcards to the unzip call. (Add '*.jpg' '*.png' '*.gif' to the end.) Also, you should avoid copying the zip file, and instead use "unzip ../${newfile}.zip".
  • paxdiablo
    paxdiablo almost 15 years
    I don't believe that level of efficiency is a real concern here, this looks like a one-shot operation to me (or one that wouldn't be done often enough to warrant over-engineering). The end result is what the OP wanted, the graphic files in a specific directory based on the archive name.
  • vache
    vache almost 15 years
    well i want to keep it on linux, so preferably not .net, but then i should be able to do this same thing, let's say, using a java zip library, no?
  • vache
    vache almost 15 years
    but then i would have to run this for each zip; can i add a while loop, or something?
  • vache
    vache almost 15 years
    yeah, this would be a one time thing :), i guess i am testing all these solutions right now, locally, then will try it on a larger amount of zips on the server
  • vache
    vache almost 15 years
    ok so the big problem with this is that the archives have names with spaces in them, and the code above creates a bunch of folders, one for each space-separated word in the zip name
  • vache
    vache almost 15 years
    simply this one too does something similar: for file in zip/*.zip ; do newfile=$(echo ${file}) ; unzip ${newfile} '*.jpg' '*.png' '*.gif' ; done
  • vache
    vache almost 15 years
    but it too breaks when there are spaces in the zip name
  • paxdiablo
    paxdiablo almost 15 years
    It's been fixed to handle spaces now in both the zip files and the zipped files within them.
  • vache
    vache almost 15 years
    this will work since the zips all have unique names, and they are all in one folder
  • vache
    vache almost 15 years
    adding the '*.jpg' '*.png' '*.gif' to the end of the unzip call speeds this up very much, and if you add that, then there is no need for the extension in the copy line, no?
  • paxdiablo
    paxdiablo almost 15 years
    That's right but you will need a "-type f" on the find to only get files rather than directories (I should have done that already in case you had a directory called dir.jpg). Replace the finds with a single "find . -type f -exec cp {} "../${newfile}" ';'"
  • Cheeso
    Cheeso almost 15 years
    Sorry, I don't know if the Java zip libraries have a similar function. I mean, I'm sure you could do it, it's a simple matter of programming. But the question is how much programming. When you say "I want to keep it on Linux, so preferably not .NET" - are you aware that Mono runs on Linux? In other words, you can use C# and .NET on Linux.