Recursively find archive files of various formats and search them for file name patterns


Solution 1

(Adapted from How do I recursively grep through compressed archives?)

Install AVFS, a filesystem that provides transparent access inside archives. First run this command once to set up a view of your machine's filesystem in which you can access archives as if they were directories:

mountavfs

After this, if /path/to/archive.zip is a recognized archive, then ~/.avfs/path/to/archive.zip# is a directory that appears to contain the contents of the archive.
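The path mapping just described can be captured in a small helper. This is a minimal sketch: the function name avfs_path is hypothetical and not part of AVFS. It assumes AVFS is mounted at ~/.avfs, which is where mountavfs puts it. The archive does not need to exist for the path to be computed, but its directory does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AVFS): print the directory view of an
# archive, assuming AVFS is mounted at ~/.avfs as mountavfs does by default.
avfs_path() {
  local archive=$1 dir
  # Absolute path of the directory holding the archive (CDPATH cleared so cd prints nothing).
  dir=$(CDPATH= cd -- "$(dirname -- "$archive")" && pwd) || return 1
  # ~/.avfs mirrors the whole filesystem; the trailing '#' opens the archive.
  printf '%s\n' "$HOME/.avfs$dir/$(basename -- "$archive")#"
}
```

With AVFS mounted, something like `ls "$(avfs_path ./photos.zip)"` should then list the contents of a (hypothetical) photos.zip in the current directory.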

find ~/.avfs"$PWD" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
     -exec sh -c '
                  find "$0#" -name "$1"
                 ' {} '*vacation*.jpg' \;

Explanations:

  • Mount the AVFS filesystem.
  • Look for archive files in ~/.avfs$PWD, which is the AVFS view of the current directory.
  • For each archive, execute the specified shell snippet (with $0 = archive name and $1 = pattern to search).
  • $0# is the directory view of the archive $0.
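  • If you extend the inner find with its own -exec, write {\} instead of {} inside the snippet. Some find implementations substitute {} anywhere inside an -exec argument, including the outer shell snippet, and others don't.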
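The same command can be wrapped as a function so that the pattern becomes an argument, which makes the $0/$1 convention explicit. This is a sketch only. The name avfs_find_in_archives is hypothetical, and it assumes AVFS has been mounted at ~/.avfs with mountavfs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the AVFS find command; requires AVFS mounted
# at ~/.avfs (mountavfs). Usage: avfs_find_in_archives PATTERN [DIR]
avfs_find_in_archives() {
  local pattern=$1 dir
  # AVFS mirrors absolute paths, so resolve DIR (default: current dir) first.
  dir=$(CDPATH= cd -- "${2:-.}" && pwd) || return 1
  find ~/.avfs"$dir" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
       -exec sh -c '
                    # $0 = archive path, $1 = pattern; "$0#" is the archive seen as a directory
                    find "$0#" -name "$1"
                   ' {} "$pattern" \;
}
```

Example: `avfs_find_in_archives '*vacation*.jpg' ~/photos` prints every matching member path, prefixed by its archive's directory view.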

Or in zsh ≥4.3:

mountavfs
ls -l ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip)(e\''
     reply=($REPLY\#/**/*vacation*.jpg(.N))
'\')

Explanations:

  • ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip) matches archives in the AVFS view of the current directory and its subdirectories.
  • PATTERN(e\''CODE'\') applies CODE to each match of PATTERN. The name of the matched file is in $REPLY. Setting the reply array turns the match into a list of names.
  • $REPLY\# is the directory view of the archive.
  • $REPLY\#/**/*vacation*.jpg matches *vacation*.jpg files in the archive.
  • The N glob qualifier makes the pattern expand to an empty list if there is no match.

Solution 2

If you want something simpler than the AVFS solution, I wrote a Python script called arkfind to do it. You can just run

$ arkfind /path/to/search/ -g "*vacation*jpg"

It'll do this recursively, so you can look at archives inside archives to an arbitrary depth.
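arkfind's own implementation is not reproduced here. To illustrate the nested-archive idea only, here is a bash sketch that is restricted to tar archives. It lists an archive's members, reports those whose base name matches a glob, and extracts any nested .tar/.tar.gz/.tgz member to a scratch directory so it can recurse into it. The function name nested_tar_find is hypothetical. The sketch assumes a tar that detects compression automatically when reading, as GNU tar and bsdtar do.

```shell
#!/usr/bin/env bash
# Sketch only (not arkfind's code): report tar members whose base name matches
# a glob, recursing into .tar/.tar.gz/.tgz archives nested inside the archive.
# Other formats (zip, 7z, rar) are not handled here.
# Usage: nested_tar_find ARCHIVE GLOB   (prints "archive: member[: member...]")
nested_tar_find() {
  local archive=$1 glob=$2 prefix=${3:-$1} member tmp
  while IFS= read -r member; do
    # $glob is deliberately unquoted on the right of == so it acts as a pattern.
    if [[ ${member##*/} == $glob ]]; then
      printf '%s: %s\n' "$prefix" "$member"
    fi
    case $member in
      *.tar|*.tar.gz|*.tgz)
        # Extract only the nested archive into a scratch dir, then recurse.
        tmp=$(mktemp -d) || return 1
        if tar -xf "$archive" -C "$tmp" "$member"; then
          nested_tar_find "$tmp/$member" "$glob" "$prefix: $member"
        fi
        rm -rf -- "$tmp"
        ;;
    esac
  done < <(tar -tf "$archive")
}
```

Extracting one member at a time keeps disk use bounded by the largest nested archive, at the cost of re-reading the outer archive for each nested one.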

Solution 3

IMHO user-friendliness should be a thing in bash as well:

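 to_srch='vacation.*\.jpg'   # example regex; set it to what you are looking for
 # unzip cannot read .7z archives, so only .zip files are listed here
 while IFS= read -r zip_file ; do echo "$zip_file" ; unzip -l "$zip_file" | \
 grep -i --color=always -- "$to_srch"; \
 done < <(find . -iname '*.zip') | \
 less -R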

And for tar (this one is untested):

 while IFS= read -r tar_file ; do echo "$tar_file" ; tar -tf "$tar_file" | \
 grep -i --color=always -- "$to_srch"; \
 done < <(find . \( -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar' \)) | \
 less -R
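The two loops can be merged by choosing the listing tool from the file extension, in the spirit of "use the correct binaries for the correct file types". This is a sketch under stated assumptions. The names list_members and search_archives are hypothetical. It expects bash, unzip for zip/jar, a tar that auto-detects compression (GNU tar or bsdtar), and a 7z binary (p7zip or 7-Zip) for .7z. If a tool is missing, that format simply produces no output.

```shell
#!/usr/bin/env bash
# Sketch: list each archive's members with a tool that understands its
# format, then grep the listings. Missing tools just yield no output.
# Usage: search_archives REGEX [DIR]
list_members() {
  case $1 in
    *.zip|*.jar)                    unzip -Z1 "$1" ;;  # zipinfo mode: names only
    *.tar|*.tar.gz|*.tgz|*.tar.bz2) tar -tf "$1" ;;    # GNU tar/bsdtar detect compression
    *.7z)                           7z l "$1" ;;       # full listing incl. headers
    *)                              return 1 ;;
  esac
}

search_archives() {
  local to_srch=$1 archive hit
  while IFS= read -r archive; do
    # </dev/null keeps a tool (e.g. a 7z password prompt) from eating the file list;
    # each hit is prefixed with its archive so the combined output stays readable.
    list_members "$archive" </dev/null 2>/dev/null | grep -i -- "$to_srch" |
      while IFS= read -r hit; do printf '%s: %s\n' "$archive" "$hit"; done
  done < <(find "${2:-.}" -type f \( -name '*.zip' -o -name '*.jar' -o -name '*.tar' \
             -o -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' -o -name '*.7z' \))
}
```

Example: `search_archives 'vacation.*\.jpg' /path/to/search | less` prints one "archive: member" line per hit.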

Solution 4

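Another solution is zgrep, if your zgrep supports -r (the one shipped with GNU gzip does not; see the comments). Be aware that zgrep matches the decompressed contents rather than the names of archive members: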

zgrep -r filename *.zip

Solution 5

My usual solution:

find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|DESIRED_FILE_TO_SEARCH'

Example:

find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|characterize.txt'

Results look roughly like this:

foozip1.zip:
foozip2.zip:
foozip3.zip:
    DESIRED_FILE_TO_SEARCH
foozip4.zip:
...

If you want only the zip file with hits on it:

find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|FILENAME' | grep -B1 'FILENAME'

FILENAME appears twice, so you may want to store it in a shell variable.

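To search somewhere other than the current directory, pass the path to find first (find PATH/TO/SEARCH -iname '*.zip' ...). GNU find assumes . when no path is given; BSD find requires one.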
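As a variant that keeps the name in a variable and prints only the zip files with a hit, you can test each archive on its own instead of using the grep -B1 trick. This is a sketch. The function name zips_containing is hypothetical. It relies on unzip's zipinfo mode (-Z1, one member name per line) and matches NAME as a fixed string anywhere in the member path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the .zip files whose member list contains
# NAME as a fixed string. Usage: zips_containing NAME [DIR]
zips_containing() {
  local name=$1
  find "${2:-.}" -iname '*.zip' -exec sh -c '
    name=$1; shift
    for z; do
      # unzip -Z1 lists one member name per line (zipinfo mode).
      unzip -Z1 "$z" 2>/dev/null | grep -qF -- "$name" && printf "%s\n" "$z"
    done' sh "$name" {} +
}
```

Example: `zips_containing characterize.txt PATH/TO/SEARCH`. Because of -exec ... {} +, find hands many archives to a single sh, so there is far less process startup than with one unzip per \; invocation.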


Author: Tom

Updated on September 18, 2022

Comments

  • Tom (over 1 year ago)

    Ideally I would like to have a call like this:

    $searchtool /path/to/search/ -contained-file-name "*vacation*jpg"
    

    ... so that this tool

    • does a recursive scan of the given path
    • takes all files with supported archive formats which should at least be the "most common" like zip, rar, 7z, tar.bz, tar.gz ...
    • and scans the file list of each archive for the name pattern in question (here *vacation*jpg)

    I'm aware of how to use find, tar, unzip and the like. I could combine these in a shell script, but I'm looking for a simple solution that might be a shell one-liner or a dedicated tool (pointers to GUI tools are welcome, but my solution must be command-line based).

  • Chemik (over 10 years ago)
    It would be great if it supported jar files.
  • detly (over 10 years ago)
    @Chemik - noted! I'll do a bit more work on it this weekend :) JAR shouldn't be too hard, I believe it's really just a zip file to the outside world.
  • Chemik (over 10 years ago)
    Yes I see now, it works. You can add "JAR files" to README :)
  • Stéphane Chazelas (over 7 years ago)
    What implementation of zgrep is that? That doesn't work with the one shipped with GNU gzip (/bin/zgrep: -r: option not supported, zgrep (gzip) 1.6)
  • Yordan Georgiev (over 7 years ago)
    Yeah, that is a bug ... corrected ... one should definitely use the correct binaries for the correct file types ... I just aimed to demonstrate the one-liner ... gee, this one has almost reached the state of a ready-made how-to recipe ...
  • golimar (almost 7 years ago)
    It works for me when passing an archive as argument, but not a directory: IOError: [Errno 21] Is a directory: '.'
  • detly (almost 7 years ago)
    @golimar That's weird, I thought I tested it on directories. I'll look into it.