Find recursively all archive files of diverse archive formats and search them for file name patterns
Solution 1
(Adapted from How do I recursively grep through compressed archives?)
Install AVFS, a filesystem that provides transparent access inside archives. First run this command once to set up a view of your machine's filesystem in which you can access archives as if they were directories:
mountavfs
After this, if /path/to/archive.zip is a recognized archive, then ~/.avfs/path/to/archive.zip# is a directory that appears to contain the contents of the archive.
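The path mapping described above can be sketched as a small helper. The function name avfs_view is hypothetical (it is not part of AVFS), and the sketch assumes the default ~/.avfs mount point used in this answer:

```shell
# avfs_view PATH: print the AVFS directory view of an archive.
# Hypothetical helper; assumes AVFS is mounted at ~/.avfs as described above.
avfs_view() {
  case "$1" in
    /*) printf '%s\n' "$HOME/.avfs$1#" ;;        # absolute path
    *)  printf '%s\n' "$HOME/.avfs$PWD/$1#" ;;   # relative path, resolved against $PWD
  esac
}
```

For example, ls "$(avfs_view /path/to/archive.zip)" would list the top level of the archive once mountavfs has been run.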
find ~/.avfs"$PWD" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
  -exec sh -c '
    find "$0#" -name "$1"
  ' {} '*vacation*.jpg' \;
Explanations:
- Mount the AVFS filesystem.
- Look for archive files in ~/.avfs$PWD, which is the AVFS view of the current directory.
- For each archive, execute the specified shell snippet (with $0 = archive name and $1 = pattern to search).
- $0# is the directory view of the archive $0.
- If the inner snippet needs a {} of its own (for example in a nested -exec), write {\} rather than {}, in case the outer find substitutes {} inside -exec ; arguments (some do it, some don't).
Or in zsh ≥4.3:
mountavfs
ls -l ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip)(e\''
reply=($REPLY\#/**/*vacation*.jpg(.N))
'\')
Explanations:
- ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip) matches archives in the AVFS view of the current directory and its subdirectories.
- PATTERN(e\''CODE'\') applies CODE to each match of PATTERN. The name of the matched file is in $REPLY. Setting the reply array turns the match into a list of names.
- $REPLY\# is the directory view of the archive.
- $REPLY\#/**/*vacation*.jpg matches *vacation*.jpg files in the archive.
- The N glob qualifier makes the pattern expand to an empty list if there is no match.
Solution 2
If you want something simpler than the AVFS solution, I wrote a Python script called arkfind to do it. You can just run:
$ arkfind /path/to/search/ -g "*vacation*jpg"
It'll do this recursively, so you can look at archives inside archives to an arbitrary depth.
Solution 3
IMHO user-friendliness should be a thing in bash as well:
# to_srch holds the file name fragment to look for, e.g. to_srch=vacation
while IFS= read -r zip_file ; do echo "$zip_file" ; unzip -l "$zip_file" | \
grep -i --color=always "$to_srch"; \
done < <(find . -name '*.zip') | \
less -R
and for tar (this one is untested ...)
while IFS= read -r tar_file ; do echo "$tar_file" ; tar -tf "$tar_file" | \
grep -i --color=always "$to_srch"; \
done < <(find . \( -name '*.tar.gz' -o -name '*.tar' \)) | \
less -R
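As a rough sketch, the two loops above could be folded into one function that picks the listing tool by file extension and matches member names against a glob. The names list_members and search_archives are hypothetical, the extension list is only an example, and member names containing newlines are not handled:

```shell
# list_members ARCHIVE: print one member name per line, choosing the lister by extension.
list_members() {
  case "$1" in
    *.zip|*.jar)          unzip -Z1 "$1" ;;  # zipinfo mode: member names only
    *.tar|*.tar.gz|*.tgz) tar -tf "$1" ;;
    *)                    return 1 ;;        # unsupported format
  esac
}

# search_archives DIR GLOB: print "archive: member" for every member
# whose base name matches the shell glob GLOB.
search_archives() {
  local dir=$1 glob=$2 archive member
  find "$dir" -type f \( -name '*.zip' -o -name '*.jar' -o -name '*.tar' \
       -o -name '*.tar.gz' -o -name '*.tgz' \) -print |
  while IFS= read -r archive; do
    list_members "$archive" 2>/dev/null |
    while IFS= read -r member; do
      case "${member##*/}" in
        $glob) printf '%s: %s\n' "$archive" "$member" ;;  # unquoted: treated as a pattern
      esac
    done
  done
}
```

A call such as search_archives /path/to/search '*vacation*jpg' is close to the invocation asked for in the question. Other formats would need their own branch in list_members.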
Solution 4
Another option is zgrep, provided your implementation accepts -r (the zgrep shipped with GNU gzip 1.6 does not):
zgrep -r filename *.zip
Solution 5
My usual solution:
find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|DESIRED_FILE_TO_SEARCH'
Example:
find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|characterize.txt'
Results look like:
foozip1.zip:
foozip2.zip:
foozip3.zip:
DESIRED_FILE_TO_SEARCH
foozip4.zip:
...
If you want only the zip files that contain hits:
find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|FILENAME' | grep -B1 'FILENAME'
FILENAME appears twice here, so you may want to put it in a variable.
You can also give find a starting directory (PATH/TO/SEARCH) to search somewhere other than the current directory.
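A possible alternative to the grep -B1 step is a small awk filter that remembers the current archive and prints its header only when a member matches. This is a sketch, not the method used above: the function name zip_hits is hypothetical, and it assumes each unzip -l listing starts with a line beginning with "Archive:", as Info-ZIP's unzip prints:

```shell
# zip_hits NAME: read concatenated "unzip -l" listings on stdin and print,
# for each archive with at least one line containing NAME, the archive
# header followed by the matching lines. NAME is matched as a plain substring.
zip_hits() {
  awk -v name="$1" '
    /^Archive:/ { archive = $0; shown = 0; next }   # remember the current archive
    index($0, name) {
      if (!shown) { print archive; shown = 1 }      # print the header once per archive
      print
    }
  '
}
```

It would be used as find PATH/TO/SEARCH -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | zip_hits characterize.txt, so the file name is given only once.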
Tom
Updated on September 18, 2022
Comments
- Tom over 1 year:
Ideally I would like to have a call like this:
$searchtool /path/to/search/ -contained-file-name "*vacation*jpg"
... so that this tool
- does a recursive scan of the given path
- takes all files with supported archive formats which should at least be the "most common" like zip, rar, 7z, tar.bz, tar.gz ...
- and scans the file list of each archive for the name pattern in question (here *vacation*jpg)
I'm aware of how to use the find tool, tar, unzip and alike. I could combine these with a shell script but I'm looking for a simple solution that might be a shell one-liner or a dedicated tool (hints to GUI tools are welcome but my solution must be command line based).
- Chemik over 10 years: It would be great if it supports jar files.
- detly over 10 years: @Chemik - noted! I'll do a bit more work on it this weekend :) JAR shouldn't be too hard, I believe it's really just a zip file to the outside world.
- Chemik over 10 years: Yes I see now, it works. You can add "JAR files" to README :)
- Stéphane Chazelas over 7 years: What implementation of zgrep is that? That doesn't work with the one shipped with GNU gzip (/bin/zgrep: -r: option not supported, zgrep (gzip) 1.6)
- Yordan Georgiev over 7 years: yeah that is a bug ... corrected ... one should definitely use the correct binaries for the correct file types ... I just aimed to demonstrate the one-liner .. jee this one almost will get to the state of being ready as how-to receipt ...
- golimar almost 7 years: It works for me when passing an archive as argument, but not a directory: IOError: [Errno 21] Is a directory: '.'
- detly almost 7 years: @golimar That's weird, I thought I tested it on directories. I'll look into it.