How to grep for a pattern in the files in a tar archive without filling up disk space
Solution 1
Here's my take on this:
while read filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')
Broken out for explanation:
- while read filename; do -- it's a loop...
- tar -xOf file.tar "$filename" -- this extracts each file to standard output (-O) instead of to disk...
- | grep 'pattern' -- here's where you put your pattern...
- | sed "s|^|$filename:|"; -- prepend the filename, so this looks like grep. Salt to taste.
- done < <(tar -tf file.tar | grep -v '/$') -- end the loop, and get the list of files (directories filtered out) to feed to your while read.
One proviso: this breaks if you have pipe characters (|) in your filenames, because | is the delimiter in the sed expression.
Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:
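targrep() {
    local pattern="$1" taropt archive filename
    if [[ $# -lt 2 ]]; then
        echo "Usage: targrep pattern file ..." >&2
        return 1
    fi
    shift    # drop the pattern; the remaining arguments are archives
    for archive in "$@"; do
        if [[ ! -f "$archive" ]]; then
            echo "targrep: $archive: No such file" >&2
            continue
        fi
        case "$archive" in
            *.tar.gz|*.tgz) taropt="-z" ;;
            *) taropt="" ;;
        esac
        while IFS= read -r filename; do
            # extract only this member to stdout, then grep and label it
            tar $taropt -xOf "$archive" "$filename" \
                | grep -- "$pattern" \
                | sed "s|^|$filename:|"
        done < <(tar $taropt -tf "$archive" | grep -v '/$')
    done
}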
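The proviso above about | in filenames can probably be sidestepped by letting grep add the filename prefix itself, so no sed delimiter is involved. What follows is only a minimal sketch of that idea: the function name targrep_label is made up for illustration, and it assumes a grep that supports --label (GNU grep does) and a tar that detects compression on its own when reading.

```shell
# Sketch: same per-member loop as above, but grep labels the output,
# so characters such as | in member names cannot clash with a sed delimiter.
# Assumption: grep supports --label; tar auto-detects compression on read.
targrep_label() {   # hypothetical name, not from the original answer
    local pattern="$1" archive="$2" member
    tar -tf "$archive" | grep -v '/$' | while IFS= read -r member; do
        # "|| true" keeps a non-matching member from counting as a failure
        tar -xOf "$archive" "$member" \
            | grep -H --label="$member" -- "$pattern" || true
    done
}
```

Usage would look like targrep_label 'pattern' file.tar. Member names that contain newlines would still confuse the loop.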
Solution 2
Seems like nobody posted this simple solution that processes the archive only once:
tar xzf archive.tgz --to-command \
'grep --label="$TAR_FILENAME" -H PATTERN ; true'
Here tar passes the name of each file in a variable, TAR_FILENAME (see the docs), and grep uses it to print the name with each match. The trailing true is added so that tar doesn't complain about failing to extract files that don't match.
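To see the single-pass approach in action without touching a real archive, here is a small self-contained sketch. The scratch directory, the file names a.txt and b.txt, and their contents are invented for the demo; it assumes GNU tar (for --to-command and TAR_FILENAME) and a grep with --label.

```shell
# Sketch: build a throwaway archive and search it in one pass.
# Assumption: GNU tar (--to-command, TAR_FILENAME) and grep --label.
demo_dir=$(mktemp -d)                        # hypothetical scratch directory
printf 'first line\nhello PATTERN\n' > "$demo_dir/a.txt"
printf 'nothing to see\n'            > "$demo_dir/b.txt"
tar -czf "$demo_dir/archive.tgz" -C "$demo_dir" a.txt b.txt

# tar runs the command once per member and pipes that member's contents to it;
# nothing is written to disk.
result=$(tar -xzf "$demo_dir/archive.tgz" --to-command \
    'grep --label="$TAR_FILENAME" -H PATTERN ; true')
echo "$result"
rm -rf "$demo_dir"
```

With the two demo files above, this should print a.txt:hello PATTERN and nothing for b.txt.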
Solution 3
Here's a bash function that may work for you. Add the following to your ~/.bashrc:
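targrep () {
    local i
    # read member names line by line so names containing spaces survive
    tar -tzf "$1" | grep -v '/$' | while IFS= read -r i; do
        # grep prints nothing for non-matching members, so no blank lines appear
        tar -Oxzf "$1" "$i" | grep --label="$i" -H -- "$2"
    done
}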
Usage:
targrep archive.tar.gz "pattern"
Solution 4
It's incredibly hacky, but you could abuse tar's -v option to process and delete each file as it is extracted.
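grep_and_delete() {
    if [ -n "$1" ] && [ -f "$1" ]; then
        # </dev/null keeps grep and rm from consuming the file list on stdin
        grep -H 'this' -- "$1" </dev/null
        rm -f -- "$1" </dev/null
    fi
}
mkdir tmp && cd tmp
# the archive lives in the parent directory, hence ../
tar -xvzf ../test.tar.gz | (
    prev=''
    while IFS= read -r pathname; do
        # tar has moved on to the next file, so the previous one is complete
        grep_and_delete "$prev"
        prev="$pathname"
    done
    grep_and_delete "$prev"
)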
Solution 5
tar -tf test.tar.gz | grep -v '/$' | \
xargs -I _ \
sh -c 'tar -xOf test.tar.gz "_" | grep -q "YOUR SEARCH PATTERN" && echo "_"'
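The command above prints only the names of the files that contain a match, and it splices each filename straight into the sh -c script. A slightly more defensive variant, sketched below, passes the archive, pattern, and member name as positional parameters instead. The function name tar_members_matching is made up for illustration, and the sketch assumes an xargs that supports -I.

```shell
# Sketch: print the names of archive members that contain a match.
# Archive, pattern and member name reach sh as "$1", "$2" and "$3" rather
# than being pasted into the script text.
# Assumption: xargs supports -I; tar auto-detects compression on read.
tar_members_matching() {   # hypothetical name
    local pattern="$1" archive="$2"
    tar -tf "$archive" | grep -v '/$' |
        xargs -I {} sh -c \
            'tar -xOf "$1" "$3" | grep -q -- "$2" && printf "%s\n" "$3"; true' \
            sh "$archive" "$pattern" {}
}
```

Usage would look like tar_members_matching 'pattern' test.tar.gz. Note that xargs still treats quotes and backslashes in its input specially, so member names containing those characters may need a different approach.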
Ankur Agarwal
Updated on June 05, 2022
I have a very big tar archive (~5GB).
I want to grep for a pattern in all the files in the archive (and also print the name of each file that contains the pattern), but I do not want to fill up my disk by extracting the archive.
Is there any way I can do that?
I tried these, but they give me only the matching lines, not the names of the files that contain the pattern:
tar -O -xf test.tar.gz | grep 'this'
tar -xf test.tar.gz --to-command='grep awesome'
Also, where is this feature of tar documented?
tar xf test.tar $FILE