Unix script to search within a compressed .gz file
The essence of how to accomplish this is to get the names of the files within the tarball to search over, and extract their content to be searched, while not extracting anything else. Because we don't want to write to the file system, we can use the -O flag to extract to standard output instead.
tar -tzf file.tar.gz | grep '\.txt$' | xargs tar -Oxzf file.tar.gz | grep -B 3 "string-or-regex"
will concatenate all of the files in the .tar.gz with names ending in ".txt" and grep them for the given string, also printing the 3 previous lines for each match. It won't tell you which file in the tarball any match came from, and the "three previous lines" may in fact come from the previous file.
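As a rough sketch, the pipeline can be wrapped in a shell function so the archive and the pattern become parameters. `search_concat` is a hypothetical name. The sketch assumes a tar with -z and -O and a grep with -B (GNU or BSD), and uses GNU xargs's -r so tar is not run with an empty member list (which would extract every member). Like the one-liner, it mishandles member names containing spaces or quotes, because they pass through xargs.

```shell
#!/bin/sh
# search_concat ARCHIVE PATTERN (hypothetical helper, not a standard command)
# Lists the members of a gzipped tarball, keeps names ending in .txt, and
# greps their concatenated contents with 3 lines of leading context.
# Assumes tar supports -z/-O and grep supports -B. xargs splits member names
# on whitespace, so names with spaces or quotes are not handled.
# -r (GNU xargs) skips running tar when no member name matched; without it,
# tar would get no member names and extract the whole archive.
search_concat() {
    archive=$1
    pattern=$2
    tar -tzf "$archive" | grep '\.txt$' |
        xargs -r tar -Oxzf "$archive" | grep -B 3 -e "$pattern"
}
```

Called as `search_concat file.tar.gz 'string-or-regex'`. If `a.txt` holds three lines and the first line of `b.txt` matches, all three lines of "before" context come from `a.txt`, which is the file-boundary problem described above.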
You can instead do:
tar -tzf file.tar.gz | grep '\.txt$' | while IFS= read -r file; do
  tar -Oxzf file.tar.gz "$file" | grep -B 3 --label="$file" -H "string-or-regex"
done
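which will respect file boundaries and report the file names, but will be much less efficient, since the whole archive is decompressed again for every matching file. (--label and -H are GNU grep options.)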
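If GNU tar is available (an assumption: it needs a GNU tar recent enough to have --to-command), the repeated decompression of the loop can be avoided. With --to-command, tar reads the archive once and pipes each selected member into its own run of a command, passing the member name in the TAR_FILENAME environment variable. This swaps the shell loop for a GNU-specific feature, so treat it as a sketch. `search_per_member` and `SEARCH_PATTERN` are hypothetical names; --wildcards, -H and --label are GNU tar and GNU grep options.

```shell
#!/bin/sh
# search_per_member ARCHIVE PATTERN (hypothetical helper, GNU tar + GNU grep)
# Member contents are piped to the --to-command command instead of being
# written to disk. tar runs that command through the shell once per selected
# member, with the member name in $TAR_FILENAME, so grep's context never
# spans two files and every output line is labelled with its file name.
# The pattern travels in the (hypothetical) SEARCH_PATTERN environment
# variable so it needs no extra quoting inside the command string.
# "|| true" keeps tar from reporting a failed child when grep finds nothing.
search_per_member() {
    SEARCH_PATTERN=$2 tar -xzf "$1" --wildcards '*.txt' \
        --to-command='grep -B 3 -H --label="$TAR_FILENAME" -e "$SEARCH_PATTERN" || true'
}
```

Called as `search_per_member file.tar.gz 'string-or-regex'`; matches print as `name:line` and context lines as `name-line`, as with grep -H on ordinary files.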
(-z tells tar the archive is gzip compressed. -t lists the contents. -x extracts. -O redirects to standard output rather than the file system. -f names the archive file. Older tars may not have the -O or -z flag, and will want the flags without the leading -: e.g. tar tzf file.tar.gz)
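For a tar with no -z at all, the usual workaround is to let gzip do the decompression and have tar read the archive from standard input, named -. The two functions below are a hypothetical sketch of that split; the extracting one still assumes the tar understands O (GNU and BSD tar both do), and both use the old dash-less flag style.

```shell
#!/bin/sh
# Hypothetical helpers for a tar without -z: gzip -dc decompresses to
# standard output and tar reads the archive from standard input ("f -").

# list_members ARCHIVE: print the member names.
list_members() {
    gzip -dc "$1" | tar tf -
}

# cat_member ARCHIVE MEMBER: write one member's contents to standard output.
# Still assumes tar supports O (extract to standard output).
cat_member() {
    gzip -dc "$1" | tar xOf - "$2"
}
```

The output of `cat_member file.tar.gz "$file"` can then be piped into grep exactly like the output of `tar -Oxzf`.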
Okay, so your grep has neither -B nor -C. We can fix that with awk!
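#!/usr/bin/awk -f
# Print each line matching /pattern/ together with the lines before it.
# context is the total number of lines printed per match, including the
# matching line, so context=3 prints the match plus 2 earlier lines;
# set it to 4 to mimic grep -B 3.
BEGIN { context=3; }
# Keep the last "context" lines in a ring buffer indexed by NR % context.
{ add_buffer($0) }
# Replace "pattern" with the regular expression to search for.
/pattern/ { print_buffer() }
function add_buffer(line)
{
    buffer[NR % context]=line
}
# Print the buffered lines, oldest first, ending with the current line.
function print_buffer()
{
    for(i = max(1, NR-context+1); i <= NR; i++) {
        print buffer[i % context]
    }
}
function max(a,b)
{
    if (a > b) { return a } else { return b }
}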
Unlike grep -B, this will not coalesce adjacent matches, so it can repeat lines that fall within the context window of two different matches.
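To feed a single archive member into the awk approach without a separate script file, the same ring-buffer logic can be inlined and parameterised. This is a sketch: `before_context` is a hypothetical name, the pattern is an awk extended regular expression passed with -v (which also interprets backslash escapes), and N counts the matching line itself, so N=4 corresponds to grep -B 3.

```shell
#!/bin/sh
# before_context PATTERN N (hypothetical helper)
# Reads standard input and prints each line matching PATTERN together with
# up to N-1 lines before it. Like the awk script, overlapping windows are
# not merged, so a line close to two matches is printed twice.
before_context() {
    awk -v pat="$1" -v context="$2" '
        # Ring buffer holding the last "context" lines.
        { buffer[NR % context] = $0 }
        $0 ~ pat {
            start = NR - context + 1
            if (start < 1) start = 1
            # Oldest buffered line first, ending with the match.
            for (i = start; i <= NR; i++) print buffer[i % context]
        }
    '
}
```

Combined with the per-file extraction: `tar -Oxzf file.tar.gz "$file" | before_context "string-or-regex" 4`. Feeding it the lines `x`, `m1`, `m2` with pattern `m` and N=2 prints `x`, `m1`, `m1`, `m2`, showing the repeated line.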
Comments
- CFUser (almost 2 years ago):
I want to get a few lines from a file that is inside a compressed .gz file.
The .gz file contains many .txt files, and I want to search for a string in all of these .txt files and get the previous 3 lines as output, including the current line (where the search string is present).
I tried zgrep and got the line number, but when I use the head or tail command, it gives some garbage values. I think we cannot use the head or tail commands with compressed files containing multiple files. Is there any simple way to do this?
- CFUser (over 13 years ago): Yes, it's a gzip of a tar file. I cannot extract it, because it contains huge files and I will run into disk space problems.
- wnoise (over 13 years ago): Does it support -C? Is it a problem to get 3 lines after as well?
- CFUser (over 13 years ago): Unfortunately no -C either :(
- SourceSeeker (over 13 years ago): @CFUser: Without -B support in grep, you'll have to use awk, sed or Perl to hold a moving window of lines which are output when your match is found. GNU tar supports --wildcards, which makes the first tar | grep in each of the versions unnecessary. Other versions of tar may or may not support globbing and may or may not require a switch to enable it.
- Conrad Meyer (over 13 years ago): If you want GNU tar anyway, why not just install GNU tar and GNU grep (often available as gtar/ggrep)? But in general, I like the awk answer =).
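Following SourceSeeker's point about GNU tar's --wildcards, the first pipeline shrinks to a single extraction with no listing pass. This is a sketch assuming GNU tar; `search_wildcards` is a hypothetical name.

```shell
#!/bin/sh
# search_wildcards ARCHIVE PATTERN (hypothetical helper, GNU tar only)
# tar selects the *.txt members itself, so the archive is decompressed once
# and no "tar -t | grep" listing step is needed. --wildcards applies to the
# member patterns that follow it on the command line.
# Same caveat as the first pipeline: context lines can come from the
# preceding member, and output is not labelled with file names.
search_wildcards() {
    tar -Oxzf "$1" --wildcards '*.txt' | grep -B 3 -e "$2"
}
```

Called as `search_wildcards file.tar.gz 'string-or-regex'`.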