Grep / awk on multiple files to single output
Solution 1
I'd iterate over the files in bash, keeping track of each file name, so that you can redirect the output of each iteration into a different output file.
For example like this (not tested):
PREFIX="/tmp/outputs" # define where to store all the outputs
mkdir -p "${PREFIX}" # make sure the outputs dir exists
for FILE in *.txt # get the file names you want to work on
do
# use ${PREFIX}/${FILE} to redirect output to a
# file that's associated with the input
grep 'text' "${FILE}" | awk ' NR==1 {print $2 } ' > "${PREFIX}/${FILE}"
done
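A minimal sketch of what that loop does, using made-up sample files in a temporary directory (the file names and contents are purely illustrative):

```shell
# Create throwaway sample inputs (illustrative data, not from the question).
demo=$(mktemp -d)
cd "$demo"
printf 'text alpha 1\ntext beta 2\n' > a.txt
printf 'other line\ntext gamma 3\n' > b.txt

PREFIX="$demo/outputs"   # hypothetical output location
mkdir -p "$PREFIX"
for FILE in *.txt; do
    # first matching line's second field goes to an output file
    # named after the input file
    grep 'text' "$FILE" | awk 'NR==1 {print $2}' > "$PREFIX/$FILE"
done
cat "$PREFIX/a.txt"   # prints: alpha
cat "$PREFIX/b.txt"   # prints: gamma
```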
Solution 2
If I understand correctly, you want to do the following for each .txt file:
- Locate the first line containing the pattern text.
- On this line, take the second whitespace-separated field and write it out to a file whose name is related to the input file.
You aren't saying how the output file name should be constructed. I'll make it the same as the input file, but ending in .out instead of .txt.
You can do this with a shell loop.
for x in *.txt; do
grep 'text' -- "$x" | awk '{print $2; exit}' >"${x%.*}.out"
done
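The output name is built with the ${x%.*} parameter expansion, which strips the shortest trailing .suffix before appending .out. A quick illustration (the file name is made up):

```shell
# ${x%.*} removes the shortest match of ".*" from the end of $x,
# i.e. the extension; appending .out gives the output name.
x=report.txt
echo "${x%.*}.out"    # prints: report.out
```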
Exiting awk as soon as it's done its job is slightly faster than telling it to keep reading but do nothing. Another possibility is to skip awk altogether and have the shell do the line splitting (whether this is faster or slower depends on so many factors that I won't hazard predictions):
for x in *.txt; do
grep 'text' -- "$x" | { read -r first second rest && printf '%s\n' "$second"; } >"${x%.*}.out"
done
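One caveat with the read variant: in bash, each stage of a pipeline runs in a subshell, so read and the command that consumes its variables must be grouped into the same stage, or the variables vanish before they are used. A minimal illustration:

```shell
# read and its consumer grouped in one pipeline stage; otherwise the
# variables set by read are lost when the subshell exits.
printf 'text foo bar\n' | { read -r first second rest; echo "$second"; }   # prints: foo
```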
A different approach would be to do all the work in awk. Awk can act on multiple files and you can use awk's redirection for the output. This requires forking fewer processes. It's pretty straightforward in Gawk (GNU awk):
awk '/text/ {print $2 >substr(FILENAME, 1, length(FILENAME)-4) ".out"; nextfile}' *.txt
In an awk implementation that doesn't have nextfile, you need to manually handle transitions to the next file, which makes this approach less attractive (both more complex and less efficient).
awk '
FNR==1 {first=1}
first && /text/ {print $2 >substr(FILENAME, 1, length(FILENAME)-4) ".out"; first=0}' *.txt
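A quick sanity check of the portable script above, on throwaway sample files (the names and contents are made up; the redirection target is parenthesized here, since unparenthesized concatenation after > is not accepted by every awk):

```shell
# Illustrative sample inputs in a temporary directory.
dir=$(mktemp -d)
cd "$dir"
printf 'skip me\ntext one 1\ntext two 2\n' > x.txt
printf 'text three 3\n' > y.txt

# Portable awk: reset the flag at the start of each file, act on the
# first matching line only, write to <basename>.out.
awk '
FNR==1 {first=1}
first && /text/ {print $2 > (substr(FILENAME, 1, length(FILENAME)-4) ".out"); first=0}' *.txt

cat x.out   # prints: one
cat y.out   # prints: three
```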
Knut
Updated on September 18, 2022

Comments
-
Knut over 1 year
I have several txt-files containing data, where I use grep to search for a given string of text, and use awk to filter out the variable I need. The string is repeated through the file, so I currently use this command to extract the desired string:
grep 'text' *.txt | awk 'NR==1 {print $2}' > outputfile
The problem is that I want to cycle through multiple files in the folder, and for each file get the extracted variable written into a single output file. I know the question has been answered before, but I am quite fresh to this and have some difficulties implementing it.
Any feedback would be highly appreciated!
-
heemayl over 8 years You are selecting all files ending in .txt in that folder. As it stands, it should work if you are interested in just .txt files. If not, perhaps you should give a complete example.
-
G-Man Says 'Reinstate Monica' over 8 yearsI presume that you understand your problem perfectly, but it’s not clear to us. Can you please provide some examples of what your input files look like and what you want to get as output? You say, “I want to cycle through … files … and for each file … [write] into a single output-file.” As @heemayl says, you seem to be pretty close to that now — unless you want a single output file per input file. What other answers have you looked at; what have you tried; and how have they fallen short of the result you want? Please do not respond in comments; edit your question to make it clearer.
-
123 over 8 years You don't need $(ls *.txt); using *.txt is sufficient. Also quote your variables.
-
replay over 8 years@123 thank you, and congrats for the username
-
dave_thompson_085 over 8 yearsThat
awk
method prints from the first line of each file, not the first matching line. Withoutgawk
you needFNR==1 {found=0} !found&&/text/ {print $2 >etc; found=1}
.