Grep / awk on multiple files to single output

Solution 1

I'd iterate over the files in bash, keeping track of each file name, so you can redirect the output into a different output file on each iteration.

For example, like this (not tested):

PREFIX="/tmp/outputs"   # define where to store all the outputs
mkdir -p "${PREFIX}"    # make sure the outputs dir exists

for FILE in *.txt       # get the file names you want to work on
do
  # use ${PREFIX}/${FILE} to redirect output to a 
  # file that's associated with the input
  grep 'text' "${FILE}" | awk ' NR==1 {print $2 } ' > "${PREFIX}/${FILE}"
done
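
If instead you want everything collected in one combined output file, with one line per input (which may be what the question means by a single output file), a small variation of the same loop should work; combined.out is just an example name:

for FILE in *.txt
do
  # print the input file name, then the second field of its first matching line
  printf '%s %s\n' "${FILE}" "$(grep 'text' "${FILE}" | awk 'NR==1 {print $2}')"
done > "${PREFIX}/combined.out"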

Solution 2

If I understand correctly, you want to do the following for each .txt file:

  • Locate the first line containing the pattern text.
  • On this line, take the second whitespace-separated field and write it out to a file whose name is related to the input file.

You aren't saying how the output file name should be constructed. I'll make it the same as the input file, but ending in .out instead of .txt.

You can do this with a shell loop.

for x in *.txt; do
  grep 'text' -- "$x" | awk '{print $2; exit}' >"${x%.*}.out"
done
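
Here ${x%.*} is a standard parameter expansion that strips the last suffix, so an input named foo.txt (just an example name) ends up in foo.out.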

Exiting awk as soon as it has done its job is slightly faster than telling it to keep reading but do nothing. Another possibility is to skip awk altogether and have the shell do the line splitting; note that read and the printf that uses its variables have to be grouped in braces, because in bash each part of a pipeline runs in its own subshell (whether this is faster or slower depends on so many factors that I won't hazard predictions):

for x in *.txt; do
  grep 'text' -- "$x" | { read -r first second rest && printf '%s\n' "$second"; } >"${x%.*}.out"
done

A different approach would be to do all the work in awk. Awk can act on multiple files and you can use awk's redirection for the output. This requires forking fewer processes. It's pretty straightforward in Gawk (GNU awk):

awk '/text/ {print $2 > (substr(FILENAME, 1, length(FILENAME)-4) ".out"); nextfile}' *.txt
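
For example, if a hypothetical input report.txt has "text 42 more words" as its first line containing text, this writes 42 to report.out and then moves straight on to the next input file.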

In an awk implementation that doesn't have nextfile, you need to manually handle transitions to the next file, which makes this approach less attractive (both more complex and less efficient).

awk '
    FNR==1 {first=1}
    first && /text/ {print $2 > (substr(FILENAME, 1, length(FILENAME)-4) ".out"); first=0}' *.txt

Comments

  • Knut over 1 year

    I have several txt files containing data, where I use grep to search for a certain string of text and awk to filter out the variable I need. The string is repeated throughout the file, so I currently use this command to extract the desired string:

    grep 'text' *.txt | awk ' NR==1  {print $2 } ' > outputfile
    

    The problem is that I want to cycle through multiple files in the folder and, for each file, get the extracted variable written into a single output file. I know the question has been answered before, but I am quite new to this and have some difficulty implementing it.

    Any feedback would be highly appreciated!

    • heemayl over 8 years
      You are selecting all files ending in .txt in that folder. As it stands, it should work if you are interested in just .txt files. If not, perhaps you should give a complete example.
    • G-Man Says 'Reinstate Monica' over 8 years
      I presume that you understand your problem perfectly, but it’s not clear to us. Can you please provide some examples of what your input files look like and what you want to get as output? You say, “I want to cycle through … files … and for each file … [write] into a single output-file.” As @heemayl says, you seem to be pretty close to that now — unless you want a single output file per input file. What other answers have you looked at; what have you tried; and how have they fallen short of the result you want?  Please do not respond in comments; edit your question to make it clearer.
  • 123 over 8 years
    You don't need $(ls *.txt); using *.txt is sufficient. Also quote your variables.
  • replay over 8 years
    @123 thank you, and congrats for the username
  • dave_thompson_085 over 8 years
    That awk method prints from the first line of each file, not the first matching line. Without gawk you need FNR==1 {found=0} !found&&/text/ {print $2 >etc; found=1} .