Pass shell variable as a /pattern/ to awk

110,796

Solution 1

Use awk's ~ operator, and you don't need to provide a literal regex on the right-hand side:

function _process () {
    awk -v l="$line" -v pattern="$1" '
        $0 ~ pattern {p=1} 
        END {if(p) print l >> "outfile.txt"}
    '  
}

Although this would be more efficient (grep -q stops at the first match, so it doesn't have to read the whole file):

function _process () {
    grep -q "$1" && echo "$line"
}

Depending on the pattern, you may want grep -Eq "$1" (extended regular expressions, which is the syntax awk uses).
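
As a rough sketch of how the grep variant might be driven, assuming the text to search arrives on standard input and the caller has set line, as in the question. The sample input and the value of line are made up for illustration:

```shell
# Sketch only: the sample input and the value of "line" are made up.
_process() {
    # -q prints nothing and exits successfully at the first match
    grep -q "$1" && echo "$line"
}

line='example.org'
# Stand-in for something like the output of: whois "$line"
result=$(printf 'Registrar: Foo\nStatus: active\n' | _process 'Status')
echo "$result"
```

The function prints $line only when the input contains a match, so the caller decides where that output goes (for example, appending it to outfile.txt).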

Solution 2

awk  -v pattern="$1" '$0 ~ pattern'

This has an issue: awk expands ANSI C escape sequences (such as \n for newline, \f for form feed, \\ for backslash, and so on) in values assigned with -v, here the content of $1. That is a problem if $1 contains backslash characters, which are common in regular expressions. (With GNU awk 4.2 or above, values that start with @/ and end in / are also a problem.) An approach that doesn't suffer from that issue is to pass the value through the environment instead:

PATTERN=$1 awk '$0 ~ ENVIRON["PATTERN"]'

How unrecognized escape sequences such as \. are handled with -v depends on the awk implementation:

$ nawk -v 'a=\.' 'BEGIN {print a}'
.
$ mawk -v 'a=\.' 'BEGIN {print a}'
\.
$ gawk -v 'a=\.' 'BEGIN {print a}'
gawk: warning: escape sequence `\.' treated as plain `.'
.
$ gawk5.0.1 -v 'a=@/foo/' 'BEGIN {print a}'
foo

All awks work the same for valid escape sequences though:

$ a='\\-\b' awk 'BEGIN {print ENVIRON["a"]}' | od -tc
0000000   \   \   -   \   b  \n
0000006

(content of $a passed as-is)

$ awk -v a='\\-\b' 'BEGIN {print a}' | od -tc
0000000   \   -  \b  \n
0000004

(\\ changed to \ and \b changed to a backspace character).
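
Putting the ENVIRON approach to work, here is a minimal sketch of the question's function rewritten so that both values travel through the environment. The names _process_env, PATTERN, and LINE are illustrative, and the result is printed to standard output rather than appended to outfile.txt to keep the example self-contained:

```shell
# Sketch: values travel through the environment, so awk leaves backslashes alone.
# PATTERN, LINE and _process_env are illustrative names.
_process_env() {
    PATTERN=$1 LINE=$line awk '
        $0 ~ ENVIRON["PATTERN"] { found = 1 }
        END { if (found) print ENVIRON["LINE"] }
    '
}

line='example.org'
hit=$(printf 'a.b\n' | _process_env 'a\.b')    # \. matches a literal dot
miss=$(printf 'axb\n' | _process_env 'a\.b')   # no literal dot, so no match
echo "hit=$hit miss=$miss"
```

When awk is called repeatedly in a loop, running export PATTERN once beforehand should work as well, instead of prefixing every awk invocation with the assignment.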

Solution 3

Try something like:

awk -v l="$line" -v search="$pattern" 'BEGIN {p=0}; { if ( match( $0, search )) {p=1}}; END{ if(p) print l >> "outfile.txt" }'
branquito
Author by


Updated on September 18, 2022

Comments

  • branquito
    branquito over 1 year

    Having the following in one of my shell functions:

    function _process () {
      awk -v l="$line" '
      BEGIN {p=0}
      /'"$1"'/ {p=1}
      END{ if(p) print l >> "outfile.txt" }
      '
    }
    

    So when called as _process $arg, $arg gets passed as $1 and is used as the search pattern. It works this way because the shell expands $1 in place inside the awk pattern. Also, l can be used inside the awk program, being declared with -v l="$line". All fine.

    Is it possible to give the search pattern as a variable in the same manner?

    The following will not work:

    awk -v l="$line" -v search="$pattern" '
      BEGIN {p=0}
      /search/ {p=1}
      END{ if(p) print l >> "outfile.txt" }
      '
    

    This is because awk does not interpret /search/ as a variable, but as the literal regex search.

    • Ed Morton
      Ed Morton almost 3 years
      What you're searching for is not text that matches a "pattern", it's text that matches either a string or a regular expression. See how-do-i-find-the-text-that-matches-a-pattern for why that matters and why you shouldn't use the word "pattern" in this context.
    • Ed Morton
      Ed Morton almost 3 years
      See also how-do-i-use-shell-variables-in-an-awk-script for a comprehensive answer to the question of how to pass the value of shell variables or other values to an awk script.
  • branquito
    branquito about 10 years
    Is this way safe if $pattern contains spaces? My example from above will work, as $1 is protected with "$1" double quotes; however, I'm not sure what happens in your case.
  • skunkwerx
    skunkwerx about 10 years
    The quick tests I ran seemed to work the same, but I won't even begin to guarantee it... :)
  • Kilian Foth
    Kilian Foth about 10 years
    Your original example ends the single-quoted string at the second ', then protects the $1 via double quotes and then tacks another single-quoted string for the second half of the awk program. If I understand correctly, this should have exactly the same effect as protecting the $1 via the outer single quotes - awk never sees the double quotes that you put around it.
  • branquito
    branquito about 10 years
    This is exactly what solves this in a way I wanted (1st example), because it keeps the semantics, which was my goal. Thanks.
  • branquito
    branquito about 10 years
    yes I noticed, it needs to be set on BEGIN block to zero each time, as it serves as a switch. But interestingly I tried now script using $0 ~ pattern, and it does not work, however with /'"$1"'/ it does work!? :O
  • branquito
    branquito about 10 years
    maybe it has something to do with the way $line is retrieved, pattern search is done on the output of whois $line, $line coming from file in a WHILE DO block.
  • Angel Todorov
    Angel Todorov about 10 years
    Please show the contents of $line -- do it in your question for proper formatting.
  • Angel Todorov
    Angel Todorov about 10 years
    Don't write /$0 ~ search/ -- leave out the slashes: $0 ~ search
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    But if $pattern contains ^/ {system("rm -rf /")};, then you're in big trouble.
  • branquito
    branquito about 10 years
    So you are saying that if pattern was for example \d{3} to find three digits, that wouldn't work as expected, if I understood you well?
  • branquito
    branquito about 10 years
    is that downside of this approach only, having all wrapped in "" ?
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    For \d, which is not a valid C escape sequence, that depends on your awk implementation (run awk -v 'a=\d{3}' 'BEGIN{print a}' to check). But for \\ or \b, yes, definitely. (BTW, I don't know of any awk implementation that understands \d as meaning a digit.)
  • branquito
    branquito about 10 years
    it says: awk warning - escape sequence `\d' treated as plain `d', and then prints d{3}, so I guess I would have a problem in this case?
  • branquito
    branquito about 10 years
    so to resume, I am safe with ENVIRON["pattern"] approach?
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    Yes. see there for more reading.
  • branquito
    branquito about 10 years
    Ok, again not working.. when I replace $0 ~ pattern for $0 ~ ENVIRON["pattern"], I get matches everywhere, so all my lines from infile get copied over to outfile. First one was working ok (one with $0 ~ pattern)
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    Sorry, my bad, I had a typo in my answer. The name of the environment variable has to match: ENVIRON["PATTERN"] for the PATTERN environment variable. If you want to use a shell variable, you need to export it first (export variable) or use the ENV=VALUE awk '...ENVIRON["ENV"]' env-var passing syntax as in my answer.
  • branquito
    branquito about 10 years
    it works if PATTERN set each time while looping in front of awk, if I put PATTERN="$1" before the loop it does not work.. why?
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    Because you need to export a shell variable for it to be passed in the environment to a command.
  • branquito
    branquito about 10 years
    Even in same shell script, didn't know that, it works now, just changed PATTERN="$1" to export PATTERN="$1". Maybe because awk was sent input through the pipe.. Anyway learned a lot!
  • XXX
    XXX about 8 years
    It seems that this: awk '$0 ~ /foo/ { print $0 }' is actually equivalent to this: awk -v pattern=foo '$0 ~ pattern { print $0 }'. In other words, the // delimiters are not needed at all anymore, because pattern becomes a dynamic regexp. Is that right?
  • Angel Todorov
    Angel Todorov about 8 years
    Yes that's right.
  • A S
    A S over 2 years
    But what if $pattern (and therefore search) contains something that can be treated as a part of regex, like [? And I want to search for a literal match?
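
For the literal-match case raised in the last comment, one possible sketch is to use awk's index() function, which looks for a plain substring rather than a regular expression. This assumes the string is passed through the environment as in Solution 2; the names has_literal and NEEDLE are illustrative:

```shell
# Sketch: literal substring test; NEEDLE and has_literal are illustrative names.
has_literal() {
    # index() returns the 1-based position of the substring, or 0 if absent
    NEEDLE=$1 awk '
        index($0, ENVIRON["NEEDLE"]) { found = 1; exit }
        END { exit !found }
    '
}

if printf 'value[0] = 1\n' | has_literal '['; then
    echo "literal [ found"
fi
```

Because no regular expression is involved, characters such as [ or . need no escaping here. If grep is an option, grep -F offers a similar fixed-string match.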