Trouble getting awk output in for loop


Solution 1

There are several problems with your script. I've listed the ones I found, but I haven't tested it, so there may be others.

for keyurl in $DATA; do … splits $DATA at each whitespace, not at each newline. So in the first iteration, $keyurl will be Example; in the next, Domains;www.example.com; and so on. Furthermore, each value undergoes wildcard expansion, so if there is a * in a keyword, you might see funky results depending on the files present in the current directory.

You're trying to process newline-separated data. A simple way is

while read -r keyurl; do
  …
done <testurls

This strips the indentation from each line, which is probably not a bad thing here. (Use IFS= read -r keyurl if you want keyurl to contain each line exactly.)
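A quick sketch of the difference (the bracketed output shows where whitespace is kept):

```shell
# Plain read strips leading/trailing IFS whitespace from the line:
printf '    indented\n' | { read -r line; printf '[%s]\n' "$line"; }
# prints [indented]

# With IFS emptied for the read, the line is kept exactly as-is:
printf '    indented\n' | { IFS= read -r line; printf '[%s]\n' "$line"; }
# prints [    indented]
```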

Your calls to awk aren't working because you're passing $keyurl as a file name. You need to pass it as input instead. While you're at it, always use double quotes around variable substitutions (otherwise the shell performs some expansions on their value). I also recommend using $(…) instead of `…`; they're the same, except that `…` is difficult to use when you want to quote things inside, whereas the syntax of $(…) is intuitive.

keyword=`echo "$keyurl" | awk -F ";" '{print $1}'`
url=`echo "$keyurl" | awk -F ";" '{print $2}'`
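With $(…), the same lines (shown here with the first entry from your testurls as sample data) become:

```shell
keyurl='Example Domains;www.example.com'
keyword=$(echo "$keyurl" | awk -F ";" '{print $1}')
url=$(echo "$keyurl" | awk -F ";" '{print $2}')
echo "$keyword"   # Example Domains
echo "$url"       # www.example.com
```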

There's a better way to split a variable at the first semicolon: use the shell's built-in constructs to strip a prefix or suffix from a string.

keyword=${keyurl%%;*}
url=${keyurl#*;}
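For example, with the first line of your testurls file:

```shell
keyurl='Example Domains;www.example.com'
keyword=${keyurl%%;*}   # remove longest suffix matching ';*' → text before the first ';'
url=${keyurl#*;}        # remove shortest prefix matching '*;' → text after the first ';'
echo "$keyword"   # Example Domains
echo "$url"       # www.example.com
```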

But since your data comes from the read built-in and the separator is a single character, you can take advantage of the IFS feature and directly split your input as you read it.

while IFS=';' read -r keyword url; do …
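A minimal demonstration with the sample data from the question (fed through a pipe here instead of a file):

```shell
printf 'Example Domains;www.example.com\nGoogle;www.google.com\n' |
while IFS=';' read -r keyword url; do
  printf 'keyword=%s url=%s\n' "$keyword" "$url"
done
# prints:
# keyword=Example Domains url=www.example.com
# keyword=Google url=www.google.com
```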

Coming to your curl and grep calls, note that you're looking for the literal text $keyword, since you used single quotes. Use double quotes; the keyword will then be interpreted as a basic regular expression. If you want the keyword to be interpreted as a literal string, pass the -F option to grep. You should also put -e before the pattern, in case the keyword begins with the character - (otherwise the keyword would be interpreted as an option to grep). Finally on the topic of grep, its -q option is equivalent to >/dev/null. Also remember the double quotes around $url, and note that curl's silent flag is spelled -s or --silent: -silent is actually parsed as a bundle of single-letter options.

curl --silent "$url" | grep -Fqe "$keyword"
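To see why -F matters, take a hypothetical keyword that contains a regex metacharacter:

```shell
keyword='example.com'
# As a basic regular expression, the '.' matches any character:
printf 'exampleXcom\n' | grep -qe "$keyword" && echo 'regex: match'
# With -F the keyword is a fixed string, so there is no match:
printf 'exampleXcom\n' | grep -Fqe "$keyword" || echo 'fixed string: no match'
```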

You can get rid of the if [ $? != 0 ]; then part by putting the pipeline directly in the if condition. Note that the condition then succeeds when grep does find the keyword, so the then and else branches are swapped compared to your original test.

if curl --silent "$url" | grep -Fqe "$keyword"; then

In summary:

while IFS=';' read -r keyword url; do
  if curl --silent "$url" | grep -Fqe "$keyword"; then
    echo "$url Okay"
  else
    echo "Did not find $keyword on $url"
  fi
done <testurls

Solution 2

awk is treating the value of $keyurl as the name of a data file to process. You need to feed the value of $keyurl to awk on its standard input, like this:

keyword=`echo "$keyurl" | awk -F ";" '{print $1}'`

This will solve one of your many problems.
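If you're in bash (or ksh93/zsh) rather than plain sh, a here-string avoids the echo pipeline entirely; note this is a shell feature, not POSIX:

```shell
keyurl='Google;www.google.com'
keyword=$(awk -F ";" '{print $1}' <<< "$keyurl")
echo "$keyword"   # Google
```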


Author: Nandha, updated on September 18, 2022

Comments

  • Nandha (over 1 year)

    I'm trying to create a script that will check a website for a word. I have a few to check so I'm trying to input them via another file.

    The file is called "testurls". In the file I list the keyword then the URL. They are separated with a semicolon.

    Example Domains;www.example.com
    Google;www.google.com
    

    Here is the script:

    #!/bin/bash
    clear
    
    # Call list of keywords and urls
    DATA=`cat testurls`
    
    for keyurl in $DATA
    do
        keyword=`awk -F ";" '{print $1}' $keyurl`
        url=`awk -F ";" '{print $2}' $keyurl`
        curl -silent $url | grep '$keyword' > /dev/null
     if [ $? != 0 ]; then
        # Fail
            echo "Did not find $keyword on $url"
        else
        # Pass
            echo $url "Okay"
    fi
    done
    

    The output is:

    awk: cannot open Example (No such file or directory)
    awk: cannot open Example (No such file or directory)
    curl: no URL specified!
    curl: try 'curl --help' or 'curl --manual' for more information
    Did not find  on
    awk: cannot open Domains;www.example.com (No such file or directory)
    awk: cannot open Domains;www.example.com (No such file or directory)
    curl: no URL specified!
    curl: try 'curl --help' or 'curl --manual' for more information
    Did not find  on
    awk: cannot open Google;www.google.com (No such file or directory)
    awk: cannot open Google;www.google.com (No such file or directory)
    curl: no URL specified!
    curl: try 'curl --help' or 'curl --manual' for more information
    Did not find  on
    

    I've hacked away at this for ages now. Any help is very welcome.

  • Gilles 'SO- stop being evil' (over 12 years)
    This is the basic problem causing the errors, but it's not the only one.
  • jasonwryan (over 12 years)
    You really should consider collecting some of your answers and publishing them in the one place. You could call it Unix: breakfast of champignons
  • jaypal singh (over 12 years)
    I absolutely agree. It would be an amazing guide for beginners like me. :) +1 (Too bad I can't vote it as THE best answer)
  • Nandha (over 12 years)
    +1 I was not expecting an answer like this! Thank you for taking the time to show me what I was doing wrong and explaining how to do it right (and in a good level of detail!).