Why do I get different exit status for ps | grep in a script?


Solution 1

When you run a script named check_varnish_pro.sh the test

ps ax  | grep -q [v]arnish

is successful because there is a script named check_varnish_pro running.
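To see why, you can run the same pattern against individual lines of ps output. The sketch below uses two hypothetical lines modeled on the ps ax output shown further down: one for the script itself and one for the grep process. The helper name matches_varnish is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if one line of ps output matches the
# pattern used in the question's script.
matches_varnish() {
  printf '%s\n' "$1" | grep -q '[v]arnish'
}

# The script's own command line contains the literal text "varnish"...
if matches_varnish '31004 p1 SN+ 0:00.04 /bin/bash ./check_varnish_pro.sh'; then
  echo 'script line: match'
fi

# ...while grep's command line contains "[v]arnish", which the regex
# [v]arnish does not match, so grep does not find itself.
if ! matches_varnish '31005 p1 S+ 0:00.00 grep [v]arnish'; then
  echo 'grep line: no match'
fi
```

The bracket trick only keeps grep from matching its own command line. It does nothing about other processes, such as the script, whose command lines happen to contain the word.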

Solution 2

In general, it's a bad idea to use the simple approach of combining ps and grep to determine whether a given process is running.

You would be much better off using pgrep for this:

if pgrep "varnish" >/dev/null; then
  echo "Varnish is running"
else
  echo "Varnish is not running"
fi

See the manual for pgrep. On some systems (probably not on Linux), pgrep has a -q flag that corresponds to the same flag for grep and removes the need to redirect the output to /dev/null. There's also a -f flag that matches against the full command line rather than only the process name (with -f, the pattern varnish would once again match your script's own command line). You can also limit the match to processes belonging to a specific user with -u.
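One more detail is worth knowing. On Linux, pgrep without -f matches against the kernel's process name, which is at most 15 characters long, and a script started directly through its #! line can show up under its own file name. An unanchored pattern like varnish could then still match a script called check_varnish_pro.sh (listed as check_varnish_p). The following sketch rewrites the question's check around pgrep -x, which requires the whole process name to match. The function name check_process is hypothetical. The daemon name is also an assumption: for Varnish it is usually varnishd, but confirm it with ps on your own system.

```shell
#!/bin/sh
# Sketch of the question's check rewritten around pgrep -x.
# Assumption: $1 is the daemon's exact process name (e.g. varnishd).
check_process() {
  name=$1
  # -x: the entire process name must equal $name, so neither grep nor a
  # script whose name merely contains "varnish" can produce a match.
  if pgrep -x "$name" >/dev/null; then
    echo "$name is running"
    return 0          # OK
  else
    echo "CRITICAL: $name is not running"
    return 2          # CRITICAL, as in the original script
  fi
}
```

A check script would then end with `check_process varnishd` followed by `exit $?`.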

Installing pgrep also gives you access to pkill which allows you to signal processes based on their names.

Also, if this is a service daemon, and if your Unix system has a way of querying it for information (e.g., whether it's up and running or not), then that is the proper way of checking on it.

On Linux systems running systemd, you have systemctl (systemctl is-active --quiet varnish exits with status 0 if the unit is active and a non-zero status, typically 3, otherwise); on OpenBSD you have rcctl, etc.
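Whichever probe you use, you can produce the question's 0/2 exit codes (which look like the OK/CRITICAL codes of Nagios-style check plugins) with a small wrapper that runs any probe command and looks only at its exit status. This is a sketch. The function name run_check is hypothetical, and the unit name varnish in the usage line is an assumption about your system.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 is a label for the messages, the remaining
# arguments are the probe command. Only the probe's exit status is used.
run_check() {
  label=$1
  shift
  if "$@"; then
    echo "OK: $label is running"
    return 0
  else
    echo "CRITICAL: $label is not running"
    return 2
  fi
}
```

For example: `run_check varnish systemctl is-active --quiet varnish`. Because any non-zero status counts as "not running", the wrapper doesn't need to know which specific non-zero codes a given probe uses.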


Now to your script:

In your script, you parse the output from ps ax. This output will contain the name of the script itself, check_varnish_pro.sh, which obviously contains the string varnish. This gives you a false positive. You would have spotted this if you had run it without the -q flag for grep while testing.

#!/bin/bash
ps ax | grep '[v]arnish'

Running it:

$ ./check_varnish_pro.sh
31004 p1  SN+     0:00.04 /bin/bash ./check_varnish_pro.sh

Another issue is the way you try to "hide" the grep process from grep itself by using [v] in the pattern. That approach fails if you happen to run the script or the command line in a directory that contains a file or directory named varnish (in which case you get a false positive again). This is because the pattern is unquoted in your script, so the shell performs filename globbing on it before grep runs.

See:

bash-4.4$ set -x
bash-4.4$ ps ax | grep [v]arnish
+ ps ax
+ grep '[v]arnish'
bash-4.4$ touch varnish
+ touch varnish
bash-4.4$ ps ax | grep [v]arnish
+ ps ax
+ grep varnish
91829 p2  SN+p    0:00.02 grep varnish

The presence of the file varnish will cause the shell to replace [v]arnish with the filename varnish and you get a hit on the pattern in the process table (the grep process).
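You can reproduce the effect without ps at all, since it depends only on how the shell expands the unquoted word. This sketch works in a throwaway directory created with mktemp -d. The name glob_demo is hypothetical, and the sketch assumes the default shell options (no nullglob or failglob).

```shell
#!/bin/sh
# Hypothetical demo: how the shell treats [v]arnish before grep sees it.
glob_demo() {
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    echo [v]arnish     # no file matches: the word is passed on unchanged
    touch varnish
    echo [v]arnish     # a file named varnish exists: expands to "varnish"
    echo '[v]arnish'   # quoted: never subject to filename expansion
  )
  rm -rf -- "$dir"
}

glob_demo
```

It prints [v]arnish, then varnish, then [v]arnish again. Only the middle, unquoted case after creating the file hands grep the plain pattern varnish, which also matches grep's own command line.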

Solution 3

@AlexP explains very succinctly what is actually happening, but @Kusalananda's idea of using pgrep/pkill for a critical process is strongly discouraged. Better solutions include:

  • Asking the service whether it's running. systemctl status varnishd should take care of that on a modern systemd-based Linux installation.
  • If by some unfortunate circumstance you don't have a service available, you can simply change the startup script to report the problem as soon as the process exits:

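    # The service must run in the foreground (not daemonize) for this to work;
    # '|| true' stops a 'set -e' script from aborting before the alert is sent.
    varnish || true
    some_command_to_send_an_alert_that_the_service_has_died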
    
  • Alternatively, change the script that starts the service to record the PID, and then check the state periodically with kill -0 "$pid".
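The PID-file approach from the last bullet could look roughly like the following sketch. pid_alive and any PID file path are hypothetical. The sketch assumes the startup script writes the daemon's PID to that file, for example `$!` right after starting it in the background. Two caveats apply. kill -0 also fails when the process exists but belongs to another user the checker is not allowed to signal. And a stale PID file can point at an unrelated process that reused the PID.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PID stored in the file $1 refers
# to a live process that this user is allowed to signal.
pid_alive() {
  pidfile=$1
  [ -r "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  # Reject empty, zero, or non-numeric contents before passing them to kill
  # (kill -0 0 would test the whole process group instead).
  case $pid in
    ''|0|*[!0-9]*) return 1 ;;
  esac
  # Signal 0 performs only the existence/permission check; nothing is sent.
  kill -0 "$pid" 2>/dev/null
}
```

A periodic check could then call `pid_alive` on that file and exit 0 or 2 depending on the result.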

Author: prado

Updated on September 18, 2022

Comments

  • prado
    prado almost 2 years

    I am running the script below:

    #!/bin/bash
    
    ps ax  | grep -q [v]arnish
    if [ $? -eq 0 ];then
            echo varnish is running...
            exit 0
    else
            echo "Critical : varnish is not running "
            exit 2
    fi
    

    The output is:

    [root@server ~]# sh -x check_varnish_pro.sh
    + ps ax
    + grep -q '[v]arnish'
    + '[' 0 -eq 0 ']'
    + echo varnish is running...
    varnish is running...
    + exit 0
    

    When I run the same command on the command line, I get exit status 1:

    [root@server ~]# ps ax  | grep -q [v]arnish; echo $?
    1
    

    Varnish is not installed on this server. The script works fine on a server where Varnish is installed.

    Why do I get a different exit status when I run this as a script than when I run it on the command line? How can I improve this script?

  • z atef
    z atef over 7 years
    because everything is a file "in linux land".
  • Kusalananda
    Kusalananda over 7 years
    @z_- Not quite sure how it's connected, but this holds true even on non-Linux Unices.
  • TNW
    TNW over 7 years
    Not only the grep process; the script being named check_varnish_pro.sh is also a factor.
  • Kusalananda
    Kusalananda over 7 years
    @TNW I didn't spot that at first, but you are right. I'll add that in.
  • Kusalananda
    Kusalananda over 7 years
    I agree, I was just addressing the shell scripting aspects of the problem. Note that systemctl is almost only available on Linux though (AFAIK), and not on all modern Unix-like systems.
  • l0b0
    l0b0 over 7 years
    The original question had the tag "linux"; I'm not sure why that was removed by @muru.
  • prado
    prado over 7 years
    Thanks l0b0. I had two questions, "Why" and "How to improve". @AlexP's answer solved my first question, and your answer is the better solution for the second. But Kusalananda explains related details that I think will help people with similar issues, so I'm not sure which one to accept.
  • l0b0
    l0b0 over 7 years