Why do I get different exit status for ps | grep in a script?
Solution 1
When you run a script named check_varnish_pro.sh
the test
ps ax | grep -q [v]arnish
is successful because there is a script named check_varnish_pro running.
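The effect can be reproduced without a live process table. The following is a minimal sketch: the first sample line is the ps output for the script shown under Solution 2, while the second line and the helper name matches_varnish are made up for illustration.

```shell
# Succeeds if the given text contains the string "varnish".
matches_varnish() {
    printf '%s\n' "$1" | grep -q '[v]arnish'
}

# ps line for the check script itself; no varnish daemon is involved.
script_line='31004 p1 SN+ 0:00.04 /bin/bash ./check_varnish_pro.sh'
# Hypothetical ps line for an unrelated script.
other_line='31005 p1 SN+ 0:00.01 /bin/bash ./check_cache.sh'

if matches_varnish "$script_line"; then script_result=match; else script_result='no match'; fi
if matches_varnish "$other_line"; then other_result=match; else other_result='no match'; fi

echo "check_varnish_pro.sh line: $script_result"
echo "check_cache.sh line:       $other_result"
```

The script's own name is enough to satisfy the pattern, which is why the test succeeds even on a server without varnish.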
Solution 2
In general, it's a bad idea to use the simple approach with ps
and grep
to determine whether a given process is running.
You would be much better off using pgrep
for this:
if pgrep "varnish" >/dev/null; then
echo "Varnish is running"
else
echo "Varnish is not running"
fi
See the manual for pgrep
. On some systems (probably not on Linux), you get a -q
flag that corresponds to the same flag for grep
which gets rid of the need to redirect to /dev/null
. There's also a -f
flag that performs the match on the full command line rather than on only the process name. One may also limit the match to processes belonging to a specific user using -u
.
Installing pgrep
also gives you access to pkill
which allows you to signal processes based on their names.
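As a rough sketch of the flags mentioned above, the helpers below wrap pgrep. The function names are hypothetical, and the daemon name varnishd used in the comments is an assumption about how the process is named on your system. The -x flag (exact match on the process name) is not discussed above but is available in the common pgrep implementations. It matters here because a plain pgrep "varnish" can still match a script named check_varnish_pro.sh when that script is run directly.

```shell
# Succeeds if a process whose name is exactly "$1" exists.
# -x requires the whole process name to match, so a process named
# after the check script does not count as "varnishd".
is_running() {
    pgrep -x -- "$1" >/dev/null
}

# Variant: match the pattern "$2" against the full command line (-f),
# limited to processes owned by user "$1" (-u).
# Note that -f brings back the risk of matching the check script's own
# command line if the pattern also occurs in the script's name.
is_running_as() {
    pgrep -u "$1" -f -- "$2" >/dev/null
}
```

A call such as is_running varnishd can then be used directly in an if statement, in the same way as the pgrep call above.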
Also, if this is a service daemon, and if your Unix system has a way of querying it for information (e.g., whether it's up and running or not), then that is the proper way of checking on it.
On Linux, you have systemctl
(systemctl is-active --quiet varnish
will return 0 if it's running and a non-zero status, typically 3, otherwise), on OpenBSD you have rcctl
, etc.
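On a systemd-based Linux system, the check from the question could be sketched roughly as follows. The function name check_service is hypothetical, and the unit name varnish is an assumption that may differ between distributions. The exit codes 0 and 2 mirror the ones used in the question's script.

```shell
# Returns 0 when the unit is active and 2 (critical) otherwise.
# The unit name defaults to "varnish" (an assumption; adjust as needed).
check_service() {
    unit=${1:-varnish}
    if systemctl is-active --quiet "$unit"; then
        echo "OK: $unit is running"
        return 0
    else
        echo "Critical: $unit is not running"
        return 2
    fi
}
```

Ending a script with check_service varnish followed by exit $? passes the result on to whatever monitoring system calls it.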
Now to your script:
In your script, you parse the output from ps ax
. This output will contain the name of the script itself, check_varnish_pro.sh
, which obviously contains the string varnish
. This gives you a false positive. You would have spotted this if you had run it without the -q
flag for grep
while testing.
#!/bin/bash
ps ax | grep '[v]arnish'
Running it:
$ ./check_varnish_pro.sh
31004 p1 SN+ 0:00.04 /bin/bash ./check_varnish_pro.sh
Another issue is that although you try to "hide" the grep
process from being detected by grep
itself by using [v]
in the pattern, that approach will fail if you happen to run the script or the command line in a directory that has a file or directory named varnish
in it (in which case you will get a false positive, again). This is because the pattern is unquoted and the shell will perform filename globbing with it.
See:
bash-4.4$ set -x
bash-4.4$ ps ax | grep [v]arnish
+ ps ax
+ grep '[v]arnish'
bash-4.4$ touch varnish
+ touch varnish
bash-4.4$ ps ax | grep [v]arnish
+ ps ax
+ grep varnish
91829 p2 SN+p 0:00.02 grep varnish
The presence of the file varnish
will cause the shell to replace [v]arnish
with the filename varnish
and you get a hit on the pattern in the process table (the grep
process).
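Quoting the pattern avoids this. The sketch below shows the difference in a scratch directory, assuming the shell's default globbing settings (no nullglob or failglob). The variable names are only for illustration.

```shell
# Work in a scratch directory so the demonstration does not touch real files.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

before=$(printf '%s\n' [v]arnish)     # no matching file: the pattern is passed through as-is
touch varnish
unquoted=$(printf '%s\n' [v]arnish)   # the shell replaces the pattern with the filename
quoted=$(printf '%s\n' '[v]arnish')   # quoting keeps the pattern intact for grep

printf 'before touch: %s\n' "$before"
printf 'unquoted:     %s\n' "$unquoted"
printf 'quoted:       %s\n' "$quoted"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Only the quoted form reaches grep unchanged regardless of what files exist in the current directory.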
Solution 3
@AlexP explains very succinctly what is actually happening, but @Kusalananda's idea of using pgrep
/pkill
for a critical process is strongly discouraged. Better solutions include:
- Asking the service whether it's running.
systemctl status varnishd
should take care of that on a modern *nix installation.
- If by some unfortunate circumstance you don't have a service available you can simply change the startup script to report the problem as soon as the process exits:
varnish || true
some_command_to_send_an_alert_that_the_service_has_died
- Alternatively change the script that starts the service to record the PID, and then check the state periodically with
kill -0 "$pid"
.
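The PID-based check in the last item might look roughly like this. It is a sketch only: the function names and the idea of a PID file written by the start script are assumptions, not part of any varnish tooling. Keep in mind that kill -0 also fails when the process exists but belongs to another user, and that a PID can be reused by an unrelated process after the original one has exited.

```shell
# Succeeds if a process with the given PID exists and may be signalled.
# Signal 0 performs only the existence and permission check; nothing is delivered.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Checks the PID stored in the file "$1" (hypothetical PID file).
# Returns 0 if that process is alive, 2 otherwise.
check_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 2
    pid=$(cat "$pidfile")
    if pid_alive "$pid"; then
        echo "OK: process $pid is running"
        return 0
    else
        echo "Critical: process $pid is not running"
        return 2
    fi
}
```

The start script would write the PID once, for example with echo "$!" after launching the daemon in the background, and a periodic job would then call check_pidfile on that file.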
prado
Updated on September 18, 2022
Comments
-
prado almost 2 years
I am running below script :
#!/bin/bash
ps ax | grep -q [v]arnish
if [ $? -eq 0 ];then
echo varnish is running...
exit 0
else
echo "Critical : varnish is not running "
exit 2
fi
The output is like ::
[root@server ~]# sh -x check_varnish_pro.sh
+ ps ax
+ grep -q '[v]arnish'
+ '[' 0 -eq 0 ']'
+ echo varnish is running...
varnish is running...
+ exit 0
When I run same in command line I am getting exit status as 1:
[root@server ~]# ps ax | grep -q [v]arnish; echo $?
1
The case is like varnish is not installed in the server. This script works fine in a server where varnish is installed.
Why different exit status when run using script and command line? How to improve this script?
-
z atef over 7 years
because everything is a file "in linux land".
-
Kusalananda over 7 years
@z_- Not quite sure how it's connected, but this holds true even on non-Linux Unices.
-
TNW over 7 years
Not only the grep process; script being named
check_varnish_pro.sh
is also a factor.
-
Kusalananda over 7 years
@TNW I didn't spot that at first, but you are right. I'll add that in.
-
Kusalananda over 7 years
I agree, I was just addressing the shell scripting aspects of the problem. Note that
systemctl
is almost only available on Linux though (AFAIK), and not on all modern Unix-like systems.
-
l0b0 over 7 years
The original question had the tag "linux"; I'm not sure why that was removed by @muru.
-
prado over 7 years
Thanks l0b0. I had two questions "Why" and "How to improve". @AlexP's answer solved my first question and your answer is better solution for the second question. But Kusalananda explains things related to this which I think will be helpful for people who have similar issues. So I am confused now which to accept as answer.
-
l0b0 over 7 years