"service mysqld stop" times out (then found out that "mysqld dead but subsys locked")
First things first: a very well-done, systematic and thorough debugging, good job.
On my RHEL 5.6 box I always get a return code of 1 if I try to kill a non-existing pid. I tried as both root and a non-privileged user, both with full path and a with just the command name. I also get only terse kill XXX: No such process
, with no elaborate error messages.
It may be a good idea to run rpm -Vv util-linux
and see if somebody didn't replace /bin/kill
with a new and improved version. Even if rpm verification says the file is pristine, I'd try renaming /bin/kill
and copying over a binary from a working machine. If the file replacement helps and you don't uncover a legitimate the source of the change, then regardless of output of rpm verification I'd assume the machine was compromised.
Related videos on Youtube
lamp_scaler
Updated on September 18, 2022Comments
-
lamp_scaler over 1 year
I installed mysql and server via yum on my 64bit CentOS 5 server. It starts up fine but when I try to stop it it stalls out and then I have to "Ctrl-C" it. Then I run a "service mysqld status" and it shows:
mysqld dead but subsys locked
I run ps aux and mysql is nowhere to be found. Starting mysqld again via "service mysqld start" works fine. Attempting to stop it creates the same problem.
I then realized that
/var/lock/subsys/mysqld
still exist. When running mysqld I checked/var/run/mysqld/mysqld.pid
and it matched with the pid of the running service.I tried reinstalling mysql and deleting all files and configs but to no avail.
What to do?
EDIT:
I added some echo statements in the /etc/init.d/mysqld file, particularly in the stop function:
stop(){ if [ ! -f "$mypidfile" ]; then # not running; per LSB standards this is "ok" action $"Stopping $prog: " /bin/true return 0 fi echo "beginning stop sequence" MYSQLPID=`cat "$mypidfile"` if [ -n "$MYSQLPID" ]; then /bin/kill "$MYSQLPID" >/dev/null 2>&1 echo "killing pid $MYSQLPID" ret=$? if [ $ret -eq 0 ]; then echo "return code $ret after kill attempt" TIMEOUT="$STOPTIMEOUT" echo "timeout is set to $STOPTIMEOUT" while [ $TIMEOUT -gt 0 ]; do /bin/kill -0 "$MYSQLPID" >/dev/null 2>&1 || break sleep 1 let TIMEOUT=${TIMEOUT}-1 echo "timeout is now $TIMEOUT" done if [ $TIMEOUT -eq 0 ]; then echo "Timeout error occurred trying to stop MySQL Daemon." ret=1 action $"Stopping $prog: " /bin/false else echo "attempting to del lockfile: $lockfile" rm -f $lockfile rm -f "$socketfile" action $"Stopping $prog: " /bin/true fi else action $"Stopping $prog: " /bin/false fi else # failed to read pidfile, probably insufficient permissions action $"Stopping $prog: " /bin/false ret=4 fi return $ret }
This is the result I get when I try to stop the service:
[root@server]# service mysqld stop beginning stop sequence killing pid 9145 return code 0 after kill attempt timeout is set to 60 timeout is now 59 timeout is now 58 timeout is now 57 timeout is now 56 timeout is now 55 timeout is now 54 timeout is now 53 timeout is now 52 timeout is now 51 timeout is now 50 timeout is now 49
From looking at the code it seems to me that it will never break out of that while loop and will not be able to delete the lock file. Am I interpreting this wrong? I checked the same file on my other server and it uses the same code. I am dumbfounded.
EDIT: In the while loop part
/bin/kill -0 "$MYSQLPID" >/dev/null 2>&1 || break
For some reason it is not recognizing the return code. When service mysqld stop is called, the process has already been killed but not sure why it isn't allowing the loop to break out.
EDIT: Further testing shows some weird behavior between calling
/bin/kill
and just callingkill
, they apparently return different codes, why??????:[root@server]# /bin/kill 25200 kill 25200: No such process [user@server]# echo ${?} 0 [root@server]# kill 25200 -bash: kill: (25200) - No such process [root@server]# echo ${?} 1
EDIT: I logged on as a non-root user and tried executing "kill" and "/bin/kill" with a surprising results:
[notroot@server ~]$ kill -0 23232 -bash: kill: (23232) - No such process [notroot@server ~]$ echo $? 1 [notroot@server ~]$ /bin/kill -0 23232 kill 23232: No such process (No info could be read for "-p": geteuid()=501 but you should be root.) [notroot@server ~]$ echo $? 0
The "No info could be read" error does not show up in my other servers when executing kill and bin/kill as a nonroot user.
EDIT: Added logging described by quanta and also checked out the mysql log:
After a start and stop, mysql log shows this:
110918 00:11:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql 110918 0:11:28 [Note] Plugin 'FEDERATED' is disabled. 110918 0:11:28 InnoDB: Initializing buffer pool, size = 16.0M 110918 0:11:28 InnoDB: Completed initialization of buffer pool 110918 0:11:29 InnoDB: Started; log sequence number 0 44233 110918 0:11:29 [Warning] 'user' entry 'root@server' ignored in --skip-name-resolve mode. 110918 0:11:29 [Warning] 'user' entry '@server' ignored in --skip-name-resolve mode. 110918 0:11:29 [Note] Event Scheduler: Loaded 0 events 110918 0:11:29 [Note] /usr/libexec/mysqld: ready for connections. Version: '5.1.58-ius' socket: '/var/lib/mysql/mysql.sock' port: 3306 Distributed by The IUS Community Project 110918 0:11:34 [Note] /usr/libexec/mysqld: Normal shutdown 110918 0:11:34 [Note] Event Scheduler: Purging the queue. 0 events 110918 0:11:34 InnoDB: Starting shutdown... 110918 0:11:39 InnoDB: Shutdown completed; log sequence number 0 44233 110918 0:11:39 [Note] /usr/libexec/mysqld: Shutdown complete 110918 00:11:39 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
Then in the tmp/mysql.log:
kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process kill 23080: No such process
I stopped the stop process midway so I don't have to wait for the timeout. Looks like the process was killed. The problem, I think, is still with the different return codes from
"kill"
and"/bin/kill"
-
Greg Petersen over 12 yearsReplace
>/dev/null
at the first attempt to kill with>/tmp/mysql.log
and the second in the while loop with>>/tmp/mysql.log
and try again to see what the logs say. Also take a look at/var/log/mysqld.log
. -
lamp_scaler over 12 yearsAdded the results above. I still believe something weird is happening with the kill and /bin/kill not returning the same exit codes.
-
Greg Petersen over 12 yearsThe first attempt returns 0 because you had switched to normal user from root. Are you always get return status 0 when trying
/bin/kill
with normal user? What is the output when running in debug modesh -x /etc/init.d/mysqld stop
? -
lamp_scaler over 12 yearsAs you see from one of my edits above, running /bin/kill as a non-root user returns this error: (No info could be read for "-p": geteuid()=501 but you should be root.). But this does not happen in my other server when I run /bin/kill as a non-root user.
-
fxmtor over 12 yearsBash has a builtin command
kill
. This is helpful in case the system is swamped enough that it can't spawn any more processes, then you can still kill rogue processes if you're logged in. Thus, when you runkill
(as opposed to/bin/kill
), then you're running the builtin. This should explain the difference between runningkill
and/bin/kill
, where you call a specific binary.
-
-
lamp_scaler over 12 yearsLooks like something is up with /bin/kill:.
S.5....T /bin/kill
-
lamp_scaler over 12 yearsDid a quick
yum update util-linux
and all is well now. You, sir, are a gentleman AND a scholar. -
Paweł Brodacki over 12 yearsThank you for your courteous words. I think
rpm -aV
may be in order.util-linux
may be not the only affected package, and whoever changed thekill
binary doesn't seem to manipulate rpm's base as well.