What is launching all of these rpc.statd processes?

Well, this appears to be partly our fault and partly a bug in RedHat's authconfig command. Our Puppet configuration was causing authconfig --updateall to be run every hour. This was unnecessary but generally it shouldn't be a problem...except that authconfig restarts the rpcbind service.

Restarting rpcbind causes it to forget about all the services that have registered with it. While authconfig will then restart NIS-related services, it does not restart rpc.statd. The result is that rpc.statd is still running but no longer registered with rpcbind -- which makes it effectively invisible to applications that attempt to find it via rpcbind.
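One way to spot this state is to compare the ports rpcbind advertises for the status service (RPC program 100024) with the port statd is supposed to be listening on. The following is only a minimal sketch: the function names statd_ports and statd_registered_on are hypothetical, and it assumes the usual five-column output of rpcinfo -p.

```shell
#!/bin/sh
# Hypothetical helper: print the ports that rpcbind advertises for the
# "status" service (RPC program 100024), one "proto port" pair per line.
# Reads `rpcinfo -p` style output on stdin.
statd_ports() {
    awk '$1 == 100024 { print $3, $4 }'
}

# Hypothetical helper: succeed if the given port appears among the
# registered status ports read from stdin, fail otherwise.
statd_registered_on() {
    statd_ports | awk -v port="$1" '$2 == port { found = 1 } END { exit !found }'
}

# Example use on a live system (662 is the STATD_PORT from /etc/sysconfig/nfs):
#   rpcinfo -p | statd_registered_on 662 || echo "statd on port 662 is not registered"
```

If the check fails while the rpc.statd -p 662 process is still present in ps, the daemon has most likely lost its registration in the way described above.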

I've fixed our Puppet configuration so that it is no longer calling authconfig like this, and I've opened bug 818246 with RedHat.

Author by

user2751502

Updated on September 18, 2022

Comments

  • user2751502
    user2751502 over 1 year

    On one of our servers -- running CentOS 6 x86_64 -- we're seeing a lot of unusual activity with rpc.statd. We have rpc.statd configured to run on a static port via /etc/sysconfig/nfs:

    MOUNTD_PORT=892
    STATD_PORT=662
    QUOTAD_PORT=875
    

    And this does result in rpc.statd running and listening on this port as expected:

    # ps -fe | grep rpc.statd | grep 662
    rpcuser  23129     1  0 Apr30 ?        00:00:00 rpc.statd -p 662
    

    What's odd is that on this system, there are also numerous other rpc.statd instances running with the --no-notify flag:

    rpcuser    808     1  0 02:23 ?        00:00:00 rpc.statd --no-notify
    rpcuser   2052     1  0 07:17 ?        00:00:00 rpc.statd --no-notify
    rpcuser   3558     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser   5787     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser   6499     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser   8834     1  0 03:21 ?        00:00:00 rpc.statd --no-notify
    rpcuser   9661     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser  13702     1  0 00:08 ?        00:00:00 rpc.statd --no-notify
    rpcuser  14813     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser  15375     1  0 08:39 ?        00:00:00 rpc.statd --no-notify
    rpcuser  15376     1  0 04:26 ?        00:00:00 rpc.statd --no-notify
    rpcuser  19782     1  0 09:36 ?        00:00:00 rpc.statd --no-notify
    rpcuser  20491     1  0 05:36 ?        00:00:00 rpc.statd --no-notify
    rpcuser  23136     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser  23320     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser  26145     1  0 10:10 ?        00:00:00 rpc.statd --no-notify
    rpcuser  26480     1  0 06:24 ?        00:00:00 rpc.statd --no-notify
    rpcuser  26598     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    rpcuser  26821     1  0 01:15 ?        00:00:00 rpc.statd --no-notify
    rpcuser  28255     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
    

    Also odd is that one of these processes has apparently usurped the original rpc.statd process as far as rpcbind is concerned. Running rpcinfo reports statd on the following ports:

    # rpcinfo -p
    ...
    100024    1   udp  34322  status
    100024    1   tcp  41686  status
    

    These correspond to PID 26145 (which you can see is one of the rpc.statd instances in the above output from ps).
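
    A port can be matched to its owning PID from the listening-socket table. This is a rough sketch, not necessarily how the mapping above was obtained: pid_for_port is a hypothetical helper name, and it assumes netstat -tulnp style output in which the local address is the fourth field and the last field is PID/program.

    ```shell
    #!/bin/sh
    # Hypothetical helper: given a port number, print the PID of the process
    # listening on it. Reads `netstat -tulnp` style output on stdin.
    pid_for_port() {
        awk -v port="$1" '
            $1 ~ /^(tcp|udp)/ {
                n = split($4, addr, ":")      # port is the part after the last colon
                if (addr[n] == port) {
                    split($NF, owner, "/")    # "26145/rpc.statd" -> "26145"
                    print owner[1]
                }
            }' | sort -u
    }

    # Example use on a live system (run as root so that PIDs are shown):
    #   netstat -tulnp | pid_for_port 34322
    ```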

    This wouldn't be a problem if everything were working, but yesterday the system began to experience a problem with NFS mounts...any attempt to mount a new filesystem would result in:

    mount.nfs: mount system call failed
    

    Killing off all the rpc.statd services "resolved" the problem, but we're puzzled as to what's going on here. We've never seen this behavior on our similarly configured CentOS 5 systems.

    • Zachw6
      Zachw6 about 12 years
      Since rpc.statd is started by mount.nfs, this could be the result of many mount attempts after NFS hiccups. Anything in the logs with a matching STIME?
  • user2751502
    user2751502 over 11 years
    ...which RedHat has, of course, ignored. C'est la vie.