apache2 hanging on "Sending Reply"/"Gracefully Finishing"

13,529

After testing, I think I found the problem:

It was (as already expected) a Problem in my Perl Code.

I used Log4Perl. In my Log4Perl file there was a setting to send a mail on error. After removing this setting all of the perl instances terminated successfully and there aren't any hanging apache processes anymore.

Share:
13,529
Stefan Profanter
Author by

Stefan Profanter

Updated on September 18, 2022

Comments

  • Stefan Profanter
    Stefan Profanter almost 2 years

    On my Ubuntu 10.04.4 server, apache2 2.2.14 eats up all the memory until there is no more left because it isn't shutting down old instances.

    The server serves a small webpage which gets about one request each 3 seconds or so (from clients, google bot, scripts, etc).

    If I look into the process list, there are a lot of apache2 instances:

    ps aux | grep apache2

    www-data  1324  0.0  6.4 580688 135600 ?       Sl   07:44   0:00 /usr/sbin/apache2 -k start
    www-data  1407  0.0  6.2 518800 131748 ?       Sl   07:49   0:00 /usr/sbin/apache2 -k start
    www-data  1464  0.0  6.0 518800 127740 ?       S    07:52   0:00 /usr/sbin/apache2 -k start
    www-data  1492  0.0  6.1 580688 129852 ?       S    07:56   0:00 /usr/sbin/apache2 -k start
    www-data  2043  0.0  6.2 501384 130864 ?       Sl   08:46   0:00 /usr/sbin/apache2 -k start
    www-data  3207  0.0  6.1 501384 128260 ?       S    08:57   0:00 /usr/sbin/apache2 -k start
    www-data  5711  0.0  6.2 509580 130888 ?       Sl   10:48   0:00 /usr/sbin/apache2 -k start
    www-data  5779  0.0  6.0 509580 126120 ?       S    10:52   0:00 /usr/sbin/apache2 -k start
    www-data  5796  0.0  6.1 491140 128660 ?       Sl   10:54   0:00 /usr/sbin/apache2 -k start
    www-data  5878  0.0  6.2 510604 130144 ?       Sl   10:59   0:00 /usr/sbin/apache2 -k start
    www-data  7854  0.0  6.1 492556 129712 ?       Sl   11:55   0:00 /usr/sbin/apache2 -k start
    www-data  7925  0.0  5.9 492556 125592 ?       S    12:00   0:00 /usr/sbin/apache2 -k start
    www-data  8018  0.0  6.3 519204 132132 ?       Sl   12:07   0:00 /usr/sbin/apache2 -k start
    www-data  8019  0.0  6.5 512448 136388 ?       Sl   12:07   0:00 /usr/sbin/apache2 -k start
    www-data  8020  0.0  5.8 484340 122532 ?       S    12:07   0:00 /usr/sbin/apache2 -k start
    root      8159  0.0  0.0   7644   908 ttyp0    R+   12:16   0:00 grep --color=auto apache2
    root     23949  0.0  6.3 483316 132176 ?       Ss   00:16   0:15 /usr/sbin/apache2 -k start
    root     23951  0.0  0.0  16956   512 ?        Ss   00:16   0:00 /usr/sbin/htcacheclean -n -d120 -i -p/var/cache/apache2/mod_disk_cache -l300M
    www-data 24064  0.0  6.6 590932 138868 ?       Sl   00:20   0:01 /usr/sbin/apache2 -k start
    www-data 24065  0.0  6.5 533592 137812 ?       Sl   00:20   0:00 /usr/sbin/apache2 -k start
    www-data 24068  0.0  6.3 500408 132372 ?       Sl   00:20   0:00 /usr/sbin/apache2 -k start
    www-data 24357  0.0  6.5 525444 138352 ?       Sl   00:47   0:00 /usr/sbin/apache2 -k start
    www-data 24389  0.0  6.3 533592 133604 ?       S    00:50   0:00 /usr/sbin/apache2 -k start
    www-data 24527  0.0  6.5 525396 137028 ?       Sl   01:06   0:00 /usr/sbin/apache2 -k start
    www-data 26006  0.0  6.3 525396 132996 ?       S    01:48   0:00 /usr/sbin/apache2 -k start
    www-data 26023  0.0  6.3 525396 133952 ?       S    01:50   0:00 /usr/sbin/apache2 -k start
    www-data 26304  0.0  6.6 528468 140504 ?       Sl   02:18   0:00 /usr/sbin/apache2 -k start
    www-data 26305  0.0  6.4 523348 135396 ?       Sl   02:18   0:00 /usr/sbin/apache2 -k start
    www-data 27902  0.0  6.4 590932 134240 ?       S    02:49   0:00 /usr/sbin/apache2 -k start
    www-data 27912  0.0  6.2 523348 130988 ?       S    02:51   0:00 /usr/sbin/apache2 -k start
    www-data 27942  0.0  6.4 523348 134744 ?       Sl   02:54   0:00 /usr/sbin/apache2 -k start
    www-data 28508  0.0  6.3 523348 132292 ?       S    03:51   0:00 /usr/sbin/apache2 -k start
    www-data 28518  0.0  6.4 528468 134736 ?       S    03:52   0:00 /usr/sbin/apache2 -k start
    www-data 29919  0.0  6.3 518800 132240 ?       Sl   04:28   0:00 /usr/sbin/apache2 -k start
    www-data 30164  0.0  6.1 518800 128652 ?       S    04:52   0:00 /usr/sbin/apache2 -k start
    www-data 30168  0.0  6.1 518800 129576 ?       S    04:53   0:00 /usr/sbin/apache2 -k start
    www-data 30418  0.0  6.7 528468 140668 ?       Sl   05:17   0:00 /usr/sbin/apache2 -k start
    www-data 30463  0.0  6.3 518800 132128 ?       Sl   05:21   0:00 /usr/sbin/apache2 -k start
    www-data 31799  0.0  6.1 518800 129268 ?       S    05:52   0:00 /usr/sbin/apache2 -k start
    www-data 31813  0.0  6.4 528468 134808 ?       S    05:54   0:00 /usr/sbin/apache2 -k start
    www-data 31881  0.0  6.4 515152 135252 ?       Sl   06:00   0:00 /usr/sbin/apache2 -k start
    www-data 32005  0.0  6.2 517776 131220 ?       Sl   06:12   0:00 /usr/sbin/apache2 -k start
    www-data 32405  0.0  6.0 517776 127832 ?       S    06:52   0:00 /usr/sbin/apache2 -k start
    www-data 32435  0.0  6.1 515152 129456 ?       S    06:55   0:00 /usr/sbin/apache2 -k start
    

    Apache server status (ExtendedStatus OFF) gives:

    Server Version: Apache/2.2.14 (Ubuntu) mod_ssl/2.2.14 OpenSSL/0.9.8k mod_perl/2.0.4 Perl/v5.10.1
    Server Built: Mar 8 2013 16:46:35
    
    Current Time: Wednesday, 05-Jun-2013 12:17:22 CEST
    Restart Time: Wednesday, 05-Jun-2013 00:16:19 CEST
    Parent Server Generation: 2
    Server uptime: 12 hours 1 minute 3 seconds
    19 requests currently being processed, 2 idle workers
    
    GGGGGGGGGGGGGG.GWG_G_..W........................................
    ................................................................
    ................................................................
    ................................................................
    
    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "C" Closing connection, "L" Logging, "G" Gracefully finishing,
    "I" Idle cleanup of worker, "." Open slot with no current process
    
    PID Key:
    
       24527 in state: G ,   27942 in state: G ,   5796 in state: G 
       26304 in state: G ,   26305 in state: G ,   30418 in state: G 
       24064 in state: G ,   24065 in state: G ,   31881 in state: G 
       32005 in state: G ,   24068 in state: G ,   29919 in state: G 
       1324 in state: G ,   30463 in state: G ,   5711 in state: G 
       8018 in state: W ,   1407 in state: G ,   8019 in state: _ 
       2043 in state: G ,   8020 in state: _ ,   7854 in state: W 
    
    To obtain a full report with current status information you need to use the ExtendedStatus On directive.
    SSL/TLS Session Cache Status:
    cache type: SHMCB, shared memory: 512000 bytes, current sessions: 0
    subcaches: 32, indexes per subcache: 133
    index usage: 0%, cache usage: 0%
    total sessions stored since starting: 4
    total sessions expired since starting: 4
    total (pre-expiry) sessions scrolled out of the cache: 0
    total retrieves since starting: 0 hit, 0 miss
    total removes since starting: 0 hit, 0 miss
    

    So you can see that most of the processes hang in the G status (Gracefully finishing).

    Update 1: Using apache server status with "ExtendedStatus On" we see that the processes are stuck in the W status (Sending reply): (I replaced the true IPs with ipX.domain.com)

    Server Version: Apache/2.2.14 (Ubuntu) mod_ssl/2.2.14 OpenSSL/0.9.8k mod_perl/2.0.4 Perl/v5.10.1
    Server Built: Mar 8 2013 16:46:35
    
    Current Time: Wednesday, 05-Jun-2013 18:49:48 CEST
    Restart Time: Wednesday, 05-Jun-2013 14:31:17 CEST
    Parent Server Generation: 0
    Server uptime: 4 hours 18 minutes 31 seconds
    Total accesses: 3129 - Total Traffic: 111.9 MB
    CPU Usage: u2.29 s.18 cu0 cs0 - .0159% CPU load
    .202 requests/sec - 7.4 kB/second - 36.6 kB/request
    6 requests currently being processed, 2 idle workers
    
    WWW.W._WW..._...................................................
    ................................................................
    ................................................................
    ................................................................
    
    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "C" Closing connection, "L" Logging, "G" Gracefully finishing,
    "I" Idle cleanup of worker, "." Open slot with no current process
    
    Srv     PID     Acc         M   CPU     SS      Req     Conn    Child   Slot    Client          VHost           Request
    0-0     7926    1/28/390    W   0.00    2913    0       0.2     1.15    11.63   ip1.domain.com  www.domain.com  GET /de HTTP/1.1
    1-0     5432    0/10/183    W   0.00    10005   0       0.0     0.59    7.51    ip6.domain.com  def.domain.com  HEAD / HTTP/1.1
    2-0     7878    1/44/403    W   0.00    2710    0       0.2     2.29    13.22   ip1.domain.com  www.domain.com  GET /de HTTP/1.1
    3-0     -       0/0/467     .   0.00    203     0       0.0     0.00    19.51   ip2.domain.com  def.domain.com  NULL
    4-0     5854    1/58/223    W   0.01    6137    0       0.2     1.74    12.29   ip1.domain.com  www.domain.com  GET /de HTTP/1.1
    5-0     -       0/0/359     .   1.40    204     0       0.0     0.00    10.42   ip2.domain.com  def.domain.com  NULL
    6-0     9534    0/60/244    _   0.63    69      0       0.0     1.60    6.70    ip1.domain.com  def.domain.com  NULL
    7-0     5861    1/32/84     W   0.00    6746    0       0.2     1.55    3.18    ip1.domain.com  www.domain.com  GET /de HTTP/1.1
    8-0     11401   98/106/440  W   0.17    0       0       189.2   0.38    15.68   ip5.domain.com  def.domain.com  GET /server-status HTTP/1.1
    9-0     -       0/0/152     .   0.00    207     1608    0.0     0.00    8.06    ip2.domain.com  def.domain.com  NULL
    10-0    -       0/0/31      .   0.00    206     1639    0.0     0.00    0.79    ip2.domain.com  def.domain.com  NULL
    11-0    -       0/0/99      .   0.24    205     0       0.0     0.00    1.83    ip2.domain.com  def.domain.com  NULL
    12-0    11404   0/4/24      _   0.02    104     18      0.0     0.00    0.38    ip3.domain.com  www.domain.com  GET /robots.txt HTTP/1.1
    13-0    -       0/0/8       .   0.00    208     0       0.0     0.00    0.06    ip1.domain.com  www.domain.com  GET /de HTTP/1.1
    14-0    -       0/0/16      .   0.00    920     47      0.0     0.00    0.59    ip4.domain.com  def.domain.com  NULL
    15-0    -       0/0/2       .   0.00    5661    3905    0.0     0.00    0.08    ip2.domain.com  def.domain.com  NULL
    16-0    -       0/0/3       .   0.00    5686    0       0.0     0.00    0.00    ip2.domain.com  def.domain.com  NULL
    17-0    -       0/0/1       .   0.00    5694    1       0.0     0.00    0.00    127.0.0.1       def.domain.com  GET / HTTP/1.0
    Srv Child Server number - generation
    PID OS process ID
    Acc Number of accesses this connection / this child / this slot
    M   Mode of operation
    CPU CPU usage, number of seconds
    SS  Seconds since beginning of most recent request
    Req Milliseconds required to process most recent request
    Conn    Kilobytes transferred this connection
    Child   Megabytes transferred this child
    Slot    Total megabytes transferred this slot
    SSL/TLS Session Cache Status:
    cache type: SHMCB, shared memory: 512000 bytes, current sessions: 0
    subcaches: 32, indexes per subcache: 133
    index usage: 0%, cache usage: 0%
    total sessions stored since starting: 37
    total sessions expired since starting: 37
    total (pre-expiry) sessions scrolled out of the cache: 0
    total retrieves since starting: 8 hit, 0 miss
    total removes since starting: 0 hit, 0 miss
    

    As you can see here, there are a few processes with a very high SS (Time since last request) value which means they are stuck. The requests were for different VHosts and also from different clients.

    If I look into the acces.log files and search for the entries for such a process (I added the PID to log), the last request they served was most of the times a simple image file directly elaborated by apache.

    The important parts of the apache2 config are:

    <IfModule mpm_prefork_module>
        StartServers       1
        MinSpareServers    1
        MaxSpareServers    3
        MaxClients         30
        MaxRequestsPerChild   0
    </IfModule>
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15
    

    Update 3: All the enabled apache2 modules:

    root@ser:~# apache2ctl -M
    Loaded Modules:
     core_module (static)
     log_config_module (static)
     logio_module (static)
     mpm_prefork_module (static)
     http_module (static)
     so_module (static)
     alias_module (shared)
     auth_basic_module (shared)
     authn_file_module (shared)
     authz_default_module (shared)
     authz_groupfile_module (shared)
     authz_host_module (shared)
     authz_user_module (shared)
     autoindex_module (shared)
     cache_module (shared)
     cgi_module (shared)
     deflate_module (shared)
     dir_module (shared)
     disk_cache_module (shared)
     env_module (shared)
     expires_module (shared)
     filter_module (shared)
     headers_module (shared)
     mem_cache_module (shared)
     mime_module (shared)
     negotiation_module (shared)
     version_module (shared)
     pagespeed_module (shared)
     perl_module (shared)
     php5_module (shared)
     proxy_module (shared)
     proxy_http_module (shared)
     reqtimeout_module (shared)
     rewrite_module (shared)
     setenvif_module (shared)
     ssl_module (shared)
     status_module (shared)
     unique_id_module (shared)
    Syntax OK
    

    Update 2: Here is the stacktrace of a hanging process. All of the stacktraces of the stuck ones are exactly the same:

    5861:/usr/sbin/apache2-kstart
    #0  0x00002ad420a4448d in /lib/libpthread-2.11.1.so: waitpid
    #1  0x00002ad4258dc507 in /usr/lib/libperl.so.5.10.1: Perl_wait4pid
    #2  0x00002ad4258dc760 in /usr/lib/libperl.so.5.10.1: Perl_my_pclose
    #3  0x00002ad425949b1d in /usr/lib/libperl.so.5.10.1: Perl_io_close
    #4  0x00002ad425949c89 in /usr/lib/libperl.so.5.10.1: Perl_do_close
    #5  0x00002ad42593d922 in /usr/lib/libperl.so.5.10.1: Perl_pp_close
    #6  0x00002ad4258f3346 in /usr/lib/libperl.so.5.10.1: Perl_runops_standard
    #7  0x00002ad42589a8cf in /usr/lib/libperl.so.5.10.1: Perl_call_sv
    #8  0x00002ad4256202f6 in /usr/lib/apache2/modules/mod_perl.so: modperl_callback
    #9  0x00002ad425620a3a in /usr/lib/apache2/modules/mod_perl.so: modperl_callback_run_handlers
    #10 0x00002ad42562101f in /usr/lib/apache2/modules/mod_perl.so: modperl_callback_per_dir
    #11 0x00002ad42561a99f in /usr/lib/apache2/modules/mod_perl.so: -
    #12 0x00002ad42561ad06 in /usr/lib/apache2/modules/mod_perl.so: modperl_response_handler
    #13 0x00002ad41ff58320 in /usr/lib/apache2/mpm-prefork/apache2: ap_run_handler (/build/buildd/apache2-2.2.14/server/config.c:159)
    #14 0x00002ad41ff5bc88 in /usr/lib/apache2/mpm-prefork/apache2: ap_invoke_handler (/build/buildd/apache2-2.2.14/server/config.c:377)
    #15 0x00002ad41ff69878 in /usr/lib/apache2/mpm-prefork/apache2: ap_process_request (/build/buildd/apache2-2.2.14/modules/http/http_request.c:282)
    #16 0x00002ad41ff66728 in /usr/lib/apache2/mpm-prefork/apache2: ap_process_http_connection (/build/buildd/apache2-2.2.14/modules/http/http_core.c:190)
    #17 0x00002ad41ff5fed8 in /usr/lib/apache2/mpm-prefork/apache2: ap_run_process_connection (/build/buildd/apache2-2.2.14/server/connection.c:43)
    #18 0x00002ad41ff6e847 in /usr/lib/apache2/mpm-prefork/apache2: child_main (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:663)
    #19 0x00002ad41ff6eb5a in /usr/lib/apache2/mpm-prefork/apache2: make_child (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:758)
    #20 0x00002ad41ff6f7e4 in /usr/lib/apache2/mpm-prefork/apache2: perform_idle_server_maintenance (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:883)
    #21 0x00002ad41ff44350 in /usr/lib/apache2/mpm-prefork/apache2: main (/build/buildd/apache2-2.2.14/server/main.c:742)
    #22 0x00002ad420c70c4d in /lib/libc-2.11.1.so: __libc_start_main@@GLIBC_2.2.5
    #23 0x00002ad41ff432e9 in /usr/lib/apache2/mpm-prefork/apache2: _start
    

    For comaprison the stacktrace of a process which doesn't hang:

    9741:/usr/sbin/apache2-kstart
    #0  0x00002ad420d3d7d7 in /lib/libc-2.11.1.so: semop@@GLIBC_2.2.5
    #1  0x00002ad42081a0c3 in /usr/lib/libapr-1.so.0.3.8: -
    #2  0x00002ad42081ae47 in /usr/lib/libapr-1.so.0.3.8: apr_proc_mutex_lock
    #3  0x00002ad41ff6e754 in /usr/lib/apache2/mpm-prefork/apache2: accept_mutex_on (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:206)
    #4  0x00002ad41ff6eb5a in /usr/lib/apache2/mpm-prefork/apache2: make_child (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:758)
    #5  0x00002ad41ff6f7e4 in /usr/lib/apache2/mpm-prefork/apache2: perform_idle_server_maintenance (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:883)
    #6  0x00002ad41ff44350 in /usr/lib/apache2/mpm-prefork/apache2: main (/build/buildd/apache2-2.2.14/server/main.c:742)
    #7  0x00002ad420c70c4d in /lib/libc-2.11.1.so: __libc_start_main@@GLIBC_2.2.5
    #8  0x00002ad41ff432e9 in /usr/lib/apache2/mpm-prefork/apache2: _start
    

    Here my Question:

    How can I find out what blocks the apache instances from shutting down successfully?

    Update 4 Googling for 'apache sending reply hung' results in a few posts where people had problems with buggy scripts/mysql connections. As it can be seen in the stacktrace, the processes which got stuck did use mod_perl, maybe there's a problem... Will dig into it

    • Michael Hampton
      Michael Hampton about 11 years
      Your stack trace shows that your mod_perl code is hanging, waiting for another process. It's time to look at your web application.
  • Stefan Profanter
    Stefan Profanter about 11 years
    Thanks for your response! The logs are written by apache directly, using the directive: ErrorLog /var/log/apache2/page-error.log and: CustomLog /var/log/apache2/page-access.log combined The server is a virtual host on a hosting provider. Thus I think the disk partition should be fine. Will give strace a shot and post it's output in my initial post.
  • Kiran K Telukunta
    Kiran K Telukunta about 7 years
    How did you find that error?
  • Stefan Profanter
    Stefan Profanter over 6 years
    Good question, by trial and error, i.e., disabling feature by feature within perl until I could narrow the problem down.