Authentication very slow (both with ssh and su)

40,759

Solution 1

Found the issue (thanks this post https://stackoverflow.com/questions/8413975/10-second-delay-between-login-and-shell-prompt)

The issue is in /etc/profile.d/zzzz-vamilocale.sh file, which is trying to read something from vm properties and stuck in that. removing this files solves the issue.

My way to debug this issue: 1. login as problem user 2. invoke "bash --login --verbose" 3. Find out line, which stops the executions 4. Find this line in one file from /etc/profile.d/

Solution 2

As a general way to investigate a slow authentication, check /etc/pam.conf and /etc/pam.d/su (and /etc/pam.d/sshd etc.) to see what kind of authentication the login services perform. Check the system logs to see if anything is logged (look for log entries dating from the time of the authentication).

In your case, it's the kernel logs that reveal the problem. The “page allocation failure” messages indicate that your system is out of virtual memory. Either kill some programs or increase the swap space.

Solution 3

As you get the slow down after login, technically, you need to check what can happen between login and the first shell prompt.

A custom setup of the shell could do a lot of creative things that can fail in miserable ways under some special conditions -
like at login before actually entering interactive mode,
or while DISPLAY is not yet defined.

In this case, we look for things blocking and timing out, or doing slow things -
like the cartoon-esque automounting all home directories,
of all present and past employees.
Except your own, of course.
That was mounted already.

Sounds bad?
No. It's easy to find, and funny, for most.

The time after login, up until before showing the shell, is used by any maintenance tasks, these will slow it down. But maintenance tasks run only sporadically. And, of course, they are not run all together at once, like each week on first login after Monday 8:00 - because that would be slow, right?

There happen lots of unexpected things

The point illustrated is that there may happen a lot after login, before showing the shell.

One class of problems that I have seen to be slow at login on Ubuntu is generating the message of the day - motd.

This is the text shown before the first shell prompt on login.
It may be empty, or just some simple greeting.

Take a look at ls -l /etc/update-motd.d

$ ls -l /etc/update-motd.d
total 28
-rwxr-xr-x 1 root root 1220 Jul  3  2012 00-header
-rwxr-xr-x 1 root root 1358 Jul  3  2012 10-help-text
-rwxr-xr-x 1 root root  149 Mar 19  2012 90-updates-available
-rwxr-xr-x 1 root root  147 Aug  8  2012 91-release-upgrade
-rwxr-xr-x 1 root root  142 Mar 19  2012 98-fsck-at-reboot
-rwxr-xr-x 1 root root  144 Mar 19  2012 98-reboot-required
-rwxr-xr-x 1 root root 1158 Jul  3  2012 99-footer

Especially 90-updates-available seems to be very slow from time to time in this example - so check what you find there, and possibly disable some for testing by moving them away to /etc/update-motd.d.disabled.

Share:
40,759

Related videos on Youtube

techedemic
Author by

techedemic

Updated on September 18, 2022

Comments

  • techedemic
    techedemic over 1 year

    I'm having a problem on one of our production servers where the login attempts become progressively slower as time passes. After roughly 5 days it becomes so slow that some critical processes/cron entries fail to fire correctly.

    Server information:

    Hardware: Dell R720 24 GB Memory 2 x Intel Xeon E502620 v2 Processors (24 cores total, including HyperThreading) 8 x 300GB 10K SAS Drives

    OS: Red Hat Enterprise Linux 6.5

    I experienced issues logging in via SSH and started along a long road of investigating 'red herrings'. Eventually, I noticed that even doing the following took very long:

    [someuser#hostname] su -
    Password:
    

    When running the the 'su -' command as above, it shouldn't involve SSH in any form, as I am just trying to authenticate on the box itself, right?

    This has happened 3 weeks in a row on the same box and it's the first time I noticed (and tested I might add) that the local only logins are also slow.

    When I log in via the console, it looks as follows:

    hostname login: user
    Password:              # I enter the password and hit [enter]
    

    A long time passes and then...

    [user@hostname ~] $
    

    When I log in via SSH, it looks as follows (verbosity on and doing a localhost login):

    [user@hostname ~]$ ssh -v root@localhost
    OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Connecting to localhost [127.0.0.1] port 22.
    debug1: Connection established.
    debug1: identity file /usr/local/user/.ssh/identity type -1
    debug1: identity file /usr/local/user/.ssh/identity-cert type -1
    debug1: identity file /usr/local/user/.ssh/id_rsa type -1
    debug1: identity file /usr/local/user/.ssh/id_rsa-cert type -1
    debug1: identity file /usr/local/user/.ssh/id_dsa type -1
    debug1: identity file /usr/local/user/.ssh/id_dsa-cert type -1
    debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
    debug1: match: OpenSSH_5.3 pat OpenSSH*
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_5.3
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    debug1: kex: server->client aes128-ctr hmac-md5 none
    debug1: kex: client->server aes128-ctr hmac-md5 none
    debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
    debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
    The authenticity of host 'localhost (127.0.0.1)' can't be established.
    RSA key fingerprint is 1d:50:5e:a3:e4:63:d6:1d:d8:2c:85:07:95:81:c8:b6.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
    debug1: ssh_rsa_verify: signature correct
    debug1: SSH2_MSG_NEWKEYS sent
    debug1: expecting SSH2_MSG_NEWKEYS
    debug1: SSH2_MSG_NEWKEYS received
    debug1: SSH2_MSG_SERVICE_REQUEST sent
    debug1: SSH2_MSG_SERVICE_ACCEPT received
    debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
    debug1: Next authentication method: gssapi-keyex
    debug1: No valid Key exchange context
    debug1: Next authentication method: gssapi-with-mic
    debug1: Unspecified GSS failure.  Minor code may provide more information
    Credentials cache file '/tmp/krb5cc_501' not found
    
    debug1: Unspecified GSS failure.  Minor code may provide more information
    Credentials cache file '/tmp/krb5cc_501' not found
    
    debug1: Unspecified GSS failure.  Minor code may provide more information
    
    
    debug1: Unspecified GSS failure.  Minor code may provide more information
    Credentials cache file '/tmp/krb5cc_501' not found
    
    debug1: Next authentication method: publickey
    debug1: Trying private key: /usr/local/user/.ssh/identity
    debug1: Trying private key: /usr/local/user/.ssh/id_rsa
    debug1: Trying private key: /usr/local/user/.ssh/id_dsa
    debug1: Next authentication method: password
    root@localhost's password:
    debug1: Authentication succeeded (password).
    debug1: channel 0: new [client-session]
    debug1: Requesting [email protected]
    debug1: Entering interactive session.
    

    At this point it gets stuck, for a long while again, and eventually logs in successfully.

    Any pointers will be appreciated. This is driving me up the walls.

    The following also appears in the dmesg output. It repeats for various process names (not only 'cifsd')

    cifsd: page allocation failure. order:5, mode:0x20
    Pid: 12913, comm: cifsd Not tainted 2.6.32-431.el6.x86_64 #1
    Call Trace:
     [<ffffffff8112f9e7>] ? __alloc_pages_nodemask+0x757/0x8d0
     [<ffffffff8116e482>] ? kmem_getpages+0x62/0x170
     [<ffffffff8116f09a>] ? fallback_alloc+0x1ba/0x270
     [<ffffffff8116eaef>] ? cache_grow+0x2cf/0x320
     [<ffffffff8116ee19>] ? ____cache_alloc_node+0x99/0x160
     [<ffffffff8116ffe0>] ? kmem_cache_alloc_node_trace+0x90/0x200
     [<ffffffff811701fd>] ? __kmalloc_node+0x4d/0x60
     [<ffffffff8144feca>] ? __alloc_skb+0x7a/0x180
     [<ffffffff81450fe0>] ? skb_copy+0x40/0xb0
     [<ffffffffa014f57c>] ? tg3_start_xmit+0xa8c/0xd80 [tg3]
     [<ffffffff81460354>] ? dev_hard_start_xmit+0x224/0x480
     [<ffffffff8147bd0a>] ? sch_direct_xmit+0x15a/0x1c0
     [<ffffffff81460858>] ? dev_queue_xmit+0x228/0x320
     [<ffffffff8149a0d8>] ? ip_finish_output+0x148/0x310
     [<ffffffff8149a358>] ? ip_output+0xb8/0xc0
     [<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
     [<ffffffff81499655>] ? ip_local_out+0x25/0x30
     [<ffffffff81499b30>] ? ip_queue_xmit+0x190/0x420
     [<ffffffff8112ff2f>] ? free_hot_page+0x2f/0x60
     [<ffffffff814aee3e>] ? tcp_transmit_skb+0x40e/0x7b0
     [<ffffffff814b1380>] ? tcp_write_xmit+0x230/0xa90
     [<ffffffff814b1f00>] ? __tcp_push_pending_frames+0x30/0xe0
     [<ffffffff814a9663>] ? tcp_data_snd_check+0x33/0x100
     [<ffffffff814ad261>] ? tcp_rcv_established+0x381/0x7f0
     [<ffffffff8152873a>] ? schedule_timeout+0x19a/0x2e0
     [<ffffffff814b5643>] ? tcp_v4_do_rcv+0x2e3/0x490
     [<ffffffff814a130a>] ? tcp_prequeue_process+0x7a/0xa0
     [<ffffffff814a4a2c>] ? tcp_recvmsg+0xacc/0xe80
     [<ffffffff814c58ca>] ? inet_recvmsg+0x5a/0x90
     [<ffffffff8105a625>] ? select_idle_sibling+0x95/0x150
     [<ffffffff81449ab3>] ? sock_recvmsg+0x133/0x160
     [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
     [<ffffffff81059216>] ? enqueue_task+0x66/0x80
     [<ffffffff8105571d>] ? check_preempt_curr+0x6d/0x90
     [<ffffffff81065c5e>] ? try_to_wake_up+0x24e/0x3e0
     [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
     [<ffffffff8109b2b6>] ? autoremove_wake_function+0x16/0x40
     [<ffffffff81449b24>] ? kernel_recvmsg+0x44/0x60
     [<ffffffffa01fd7c9>] ? cifs_readv_from_socket+0x1a9/0x260 [cifs]
     [<ffffffffa020b11d>] ? cifs_add_credits+0x5d/0x70 [cifs]
     [<ffffffffa01fd8a7>] ? cifs_read_from_socket+0x27/0x30 [cifs]
     [<ffffffffa01fda03>] ? cifs_demultiplex_thread+0x153/0xe10 [cifs]
     [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
     [<ffffffffa01fd8b0>] ? cifs_demultiplex_thread+0x0/0xe10 [cifs]
     [<ffffffff8109aef6>] ? kthread+0x96/0xa0
     [<ffffffff8100c20a>] ? child_rip+0xa/0x20
     [<ffffffff8109ae60>] ? kthread+0x0/0xa0
     [<ffffffff8100c200>] ? child_rip+0x0/0x20
    
    • Admin
      Admin almost 10 years
      The call trace looks like you have a memory issue. This can make the entire system slow, especially when referred to logins, that as the process(es) is not constantly alive.
    • Admin
      Admin almost 10 years
      Can you post your /etc/ssh/sshd_config file. I thinks you have to disable the kerberos authentication part. Try to comment out # Kerberos options and # GSSAPI options sections
    • Admin
      Admin almost 10 years
      Is this problem present after a fresh system reboot? Is the problem present even if you use bash --login? Is your home directory in a network filesystem?
    • Admin
      Admin almost 10 years
      @pqnet No the logins are fast after a reboot. I cannot answer your second question, as I restarted the server and it is fast now. This is a standalone machine with all storage locally attached.
  • techedemic
    techedemic almost 10 years
    Thanks. We found that to be the problem, but I am curious why a machine with significant specs would run out of resources so quickly. The machine is not swapping (much). I read somewhere that modifying the following at runtime might fix the problem (or obviously permanently in the sysctl file): /proc/sys/vm/min_free_kbytes = 512000 /proc/sys/vm/zone_reclaim_mode = 512000 I haven't tried this but the moment the problem re-appears we will have a look at it.
  • Valath
    Valath over 9 years
    Thank you! this has been vexing me for quite some time now.
  • petertc
    petertc about 7 years
    In this way I found the problem is __sdkman_secure_curl_with_timeouts "$VERSION_URL" in my machine that slow the login procedure.
  • Jeff Schaller
    Jeff Schaller about 5 years
    It's be great if you could connect SELinux in any way to the symptoms in the question.