Ansible Fails to Authenticate Sudo Even When Sudo Pass is Given

13,064

Solution 1

Using @lulian as a foothold in this answer, the problem came down to a rogue ansible_sudo_pass: defined in the group_vars which was overriding the password entered for --ask-sudo-pass.

Using the following:

while [[ -z $(ps -eaf|grep 'sshd: [U]ser1@pts/1') ]]; do
    continue
done
strace -ff -vfp $(ps -eaf|grep 'sshd: [U]ser1@pts/1'|awk '{print $2}') -o /root/strace_sshd1_2.out

I was able to find that write(4, "{{ password }}\n", 15) was being passed instead of the entered password. After some quick searching, I did indeed find ansible_sudo_pass defined in my group_vars which was overriding my entered password.

As an FYI to everyone else, the ansible_sudo_pass: definition seems to take precedence over --ask-sudo-pass which, at first, seemed counter-intuitive. In the end, this is user error, but @lulian's methodology in debugging the SSH interaction as well as the relationship discovery between ansible_sudo_pass and --ask-sudo-pass should be very helpful for others out there. (Hopefully!)

Solution 2

What I would do is to use

strace -vfp `pidof sshd`

and see where it's failing.

Check the account as well, maybe it's restricted or something but my bet is that something is wrong with your /etc/hosts file or it does get changed in the process.

Share:
13,064

Related videos on Youtube

BrM13
Author by

BrM13

Updated on September 18, 2022

Comments

  • BrM13
    BrM13 over 1 year

    Problem

    Using the latest, stable Ansible build, I have an odd problem where my playbook hangs on one server during "Gathering_Facts" but works fine on other similar servers when using Sudo. On the Ansible server, I run as my user (NIS user) and use sudo (as root) on the remote server to make changes. If I remove Sudo from this setup, everything works fine.

    Setup

    Software Versions

    • OS: RHEL 6.4
    • Ansible version: ansible 1.8.2
    • Sudo version:
      Sudo version 1.8.6p3
      Sudoers policy plugin version 1.8.6p3
      Sudoers file grammar version 42
      Sudoers I/O plugin version 1.8.6p3
      
    • SSH version: OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010

    Server Map

                       -------- User1@Server1: sudo -H -S -p (Hangs on Gathering_Facts)
                      /
    User1@Ansible ----
                      \
                       -------- User1@Server2: sudo -H -S -p (Works fine)
    

    Users

    • User1: NIS accessible user on both Server1 & Server2.
    • root: local root user for each server.

    Ansible Configuration

    Relevant parts of my ansible.cfg.

    ansible.cfg

    sudo           = true
    sudo_user      = root
    ask_sudo_pass  = True
    ask_pass       = True
    ...
    gathering = smart
    ....
    # change this for alternative sudo implementations
    sudo_exe = sudo
    
    # what flags to pass to sudo
    #sudo_flags = -H
    ...
    # remote_user = ansible
    

    Here is a simple test playbook to touch an empty file and then remove it. Really, I'm just wanting to test if I can get Ansible to correctly use sudo on the remote server. If the playbook runs at all, I'm in good shape.

    TEST.yml

    ---
    - hosts: Server1:Server2
      vars:
      - test_file: '/tmp/ansible_test_file.txt'
      sudo: yes
      tasks:
      - name: create empty file to test connectivity and sudo access
        file: dest={{ test_file }}
              state=touch
              owner=root group=root mode=0600
        notify:
        - clean
      handlers:
      - name: clean
        file: dest={{ test_file }}
              state=absent
    

    Sudo Configuration

    /etc/sudoers

    Host_Alias SRV     = Server1, Server2
    User_Alias SUPPORT = User1, User2, User3
    SUPPORT SRV=(root) ALL
    

    This sudo configuration works just fine on BOTH servers. No problems with sudo itself.

    How I Run It All

    Very simple:

    $ ansible-playbook test.yml
    SSH password: 
    sudo password [defaults to SSH password]:
    
    PLAY [Server1:Server2] ********************************************** 
    
    GATHERING FACTS *************************************************************** 
    ok: [Server2]
    failed: [Server1] => {"failed": true, "parsed": false}
    
    Sorry, try again.
    [sudo via ansible, key=mxxiqyvztlfnbctwixzmgvhwfdarumtq] password: 
    sudo: 1 incorrect password attempt
    
    
    TASK: [create empty file to test connectivity and sudo access] **************** 
    changed: [Server2]
    
    NOTIFIED: [clean] ************************************************************* 
    changed: [Server2]
    
    PLAY RECAP ******************************************************************** 
               to retry, use: --limit @/home/User1/test.retry
    
    Server1                 : ok=0    changed=0    unreachable=0    failed=1   
    Server2                 : ok=3    changed=2    unreachable=0    failed=0
    

    Fails regardless of if I explicitly enter both the SSH/Sudo passwords as well as implicitly (letting sudo pass default to SSH).

    Remote Server Logs

    Server1 (Fails)

    /var/log/secure

    Dec 31 15:21:10 Server1 sshd[27093]: Accepted password for User1 from x.x.x.x port 51446 ssh2
    Dec 31 15:21:10 Server1 sshd[27093]: pam_unix(sshd:session): session opened for user User1 by (uid=0)
    Dec 31 15:21:11 Server1 sshd[27095]: subsystem request for sftp
    Dec 31 15:21:11 Server1 sudo: pam_unix(sudo:auth): authentication failure; logname=User1 uid=187 euid=0 tty=/dev/pts/1 ruser=User1 rhost=  user=User1 
    Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): conversation failed
    Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): auth could not identify password for [User1]
    Dec 31 15:26:13 Server1 sudo:  User1 : 1 incorrect password attempt ; TTY=pts/1 ; PWD=/home/User1 ; USER=root ; COMMAND=/bin/sh -c echo SUDO-SUCCESS-mxxiqyvztlfnbctwixzmgvhwfdarumtq; LANG=C LC_CTYPE=C /usr/bin/python /tmp/.ansible/tmp/ansible-tmp-1420039272.66-164754043073536/setup; rm -rf /tmp/.ansible/tmp/ansible-tmp-1420039272.66-164754043073536/ >/dev/null 2>&1
    Dec 31 15:26:13 Server1 sshd[27093]: pam_unix(sshd:session): session closed for user User1 
    

    Server2 (runs fine)

    /var/log/secure

    Dec 31 15:21:12 Server2 sshd[31447]: Accepted password for User1 from x.x.x.x port 60346 ssh2
    Dec 31 15:21:12 Server2 sshd[31447]: pam_unix(sshd:session): session opened for user User1 by (uid=0)
    Dec 31 15:21:12 Server2 sshd[31449]: subsystem request for sftp
    Dec 31 15:21:12 Server2 sudo:  User1 : TTY=pts/2 ; PWD=/home/User1 ; USER=root ; COMMAND=/bin/sh -c echo SUDO-SUCCESS-vjaypzeocvrdlqalxflgcrcoezhnbibs; LANG=C LC_CTYPE=C /usr/bin/python /tmp/.ansible/tmp/ansible-tmp-1420039272.68-243930711246149/setup; rm -rf /tmp/.ansible/tmp/ansible-tmp-1420039272.68-243930711246149/ >/dev/null 2>&1
    Dec 31 15:21:14 Server2 sshd[31447]: pam_unix(sshd:session): session closed for user User1 
    

    STrace Output

    Here is the output from strace when targeting the root user's ansible command. Command:

    while [[ -z $(ps -fu root|grep [a]nsible|awk '{print $2}') ]]; do
        continue
    done
    strace -vfp $(ps -fu root|grep [a]nsible|awk '{print $2}') -o /root/strace.out`
    

    Server1

    23650 select(0, NULL, NULL, NULL, {1, 508055}) = 0 (Timeout)
    23650 socket(PF_NETLINK, SOCK_RAW, 9)   = 10
    23650 fcntl(10, F_SETFD, FD_CLOEXEC)    = 0
    23650 readlink("/proc/self/exe", "/usr/bin/sudo", 4096) = 13
    23650 sendto(10, "|\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentic"..., 124, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 124
    23650 poll([{fd=10, events=POLLIN}], 1, 500) = 1 ([{fd=10, revents=POLLIN}])
    23650 recvfrom(10, "$\0\0\0\2\0\0\0\1\0\0\0b\\\0\0\0\0\0\0|\0\0\0L\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
    23650 recvfrom(10, "$\0\0\0\2\0\0\0\1\0\0\0b\\\0\0\0\0\0\0|\0\0\0L\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
    23650 close(10)                         = 0
    23650 write(2, "Sorry, try again.\n", 18) = 18
    23650 gettimeofday({1420050850, 238344}, NULL) = 0
    23650 socket(PF_FILE, SOCK_STREAM, 0)   = 10
    23650 connect(10, {sa_family=AF_FILE, path="/var/run/dbus/system_bus_socket"}, 33) = 0
    

    Server2

    6625  select(8, [5 7], [], NULL, NULL)  = ? ERESTARTNOHAND (To be restarted)
    6625  --- SIGCHLD (Child exited) @ 0 (0) ---
    6625  write(8, "\21", 1)                = 1
    6625  rt_sigreturn(0x8)                 = -1 EINTR (Interrupted system call)
    6625  select(8, [5 7], [], NULL, NULL)  = 1 (in [7])
    6625  read(7, "\21", 1)                 = 1
    6625  wait4(6636, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 6636
    6625  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    6625  socket(PF_NETLINK, SOCK_RAW, 9)   = 6
    6625  fcntl(6, F_SETFD, FD_CLOEXEC)     = 0
    6625  readlink("/proc/self/exe", "/usr/bin/sudo", 4096) = 13
    6625  sendto(6, "x\0\0\0R\4\5\0\6\0\0\0\0\0\0\0op=PAM:session_c"..., 120, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 120
    6625  poll([{fd=6, events=POLLIN}], 1, 500) = 1 ([{fd=6, revents=POLLIN}])
    6625  recvfrom(6, "$\0\0\0\2\0\0\0\6\0\0\0\330\355\377\377\0\0\0\0x\0\0\0R\4\5\0\6\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
    6625  recvfrom(6, "$\0\0\0\2\0\0\0\6\0\0\0\330\355\377\377\0\0\0\0x\0\0\0R\4\5\0\6\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
    6625  close(6)                          = 0
    6625  open("/etc/security/pam_env.conf", O_RDONLY) = 6
    6625  fstat(6, {st_dev=makedev(253, 1), st_ino=521434, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2980, st_atime=2014/12/31-16:10:01, st_mtime=2012/10/15-08:23:52, st_ctime=2014/06/16-15:45:35}) = 0
    6625  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbc3a59a000
    6625  read(6, "#\n# This is the configuration fi"..., 4096) = 2980
    6625  read(6, "", 4096)                 = 0
    6625  close(6)                          = 0
    6625  munmap(0x7fbc3a59a000, 4096)      = 0
    6625  open("/etc/environment", O_RDONLY) = 6
    

    My Guess

    Server1 isn't getting the password properly or is incorrectly asking/waiting for a password. This doesn't look like a Sudo or Ansible problem (alone, they both work just fine), but Server1 doesn't seem to receive the credentials (or adhere to them) in a similar fashion as Server2. Server1 & 2 serve different purposes, so it's possible that they have some authentication or package version differences, but they were both built from the same repository; therefore, they shouldn't be THAT different.

    PAM Auth

    I thought maybe the systems had different PAM configurations causing the passwords to be handled a bit differently. I compared the /etc/pam.d/ files (using md5sum [file]), and they are the same between the two systems.

    Tests

    Sudo STDIN

    Tested another issue where sudo wouldn't read a password from STDIN, but that worked fine on both servers.

    Test Sudo Ad-Hoc

    -bash-4.1$ ansible Server1 -m file -a "dest=/tmp/ansible_test.txt state=touch" -sK
    SSH password: 
    sudo password [defaults to SSH password]: 
    Server1 | success >> {
        "changed": true, 
        "dest": "/tmp/ansible_test.txt", 
        "gid": 0, 
        "group": "root", 
        "mode": "0644", 
        "owner": "root", 
        "size": 0, 
        "state": "file", 
        "uid": 0
    }
    

    Success! But why?!

    TL;DR

    1. Server1 seems to be waiting on sudo password prompt while Server2 runs just fine.
    2. Running ansible "ad-hoc" on Server1 works fine. Running it as a playbook fails.

    Question(s)

    • What could cause my Ansible Sudo configuration to work fine on one server and be rejected on another?
    • Does Ansible perform the password "pass" from local to remote machine differently when run ad-hoc versus playbook? I assumed they would be the same.

    I'm thinking this is getting close to simply submitting a bug report to the GitHub page purely on the fact that sudo access has different results depending if I'm running ad-hoc or not.

  • BrM13
    BrM13 over 9 years
    Thanks, lulian. I've applied a few edits to the question, one section being STrace output. It's clear that there's a difference between the two servers in how they proceed after the ansible process kicks off on the remote server. Subsequent runs and trace captures were consistent.
  • Iulian
    Iulian over 9 years
    I think you need more from that strace -vfp, do it manually on the top sshd process and follow through the output. I don't think after reading the password it's just closing the channel like that before going through PAM etc. At that point please take a look on the sshd_config file and hosts.deny .. see if you can find something there.
  • BrM13
    BrM13 over 9 years
    I had tried your suggestion before, but I must have missed the crucial element in there (hence why I opted to watch the ansible process in the initial STrace). After another go, I found an empty {{ password }} variable being passed instead of the real password. Opted to submit another "Answer" separately since your answer eventually got me on the right track.