Chef-client dying mid run

11,507

If your ssh connection gets interrupted to a machine while chef-client is running, does that crash the run in seemingly inexplicable ways?

Well, the stacktrace seems to imply that something like that is happening. The message says "Errno::EIO: Input/output error - <STDOUT>" which is consistent with what I'd expect to see if STDOUT was going over an SSH channel that had been closed.

I suggest 2 things:

  • Run chef-client with all console output redirected to a file; e.g. add > /tmp/log 2>&1 to the end of the command. (The redirection needs to happen on the remote machine.)

  • Add -l debug to the command to increase the level of logging, as covered in Opscode's technical FAQ. This could reveal clues that are currently being hidden.


Looking at your second update, this has the hallmarks of some kind of firewall or network related problem.

Share:
11,507
Patrick M
Author by

Patrick M

Negative space.

Updated on June 28, 2022

Comments

  • Patrick M
    Patrick M almost 2 years

    We recently started having trouble with chef-client dying in the middle of a run after taking a lot more time stuck on various parts of the run-list that normally proceeded much quicker. I've been on my home wifi and my colleague has been on the work wifi, which has been having some connectivity problems of its own.

    If your ssh connection gets interrupted to a machine while chef-client is running, does that crash the run in seemingly inexplicable ways? I am using PutTY to connect from my Win7 and my colleague is using the Apple Terminal App.

    All the machines we've been running this on are Ubuntu 12.04 (in EC2) and have plenty of disk space left over - they're only utilizing ~1GB with ~5GB free.

    Here is the output of the log from /var/log/chef/client.log (set with the log_location directive in /etc/chef/client.rb as described here).

    [2014-01-08T00:27:07+00:00] WARN: Nodejs user is nodejs
    [2014-01-08T00:27:07+00:00] WARN: Cloning resource attributes for group[nodejs] from prior resource (CHEF-3694)
    [2014-01-08T00:27:07+00:00] WARN: Previous group[nodejs]: /var/chef/cache/cookbooks/nodejs/recipes/default.rb:26:in `from_file'
    [2014-01-08T00:27:07+00:00] WARN: Current  group[nodejs]: /var/chef/cache/cookbooks/spicoli-app/recipes/default.rb:38:in `from_file'
    [2014-01-08T00:27:07+00:00] WARN: Cloning resource attributes for user[nodejs] from prior resource (CHEF-3694)
    [2014-01-08T00:27:07+00:00] WARN: Previous user[nodejs]: /var/chef/cache/cookbooks/nodejs/recipes/default.rb:34:in `from_file'
    [2014-01-08T00:27:07+00:00] WARN: Current  user[nodejs]: /var/chef/cache/cookbooks/spicoli-app/recipes/default.rb:46:in `from_file'
    [2014-01-08T00:27:30+00:00] WARN: Environment is _default
    [2014-01-08T00:27:30+00:00] WARN: Nodejs user is nodejs
    [2014-01-08T02:04:54+00:00] ERROR: Running exception handlers
    [2014-01-08T02:04:54+00:00] ERROR: Exception handlers complete
    [2014-01-08T02:04:54+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
    [2014-01-08T02:04:55+00:00] ERROR: Input/output error - <STDOUT>
    [2014-01-08T02:04:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
    

    And the error stacktrace just has this:

    Generated at 2014-01-08 02:04:54 +0000
    Errno::EIO: Input/output error - <STDOUT>
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:91:in `write'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:91:in `puts'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:91:in `puts'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:61:in `display_section'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:44:in `block (2 levels) in display'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:43:in `each'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:43:in `block in display'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:42:in `each'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:42:in `display'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:130:in `display_error'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:161:in `resource_failed'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/doc.rb:159:in `resource_failed'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/event_dispatch/dispatcher.rb:29:in `block in resource_failed'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/event_dispatch/dispatcher.rb:29:in `each'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/event_dispatch/dispatcher.rb:29:in `resource_failed'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource.rb:637:in `rescue in run_action'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource.rb:643:in `run_action'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:49:in `run_action'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:81:in `block (2 levels) in converge'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:81:in `each'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:81:in `block in converge'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection.rb:98:in `block in execute_each_resource'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call_iterator_block'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection.rb:96:in `execute_each_resource'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:80:in `converge'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:433:in `converge'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:500:in `do_run'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:199:in `block in run'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:193:in `fork'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:193:in `run'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application.rb:208:in `run_chef_client'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application/client.rb:312:in `block in run_application'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application/client.rb:304:in `loop'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application/client.rb:304:in `run_application'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application.rb:66:in `run'
    /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/bin/chef-client:26:in `<top (required)>'
    /usr/bin/chef-client:23:in `load'
    /usr/bin/chef-client:23:in `<main>'
    

    Which is a really generic error! But it does seem to indicate an interruption to STDOUT output, which kind of makes sense with a client disconnection.

    Edit: As requested, here are the contents of the client.rb file (names obfuscated, naturally.)

    $ cat /etc/chef/client.rb
    log_level        :auto
    log_location     "/var/log/chef/client.log"
    chef_server_url  "https://api.opscode.com/organizations/myapp"
    validation_client_name "my-validator"
    node_name "my-app-node"
    

    Edit 2: Attempt using sudo su -s /bin/bash root -c "screen chef-client"

    Screen terminated while I was at lunch and recorded a timeout on the ShellOut command for npm install. This was after chef-client was sitting stuck on this operation for over an hour.

    [2014-01-09T16:39:07+00:00] WARN: Environment is _default
    [2014-01-09T16:39:07+00:00] WARN: Nodejs user is nodejs
    [2014-01-09T18:16:28+00:00] ERROR: Running exception handlers
    [2014-01-09T18:16:28+00:00] ERROR: Exception handlers complete
    [2014-01-09T18:16:28+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
    [2014-01-09T18:16:31+00:00] ERROR: execute[npm-install-app] (spicoli-app::default line 110) had an error: Mixlib::ShellOut::CommandTimeout: command timed out:
    ---- Begin output of npm --registry http://my.npm.repo.amazonaws.com:5984/registry/_design/app/_rewrite install --cache /home/nodejs/.npm --tmp /home/nodejs/tmp
    
    --- snip: install messages from npm ---
    
    [2014-01-09T18:16:33+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
    

    This is a totally different error than before. The stacktrace.out file also explicitly mentions ShellOut, so it is entirely different as well. Most oddly, when I run the same npm command from the command line, in finishes in under a minute.

    So I'm not sure there is a way to further diagnose the previous failure, but I would welcome other suggestions. For input on this new failure, I asked this followup question.