How to set a timeout for an ssh command and still get back the result of remote commands

Solution 1

Use GNU Parallel to parallelize your collection:

parallel --slf rhel-nodes --tag --timeout 1000% --onall --retries 3 \
  "rpm -q {}; rpm --queryformat '%{installtime:date} %{name}\n' -q {}" \
  ::: bash bc perl

Put the nodes in ~/.parallel/rhel-nodes.

--tag prepends each line of output with the name of the node. --timeout 1000% kills a command if it takes 10 times longer than the median run time. --onall runs all commands on all servers. --retries 3 runs a command up to 3 times if it fails. ::: bash bc perl are the packages you want to test for. If you have many packages, use the cat packages | parallel ... syntax instead of parallel ... ::: packages.
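As a rough sketch of the stdin form mentioned above: the package list file and the function name query_packages are assumptions made for illustration, and the call is left commented out because it needs GNU Parallel plus reachable nodes listed in ~/.parallel/rhel-nodes.

```shell
#!/usr/bin/env bash
# Hypothetical package list, one name per line (the file is an assumption).
pkgfile=$(mktemp)
printf '%s\n' bash bc perl > "$pkgfile"

# Same query as the ::: form, but package names are read from stdin.
# Assumes GNU Parallel is installed and ~/.parallel/rhel-nodes lists the nodes.
query_packages() {
    cat "$1" | parallel --slf rhel-nodes --tag --timeout 1000% --onall --retries 3 \
        "rpm -q {}; rpm --queryformat '%{installtime:date} %{name}\n' -q {}"
}

# Not executed here because it needs reachable nodes:
#   query_packages "$pkgfile"
```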

GNU Parallel is a general parallelizer that makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Solution 2

Thank you for your attempts to solve this.

I have resolved the issue now, and the fix was so simple that I feel quite stupid.

In the code I had used previously,

(ssh -q -o Batchmode=yes -o PasswordAuthentication=no -o ConnectTimeout=1 $i "rpm --queryformat '%{installtime:date} %{name}\n' -q \"kexec-tools\" | cut -d \" \" -f1,2,3,4|tr \" \" \"_\"" > /dev/null) & pid=$!

All I needed to do was to redirect all output to a file on the local machine. That's it.

(ssh -q -o Batchmode=yes -o PasswordAuthentication=no -o ConnectTimeout=1 $i "rpm --queryformat '%{installtime:date} %{name}\n' -q \"kexec-tools\" | cut -d \" \" -f1,2,3,4|tr \" \" \"_\"" > test) & pid=$!
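Put together with the watcher from the question, the idea can be sketched as a small helper. This is only an illustration under stated assumptions: the function name run_with_timeout and the temporary file are hypothetical, and pkill is used the same way as in the question's attempt, with a plain kill as fallback.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the original post:
#   run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background, sends it SIGHUP after SECONDS, and prints
# whatever it wrote to stdout. Returns COMMAND's exit status, which is
# non-zero when the command failed or was killed by the watcher.
run_with_timeout() {
    local limit=$1
    shift
    local out pid watcher status
    out=$(mktemp) || return 1

    # Redirect stdout to a local file instead of /dev/null so it can be read back.
    "$@" > "$out" &
    pid=$!

    # Watcher: kill the job if it is still running after $limit seconds.
    ( sleep "$limit" && kill -HUP "$pid" ) > /dev/null 2>&1 &
    watcher=$!

    status=0
    wait "$pid" 2> /dev/null || status=$?

    # Stop the watcher's pending sleep; fall back to killing the watcher
    # itself when pkill is unavailable or finds nothing to kill.
    pkill -HUP -P "$watcher" 2> /dev/null || kill "$watcher" 2> /dev/null || true
    wait "$watcher" 2> /dev/null || true

    cat "$out"
    rm -f "$out"
    return "$status"
}

# Example call with the variable $i from the original loop (not run here):
#   result=$(run_with_timeout 10 ssh -q -o BatchMode=yes "$i" "rpm -q kexec-tools") \
#       || echo "$i Unable to ssh" >> res
```

Because the output goes to a local file first, the helper can be used inside a command substitution, which is what the /dev/null version could not offer. Killing the local ssh client does not necessarily stop the command on the remote node, which may keep running there.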

The only reason I decided to post an answer myself is that the answer you gave requires bringing in external tools, which is something I don't want to do. I wanted a solution that works without depending on the availability of packages or on the OS version.

Thank you nonetheless!

Author: sidx4

Updated on September 18, 2022

Comments

  • sidx4
    sidx4 almost 2 years

    I have a bash script running on a RHEL server that connects to thousands of nodes and returns the output of about five commands. It collects information from RHEL servers only.

    It works fine, but the problem is that some nodes end up freezing when I run the following commands:

    rpm -q <package-name>
    rpm --queryformat '%{installtime:date} %{name}\n' -q <package-name>
    

    Since this stalls my script, I want to set a timeout for the ssh command and exit the ssh session if it keeps waiting too long for a remote command to finish [say 10 seconds], then move on to the next node. How do I do this?

    Here's the part of the script where I currently pull out the information and store it in the variable called dump [please ignore my poor scripting, I'm new at this]:

    dump=$(ssh -o ServerAliveCountMax=1 -o ServerAliveInterval=10 -o ConnectTimeout=10 -o BatchMode=yes $i "cat /proc/meminfo | grep -i \"memtotal\" | cut -d \":\" -f2 | tr -d \" \" | tr -d \"kB\"; cat /etc/redhat-release | cut -d \" \" -f7; dmidecode | grep -i \"prod\" | grep -vi \"desktop\"  | grep -iv \"id\" | cut -d \" \" -f3,4| tr \" \" \"_\" ; uptime | cut -d \" \" -f4,5 | tr \" \" \"_\" | tr -d \",\"; service kdump status 2>/dev/null | tr \" \" \"_\";");
    

    Is there any way to time this out if it keeps going for too long?

    WHAT I ALREADY TRIED:

    (ssh -q -o Batchmode=yes -o PasswordAuthentication=no -o ConnectTimeout=1 $i "rpm --queryformat '%{installtime:date} %{name}\n' -q \"kexec-tools\" | cut -d \" \" -f1,2,3,4|tr \" \" \"_\"" > /dev/null) & pid=$!
    (sleep 10 && kill -HUP $pid ) 2>/dev/null & watcher=$!
    if wait $pid 2>/dev/null; then
        pkill -HUP -P $watcher
        wait $watcher
    else
        echo -e "$i Unable to ssh" >> res && continue
    fi
    

    However, this way I am not able to store the result of the remote rpm command.

    Any help is extremely appreciated.

  • sidx4
    sidx4 over 7 years
    While I know this is quite old, I had to come back and give your answer the credit it deserves :) While my solution worked for me, yours was a better one. Thank you.