SSH Connections freezing with "Write failed: Broken pipe"
Solution 1
It looks like sshd on the CentOS box is not configured to send keepalives to the client.
Add these two lines to the sshd config on the CentOS box (/etc/ssh/sshd_config), restart sshd, and enjoy!
KeepAlive yes
ClientAliveInterval 60
While you're at it, I'd recommend using GNU screen to keep your session alive on the CentOS side.
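To get a feel for what ClientAliveInterval buys you, here is a minimal Python sketch that reads sshd_config-style text and estimates how long sshd would wait before dropping a client that has stopped answering its probes. The helper names and the sample text are made up for illustration. The sketch assumes ClientAliveCountMax is left at its documented default of 3 when it is not set, and it ignores anything fancier than plain "Keyword value" lines.

```python
# Hypothetical helpers, not part of OpenSSH: a rough reading of sshd_config text.

def parse_sshd_options(text):
    """Return a dict of lowercased keyword -> value; the first occurrence wins."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip anything that is not "Keyword value"
        key, value = parts[0].lower(), parts[1].strip()
        options.setdefault(key, value)  # keep the first value seen for a keyword
    return options


def unresponsive_timeout(options, default_count_max=3):
    """Approximate seconds before an unresponsive client is disconnected.

    Assumes the default ClientAliveCountMax of 3 when the option is absent.
    A result of 0 means client-alive probing is disabled.
    """
    interval = int(options.get("clientaliveinterval", 0))
    count_max = int(options.get("clientalivecountmax", default_count_max))
    return interval * count_max


# Sample text mirroring the two lines suggested in this answer.
SAMPLE = """
# keepalive settings
KeepAlive yes
ClientAliveInterval 60
"""

opts = parse_sshd_options(SAMPLE)
timeout = unresponsive_timeout(opts)
print(timeout)
```

With the two lines from this answer, a probe goes out after every 60 seconds of silence, so an idle but healthy connection keeps seeing traffic, and a dead one is cleaned up after roughly interval times count.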
Solution 2
The actual answer is almost always that you have a NAT device of some sort in the path, usually a firewall, whose state tables have a fairly aggressive timeout. Because you leave your ssh connection idle for some periods of time, the NAT device "forgets" the mapping between your inside address and source port number, and your ephemeral outside NATted address and port number.
When you later try to do something in that ssh window, a new ephemeral address/port pair is assigned to you, which the destination ssh server has no knowledge of, and doesn't respond to; later, some local timeout is reached, and the connection is dropped by your local machine.
The practical fix for this is to do exactly what yuriismaster suggests: enable KeepAlives (which ensure regular traffic to "tickle" that state table entry), and use screen on the remote side (to preserve state in the event things do get dropped). I only post this answer because you asked what's happening, as well as what to do about it. Hopefully this clarifies why yuriismaster's suggestions are good ones.
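The state-table behaviour described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the 300-second idle timeout, the inside address and the port numbers are all invented, and real NAT devices differ (some simply drop packets for an expired flow rather than handing out a new mapping).

```python
class NatTable:
    """Toy model of a NAT device's state table with an idle timeout."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}     # flow -> time of the last packet seen
        self.mapping = {}       # flow -> outside port currently assigned
        self.next_port = 40000  # made-up start of the ephemeral range

    def send(self, flow, now):
        """Pass one packet for `flow` at time `now`; return the outside port used."""
        last = self.last_seen.get(flow)
        if last is None or now - last > self.idle_timeout:
            # Entry is missing or has expired: the device has "forgotten" the
            # flow, so it allocates a fresh outside port.
            self.mapping[flow] = self.next_port
            self.next_port += 1
        self.last_seen[flow] = now
        return self.mapping[flow]


def ports_seen(idle_timeout, send_times):
    """Outside port used for each packet of a single flow sent at `send_times`."""
    nat = NatTable(idle_timeout)
    flow = ("10.0.0.5", 51000)  # hypothetical inside address and source port
    return [nat.send(flow, t) for t in send_times]


# Session left idle for 15 minutes: the second packet gets a new mapping,
# which the remote sshd would not recognise as the existing connection.
idle = ports_seen(300, [0, 900])

# Same 15 minutes with a keepalive every 60 seconds: the mapping never expires.
kept = ports_seen(300, list(range(0, 901, 60)))

print(idle)
print(sorted(set(kept)))
```

In the idle case the outside port changes between the two packets, which is the situation that ends in "Write failed: Broken pipe"; with a keepalive interval shorter than the device's timeout, every packet leaves through the same mapping.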
Stephen RC
Senior developer at Defiant / Wordfence, security analyst, Tolkien fan, and general geek.
Updated on September 18, 2022
Comments
-
Stephen RC over 1 year
I am connecting to a CentOS 5.5 box via SSH from an Ubuntu 11.04 machine.
The connection appears to work as expected when it is in active use (i.e. no lag or loss), but if it is left inactive for a while it will freeze up and become unresponsive. Eventually the error message "Write failed: Broken pipe" will be returned and I'll be back on my local machine's prompt.
What can I do to debug this, find out what is happening, and get it resolved? As a developer, having to reconnect constantly is making my life a pain.
-
Stephen RC (about 13 years): That makes perfect sense! We do have a NAT with DMZ setup for this box. I'll give the timeout configuration a try and see if that works for me. Thanks :)
-
Stephen RC (about 13 years): I'm accepting yours as you helped me understand the reasons behind the problem. But credit needs to go to @yuriismaster for the fix.
-
MadHatter (about 13 years): Valorin: absolutely, it does, and he was first. Frankly, I think he deserves the accept more than me; but it's your question, so it should go as you see fit. Thanks for the feedback, either way.
-
ypid (almost 9 years): KeepAlive has been renamed to TCPKeepAlive and can be left at its default value, which is yes. ClientAliveInterval should be sufficient. See man sshd_config.