Server becomes unreachable and comes back up on its own (most likely a network issue)

Solution 1

This kind of problem usually doesn't generate many log messages. You have found the two important ones, which show the interface going down and coming back up. The same messages are generated by unplugging the Ethernet cable and plugging it back in.
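
To see how often the link drops and for how long, the Down/Up pairs can be pulled out of dmesg. The following is a minimal sketch, assuming the igb message format quoted in the question ("driver: iface NIC Link is Down/Up") and the default dmesg timestamps in seconds since boot; the function name link_outages is only illustrative.

```shell
# Read kernel log lines on stdin and print one line per link outage.
# Assumes messages like: "[1195777.544167] igb: eth0 NIC Link is Down"
link_outages() {
  awk '
    /NIC Link is (Down|Up)/ {
      line = $0
      sub(/^\[ */, "", line)                    # drop "[" and padding
      ts = line;   sub(/\].*/, "", ts)          # timestamp in seconds
      rest = line; sub(/^[^]]*\] */, "", rest)  # message without timestamp
      split(rest, w, " ")
      iface = w[2]                              # w[1] is the driver name
      if (rest ~ /Link is Down/) {
        down[iface] = ts
      } else if (iface in down) {
        printf "%s down at %s for %.3f s\n", iface, down[iface], ts - down[iface]
        delete down[iface]
      }
    }
  '
}
```

It would be used as dmesg | link_outages. An Up message with no earlier Down (such as the one at boot) is ignored.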

It could be a bad cable between the NIC and the router. My first steps (done one at a time) would be:

  • Replace the cable connected to eth0 and see if that resolves the problem.
  • Reconfigure the network interfaces so the traffic currently on eth0 is on eth1 and vice versa. (Requires a network restart and cable swap.) If the problem moves, then it is likely a failing NIC.
  • Verify the status of the upstream device and its power supply. If it loses power or is otherwise failing, you can see this kind of behavior.
  • Run netstat -i or ifconfig and examine the error counts. Normally, they should be 0 or in the single digits. High carrier or frame error counts may indicate a duplex mismatch. A duplex mismatch can be verified by uploading and then downloading a large file: a large speed difference between the two directions, accompanied by increasing error counts, indicates a mismatch on the link. Cable modems usually have different upload and download bandwidths, so local transfers work better for this test.
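
The error counts from the last step can also be read directly from sysfs, which is convenient for checking them before and after a test transfer. This is a rough sketch, assuming a Linux system that exposes per-interface counters under /sys/class/net/<iface>/statistics; the function name iface_errors and the optional second argument (an alternative base directory) are illustrative only.

```shell
# Print every non-zero error counter for one interface, one per line.
# $1 = interface name, $2 = optional base directory (default /sys/class/net)
iface_errors() {
  stats_dir="${2:-/sys/class/net}/$1/statistics"
  for f in rx_errors tx_errors rx_dropped tx_dropped \
           rx_crc_errors rx_frame_errors tx_carrier_errors collisions; do
    [ -r "$stats_dir/$f" ] || continue     # skip counters that are absent
    n=$(cat "$stats_dir/$f")
    if [ "$n" -gt 0 ]; then
      echo "$f=$n"
    fi
  done
  return 0
}
```

Running iface_errors eth0 before and after a large transfer and comparing the output shows whether the counters are climbing. No output means all of the listed counters are 0.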

One tool I do use is mtr. I use a command like mtr -i 15 -n google.com to monitor connectivity. Consider using one of your ISP's servers instead of google.com. mtr can also be run in report mode from a batch job. If the problem is upstream of the server, the output should help identify where it is occurring.
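
A batch run in report mode might look like the sketch below. It assumes mtr is installed and uses its --report, --report-cycles and --no-dns options; the function name log_mtr, the MTR override variable, the cycle count of 10 and the log file location are all placeholders chosen for illustration.

```shell
# Append a timestamped mtr report for one target to a log file.
# $1 = target host, $2 = log file
# MTR may be set to point at a different mtr binary (hypothetical knob).
log_mtr() {
  target=$1
  logfile=$2
  {
    date -u '+%Y-%m-%dT%H:%M:%SZ'
    "${MTR:-mtr}" --report --report-cycles 10 --no-dns "$target"
  } >> "$logfile"
}
```

Called from cron every few minutes, for example as log_mtr google.com /var/log/mtr-report.log, this leaves a history that can be compared against the times the link went down.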

Solution 2

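BillThor has some great suggestions. If none of them resolves the issue, auto-negotiation could be to blame (though this is unlikely). Try forcing the speed and duplex of the connection (instructions are for RedHat, but other distros are similar). The device at the other end of the link must be set to the same speed and duplex, otherwise forcing one side can itself cause a duplex mismatch.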

Edit /etc/sysconfig/network-scripts/ifcfg-eth0:

ETHTOOL_OPTS="speed 100 duplex full autoneg off"

Then restart the interface:

/etc/init.d/network restart
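
After the restart it is worth confirming that the settings took effect. Running ethtool with just the interface name prints the current link settings; the sketch below condenses that output to one line. It assumes the usual "Speed:", "Duplex:" and "Auto-negotiation:" lines in ethtool's output, and the function name link_settings is illustrative.

```shell
# Summarise speed, duplex and auto-negotiation from `ethtool <iface>` output
# read on stdin.
link_settings() {
  awk -F': *' '
    /^[ \t]*Speed:/            { speed = $2 }
    /^[ \t]*Duplex:/           { duplex = $2 }
    /^[ \t]*Auto-negotiation:/ { autoneg = $2 }
    END { printf "speed=%s duplex=%s autoneg=%s\n", speed, duplex, autoneg }
  '
}
```

Usage would be ethtool eth0 | link_settings. With the ETHTOOL_OPTS line above applied, the summary should report the forced speed, full duplex, and auto-negotiation off.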

Author by

Siddhant

I'm an experienced software developer located in Munich, Germany. I've been writing Python code close to 10 years now. I've had the chance to use it at almost all the companies where I've worked so far, and also currently maintain a few open source projects written in the language. The technologies I'm currently focusing on include Python, Tornado, Django, Vue.js, Terraform, SaltStack, AWS, and a few others in the Backend/DevOps space.

Updated on September 18, 2022

Comments

  • Siddhant
    Siddhant over 1 year

    I'm having a strange problem with a server sitting at my workplace (it's behind a NAT, if that's important). At times it becomes unreachable and then comes back up again, usually within a few seconds, though sometimes the outage lasts up to a minute. It doesn't reboot and it doesn't crash; it simply becomes inaccessible. During this time I cannot ssh into it, nor can I access any applications running on the machine (it's running a couple of Rails apps, so they become unreachable as well). I checked dmesg and saw these lines:

    [    4.958074] ADDRCONF(NETDEV_UP): eth0: link is not ready
    [    5.040476] ADDRCONF(NETDEV_UP): eth1: link is not ready
    [    5.175624] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    [    5.177207] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    

    A couple of lines later, I see something similar concerning the network interfaces:

    [1195777.544167] igb: eth0 NIC Link is Down
    [1195780.962943] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    

    It does look like a network issue. /var/log/messages doesn't show anything interesting. I'm not sure how to debug this. Any clue as to what it could be, and what should I be checking here? Thanks!

  • Jim G.
    Jim G. over 11 years
    Thanks for the tip on mtr, another gem for the toolbox. Awesome tool!
  • Siddhant
    Siddhant over 11 years
    Error counts in netstat -i and ifconfig are both 0, so that's ruled out. mtr doesn't show anything suspicious either. I'll try to replace the cable and then reconfigure the network interfaces to see if that's the problem. Thanks for the suggestions!