All connections from this network get stuck in SYN_RECV state; connections from my home or phone properly reach ESTABLISHED


Solution 1

To this jaded eye, it looks like there is some kind of routing issue close to the server in question. Packets come in along one path, but seem to depart through a different path and something stateful is on that path and dropping the weird "ACK without a SYN" packets.
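A capture taken on the server can confirm the symptom before you go hunting for the path problem. The following is only a rough sketch, assuming the one-line text format that `tcpdump -n` prints in the question; the function names are made up for illustration, and the sample lines are shortened from the question's capture (TCP options omitted). It lists clients that were sent a SYN-ACK but never answered with an ACK.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "... IP 71.56.137.10.57160 > 69.164.201.172.80: Flags [S.], seq ..."
LINE_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]*)\]")


def handshake_progress(lines, server):
    """Map each client endpoint to the handshake steps seen for it.

    `server` is the server's "ip.port" string exactly as tcpdump prints it.
    """
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        src, dst, flags = m.groups()
        if dst == server and flags == "S":
            seen[src].add("SYN")
        elif src == server and flags == "S.":
            seen[dst].add("SYN-ACK")
        elif dst == server and "." in flags and not set(flags) & {"S", "R"}:
            # Any non-SYN, non-RST segment from the client with ACK set.
            seen[src].add("ACK")
    return dict(seen)


def stuck_clients(lines, server):
    """Clients that were sent a SYN-ACK but never replied with an ACK."""
    progress = handshake_progress(lines, server)
    return sorted(
        client
        for client, steps in progress.items()
        if "SYN-ACK" in steps and "ACK" not in steps
    )


# Shortened lines from the question's capture (options omitted).
SAMPLE = [
    "2011-05-25 20:12:54.627417 IP 71.56.137.10.57160 > 69.164.201.172.80: Flags [S], seq 382527960, win 8192, length 0",
    "2011-05-25 20:12:54.627512 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, length 0",
    "2011-05-25 20:12:57.624737 IP 71.56.137.10.57160 > 69.164.201.172.80: Flags [S], seq 382527960, win 8192, length 0",
]

print(stuck_clients(SAMPLE, "69.164.201.172.80"))
```

If every affected client shows up in that list, the server is answering and the replies are being lost or dropped somewhere after they leave it, which is consistent with a return-path problem.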

I had this happen to me once. The cause turned out to be a bad network mask on the server: when traffic from off the subnet came in, the server would issue an ARP request for the remote node's MAC address instead of sending the reply to its gateway. Unfortunately for me, both the router and our load-balancer were enabled for Proxy-ARP, and the load-balancer was a bit faster on the trigger than the router. So the SYN packets came in via the router, but the replies were attempting to leave the subnet via the load-balancer. As the LB didn't have a connection for that ACK packet, it dropped it on the floor.
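To make the netmask effect concrete, here is a small sketch using Python's standard `ipaddress` module. The addresses and the `is_on_link` helper are hypothetical, chosen only to illustrate the idea: with a mask that is too wide, an off-subnet client looks directly reachable, so the host ARPs for it instead of handing the reply to its default gateway.

```python
import ipaddress


def is_on_link(local_cidr, peer):
    """True if a host configured as `local_cidr` treats `peer` as on-link.

    On-link peers are resolved with ARP; everything else is sent to the
    default gateway.
    """
    iface = ipaddress.ip_interface(local_cidr)
    return ipaddress.ip_address(peer) in iface.network


PEER = "10.200.0.5"  # hypothetical client that is really off-subnet

print(is_on_link("10.1.2.10/24", PEER))  # intended mask
print(is_on_link("10.1.2.10/8", PEER))   # mask that is too wide
```

Comparing the mask in the server's interface configuration against what the provider actually assigned is a quick way to rule this cause in or out.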

In your case some judicious trace-routes may illuminate the network-path issues. From the affected server, attempt to traceroute out to the IPs that cause the problem, and do the same from those same IPs. If you're getting different paths, that may be where it is.
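If you save the text output of both traceroutes, the comparison can be roughed out in a few lines. This is a sketch under assumptions: it expects numeric (`-n`) output with one hop per line, the sample traces use made-up documentation addresses, and the helper names are illustrative. Routers often answer from a different interface address in each direction, so treat a reported divergence as a hint about where to look rather than proof.

```python
import re

# A hop line looks like " 2  198.51.100.1  4.1 ms ..." or " 3  * * *".
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_hops(text):
    """Return the first responder per hop, or None where the hop timed out."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            addr = m.group(2)
            hops.append(None if addr == "*" else addr)
    return hops


def first_divergence(path_a, path_b):
    """Index of the first hop where both paths answered but disagree."""
    for i, (a, b) in enumerate(zip(path_a, path_b)):
        if a is not None and b is not None and a != b:
            return i
    return None


# Made-up traces: server -> client, then client -> server.
OUTBOUND = """
traceroute to 203.0.113.50 (203.0.113.50), 30 hops max, 60 byte packets
 1  192.0.2.1  0.4 ms  0.4 ms  0.4 ms
 2  198.51.100.1  4.1 ms  4.0 ms  4.2 ms
 3  203.0.113.1  9.8 ms  9.7 ms  9.9 ms
 4  203.0.113.50  10.2 ms  10.1 ms  10.3 ms
"""
INBOUND = """
traceroute to 192.0.2.10 (192.0.2.10), 30 hops max, 60 byte packets
 1  203.0.113.1  0.5 ms  0.5 ms  0.5 ms
 2  198.51.100.99  5.0 ms  5.1 ms  5.0 ms
 3  192.0.2.1  9.9 ms  9.8 ms  9.9 ms
 4  192.0.2.10  10.3 ms  10.2 ms  10.4 ms
"""

outbound = parse_hops(OUTBOUND)[:-1]      # drop the destination itself
inbound = parse_hops(INBOUND)[:-1][::-1]  # same, then flip to server-first order
print(first_divergence(outbound, inbound))
```

The slicing assumes both traces actually reached their destination; if one of them timed out partway, compare only the hops that answered.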

Solution 2

I've been in a similar situation using Javalin as a server.

I was not using any firewalls. netstat showed the local device trying to reach the local server with connections in SYN_RECV state, and ping was not reaching the local server.

I had set the IPv4 address manually on the local machine. Undoing that and setting the IPv4 method back to automatic (DHCP) did the trick. After that I was able to access the local server from other local devices.


Tanrıverdi
Author by

Tanrıverdi

I lead technology projects at Department NYC

Updated on September 18, 2022

Comments

  • Tanrıverdi
    Tanrıverdi over 1 year

    My server (a linode VPS) suddenly started to timeout on every request yesterday.

    I'm pretty inexperienced in networking and would love to learn a process for debugging these connectivity issues.

    What confuses me is that yesterday, some people (my phone, me at home, friends at home) could consistently access the site, and I could see with netstat that a connection had been established. I disabled firewalls and set iptables to accept all connections to rule out any strange auto rules blacklisting our IP. I'm not sure if it's relevant, but a traceroute from the local network times out, while traceroutes from some machines outside find my server.

    I've confirmed various settings are correct by comparing to the settings on my development server which is functioning properly.

    The following files match my dev environment (except for their respective ip addresses):

    /etc/hosts 
    /etc/hosts.allow
    /etc/hosts.deny
    /etc/network/interfaces
    ifconfig
    

    Apache is listening on port 80 and the setup looks exactly the same as my functioning server.

    # server that doesn't work:
    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      22008/apache2
    tcp        0      0 69.164.201.172:80       71.56.137.10:57487      SYN_RECV    -
    
    # server that does work
    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      3334/apache2
    tcp        0      0 72.14.189.46:80         71.56.137.10:57490      ESTABLISHED 20931/apache2
    

    My attempt at understanding

    Every time I load the page once, netstat -an | grep :80 reveals all connections in SYN_RECV state.

    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
    tcp        0      0 69.164.201.172:80       71.56.137.10:56657      SYN_RECV
    tcp        0      0 69.164.201.172:80       71.56.137.10:56669      SYN_RECV
    tcp        0      0 69.164.201.172:80       71.56.137.10:56671      SYN_RECV
    

    So SYN_RECV means the server has received a SYN and replied with a SYN-ACK, and is now waiting for the final ACK from the client.
    How do I debug whether an ACK is being sent back? How do I debug where this communication is failing?

    Here's what a tcpdump looks like when I attempt to load the page once.

    In the paste below, my server is constantly sending packets to the client and not getting a response.

    What does this mean? That the client isn't getting the response? Or perhaps I'm swallowing the response somewhere in the server? How do I know to narrow down the culprit further?

    tcpdump -i eth0 -n -tttt port 80
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    2011-05-25 20:12:54.627417 IP 71.56.137.10.57160 > 69.164.201.172.80: Flags [S], seq 382527960, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
    2011-05-25 20:12:54.627512 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:12:54.814463 IP 69.164.201.172.80 > 71.56.137.10.57157: Flags [S.], seq 604630211, ack 496040070, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:12:55.214482 IP 69.164.201.172.80 > 71.56.137.10.57158: Flags [S.], seq 998358186, ack 2224730755, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:12:57.624737 IP 71.56.137.10.57160 > 69.164.201.172.80: Flags [S], seq 382527960, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
    2011-05-25 20:12:57.624793 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:12:59.014477 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:13:03.618790 IP 71.56.137.10.57160 > 69.164.201.172.80: Flags [S], seq 382527960, win 8192, options [mss 1460,nop,nop,sackOK], length 0
    2011-05-25 20:13:03.618866 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:13:05.014514 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    2011-05-25 20:13:17.014504 IP 69.164.201.172.80 > 71.56.137.10.57160: Flags [S.], seq 1330600505, ack 382527961, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
    

    tcpdump for functional server

    Looking at the tcpdump for my functional server, I do see back-and-forth communication between the server and the client.

    00:00:00.000000 IP 71.56.137.10.57260 > 72.14.189.46.80: Flags [S], seq 34114118s [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
    00:00:00.000110 IP 72.14.189.46.80 > 71.56.137.10.57260: Flags [S.], seq 2454858 win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 5], length 0
    00:00:00.061827 IP 71.56.137.10.57260 > 72.14.189.46.80: Flags [.], ack 1, win 1
    00:00:00.004292 IP 71.56.137.10.57260 > 72.14.189.46.80: Flags [P.], seq 1:597, ngth 596
    00:00:00.000074 IP 72.14.189.46.80 > 71.56.137.10.57260: Flags [.], ack 597, win
    00:00:00.493990 IP 72.14.189.46.80 > 71.56.137.10.57260: Flags [.], seq 1:2921, ngth 2920
    00:00:00.000024 IP 72.14.189.46.80 > 71.56.137.10.57260: Flags [P.], seq 2921:30, length 98
    00:00:00.065135 IP 71.56.137.10.57260 > 72.14.189.46.80: Flags [.], ack 3019, wi
    00:00:00.034766 IP 71.56.137.10.57260 > 72.14.189.46.80: Flags [P.], seq 597:12925, length 699
    00:00:00.000035 IP 72.14.189.46.80 > 71.56.137.10.57260: Flags [.], ack 1296, wi
    00:00:00.000457 IP 72.14.189.46.80 > 71.56.137.10.57260: Flags [P.], seq 3019:328, length 211
    00:00:00.019196 IP 71.56.137.10.57262 > 72.14.189.46.80: Flags [S], seq 10674886s [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
    

    Any suggestions, explanations, or comments would be hugely appreciated so that I can understand TCP a little more and hopefully be a little more useful next time I need to debug a problem like this.

    Thank you!

  • Tanrıverdi
    Tanrıverdi almost 13 years
    Hey sysadmin - thanks for the response! I really appreciate it. I had read your other post about the issue :) Actually, the problem just solved itself without further input from me which means it was out of my control. I almost wish it hadn't -- I'd like to feel less helpless in the future to determine that the problem lies outside my domain. Getting my foot into this world has been tough!
  • Janis Veinbergs
    Janis Veinbergs almost 11 years
    This just happened to me and this answer was eye-opening. It turns out my gateway server and central switch were using the same IP address on the specific VLAN where the communication occurred (a misconfiguration introduced during a reconfiguration).
  • Philip
    Philip almost 10 years
    Sorry, but that's definitely a different problem from what the OP describes.