How can we troubleshoot intermittent "An existing connection was forcibly closed" errors caused by a Cisco CSS?

16,999

Solution 1

The problem was the Cisco CSS. We determined this by pointing the tier 1 servers directly at the tier 2 servers and going 48 hours without observing the problem. Having isolated it to the CSS, we fixed it by raising the insanely low default value of this parameter:

"Default flow inactivity timeouts, in seconds, for the TCP or UDP port. If a flow is idle for the amount of time specified in the timeout value, the CSS tears down the flow and reclaims the flow resources."

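We set this to 84. The value is counted in 16-second increments, so 84 works out to 84 × 16 = 1,344 seconds (about 22 minutes). Since the default keep-alive for HTTP is 120 seconds, the default value was too low: the CSS was tearing down idle flows that the servers on either side still considered open.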
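The arithmetic behind that setting can be sketched in a few lines of Python. This is only an illustration: the 16-second increment and the 120-second keep-alive come from the answer above, while the function names are made up for the example and are not part of any Cisco tooling.

```python
# Per the answer above, the CSS setting is counted in 16-second increments.
CSS_INCREMENT_SECONDS = 16


def flow_timeout_seconds(multiplier: int) -> int:
    """Idle time, in seconds, before the CSS tears down a flow."""
    return multiplier * CSS_INCREMENT_SECONDS


def outlives_keepalive(multiplier: int, keepalive_seconds: int = 120) -> bool:
    """True if the flow timeout is strictly longer than the HTTP keep-alive."""
    return flow_timeout_seconds(multiplier) > keepalive_seconds


def min_multiplier(keepalive_seconds: int = 120) -> int:
    """Smallest setting whose timeout strictly exceeds the keep-alive."""
    return keepalive_seconds // CSS_INCREMENT_SECONDS + 1


if __name__ == "__main__":
    print(flow_timeout_seconds(84))   # 1344
    print(outlives_keepalive(84))     # True
    print(min_multiplier(120))        # 8 (8 * 16 = 128 seconds)
```

Any setting of 8 or more clears a 120-second keep-alive, so 84 leaves a very wide margin.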

Solution 2

To check application pool recycling, open IIS and go to the Properties of the application pool in which your remoting service runs. Recycling can be configured by time interval, by number of requests, or at specific times.

You could remove the current recycling rules and schedule recycling for a time when no connections are expected, such as 3:00 AM. Then see if the exceptions still occur.

Solution 3

It could be a network component causing this. The way to rule this out would be to place both machines (or test machines) on the same subnet, then run a load test, and verify that you do not get the same error.
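One way to gather evidence for or against a network component is to probe how long an idle keep-alive connection survives on each path, once through the load balancer and once directly to a server. The following is a rough sketch using only the Python standard library; the function names and the three result labels are made up for this example, and it assumes the target answers a plain HTTP GET on the given path.

```python
import http.client
import time


def probe_idle(host, port, idle_seconds, path="/", timeout=10.0):
    """Reuse one HTTP connection after an idle gap.

    Returns 'alive' if the second request succeeds on the same socket,
    'dropped' if it fails, and 'no_keepalive' if the server closed the
    connection right after the first response (nothing to measure).
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # First request: errors here mean the target is unreachable,
        # so they are allowed to propagate to the caller.
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the socket can be reused
        if conn.sock is None:
            return "no_keepalive"
        time.sleep(idle_seconds)
        try:
            conn.request("GET", path)
            conn.getresponse().read()
        except (OSError, http.client.HTTPException):
            # Reset, timeout, or empty reply on the reused connection.
            return "dropped"
        return "alive"
    finally:
        conn.close()


def sweep(host, port, idle_intervals):
    """Run probe_idle once per interval and collect the results."""
    return {idle: probe_idle(host, port, idle) for idle in idle_intervals}
```

For example, `sweep("vip.example.internal", 80, [5, 15, 30, 60, 110])` against the virtual IP and the same call against a single middle-tier server (both host names are placeholders) can be compared side by side. If connections are dropped through the load balancer at idle times where the direct path stays alive, that points at the device in between rather than at the application.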

The other things that could be causing it could be:

Author: JohnOpincar

Updated on July 25, 2022

Comments

  • JohnOpincar
    JohnOpincar almost 2 years

    We have the "standard" three tier architecture with our middle tier hosted in IIS and accessed via .net remoting. These errors occur between our web and web services servers (front tier) that are remoting to the app servers (middle tier). We'll get this error 3-10 times a day out of ~130K total calls in the day.

    The exception and stack trace always look similar to this:


    Exception Type: System.Net.WebException
    Message: The underlying connection was closed: An unexpected error occurred on a receive.
    
    Server stack trace: 
       at System.Runtime.Remoting.Channels.Http.HttpClientTransportSink.ProcessResponseException(WebException webException, HttpWebResponse& response)
       at System.Runtime.Remoting.Channels.Http.HttpClientTransportSink.ProcessMessage(IMessage msg, ITransportHeaders requestHeaders, Stream requestStream, ITransportHeaders& responseHeaders, Stream& responseStream)
       at System.Runtime.Remoting.Channels.BinaryClientFormatterSink.SyncProcessMessage(IMessage msg)
    
    Exception rethrown at [0]: 
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at XXXXX.BusinessFacade.Interface.XXXXInterface.SubmitXXXX(
       at XXX.XXXXWebServicesLibrary.XXXXService.CreateXXXXXX.RunXXXXMethod()
       at XXX.XXXXWebServicesLibrary.XXXXService.XXXXXXMethod`2.RunMethod()
       at XXX.XXXXWebServicesLibrary.XXXXXWebMethod`2.Run()HandleReturnMessage()
    Inner Exception: 
    
    Exception Type: System.IO.IOException
    Message: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       at System.Net.PooledStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       at System.Net.Connection.SyncRead(HttpWebRequest request, Boolean userRetrievedStream, Boolean probeRead)Read()
    Inner Exception: 
    
    Exception Type: System.Net.Sockets.SocketException
    Message: An existing connection was forcibly closed by the remote host
       at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)Receive()
    

    No particular remoting call causes this; it can be any of them, which seems to rule out an application-specific cause. The only common denominator is the "Exception Type: System.Net.Sockets.SocketException Message: An existing connection was forcibly closed by the remote host" portion of the error.

    The front and middle tiers are separated by a firewall and we are also utilizing a VIP device. I strongly suspect an issue with our network/firewall configuration but our network guys are just scratching their heads and not offering any suggestions.

    Although a failure rate of roughly 0.002% to 0.008% (3-10 errors out of ~130K calls) may seem insignificant, we have partners that scrutinize our communications very carefully and I am just waiting for this to become an issue they notice. I don't want to have to say "I don't know" when that time comes.

    Does anyone have any ideas on how I could provide more information or any suggestions I could make to our network guys to get this resolved?

  • JohnOpincar
    JohnOpincar about 13 years
    The default recycling rules are in place (1740 minutes). Based on the description there, I don't see how this would be the problem since "normal" recycling only occurs on idle worker processes and the connections aren't tied to the worker processes.
  • JohnOpincar
    JohnOpincar about 13 years
    Those are all good suggestions. Unfortunately, we have done load tests in our "test" environment with loads that far exceed our production volume without reproducing the issue. We aren't using WCF, so the configuration options you mentioned aren't relevant. I've checked the message size in the IIS log when we've gotten this failure and it's not large at all. I will probably award you the bounty tomorrow morning if no one else has answered, just so those points don't go to waste. :)
  • Shiraz Bhaiji
    Shiraz Bhaiji about 13 years
    Which Firewall and VIP device are you using?
  • JohnOpincar
    JohnOpincar about 13 years
    Turns out that it was a problem with the Cisco CSS we had between our front and middle tiers for load balancing. When we pointed each front tier server directly to a middle tier server, we no longer had this problem. I will post more info as it becomes available.
  • Ciarán Bruen
    Ciarán Bruen about 13 years
    Hi @JohnOpincar, did you manage to resolve this issue? I'm having the same problem: I get the same error message when we go through a load balancer, but the problem doesn't occur when we bypass the load balancer and go straight to a particular server.
  • JohnOpincar
    JohnOpincar about 13 years
    @Ciaran Bruen we have not. We have just isolated the problem to the CSS.