Redis IOException: "Existing connection forcibly closed by remote host" using ServiceStack C# client


Solution 1

We think we found the root cause after carefully reading through the Redis documentation and finding this beauty (http://redis.io/topics/persistence):

RDB needs to fork() often in order to persist on disk using a child process.
Fork() can be time consuming if the dataset is big, and may result in Redis
to stop serving clients for some millisecond or even for one second if the
dataset is very big and the CPU performance not great. AOF also needs to fork()
but you can tune how often you want to rewrite your logs without any trade-off
on durability.

We turned RDB persistence off, and haven't seen those connection drops since.
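
In concrete terms, turning RDB snapshotting off just means removing the scheduled save points from redis.conf (a comment below asks exactly this). A minimal sketch against the config shown in the question:

    # redis.conf - disable RDB snapshotting entirely
    # Remove or comment out the scheduled snapshot points:
    # save 900 1
    # save 300 10
    # save 60 10000
    # ...or, equivalently, clear them with an empty save directive:
    save ""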

Solution 2

It seems that changing the server-side timeout from 0 to 300 has alleviated the mass connection failures. We still see some bad connections, but that may be because the PooledRedisClientManager doesn't properly check connection state in GetInActiveWriteClient(), which is called from GetClient().
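
As a stop-gap on the client side, a simple retry wrapper around the pooled client also papers over connections that come out of the pool already dead (one of the commenters below does exactly this). This is only a rough sketch, using nothing beyond the GetClient()/Set<T>() calls shown in the question; the helper name and retry count are our own:

    // Sketch only: retry a write when the pooled connection turns out to be dead.
    // A socket the server has already closed fails on first use with an
    // IOException (or RedisException); retrying draws a fresh client from the pool.
    private static void SetWithRetry<T>(string key, T value, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using (var client = ClientManager.GetClient())
                {
                    client.Set<T>(key, value);
                }
                return;
            }
            catch (Exception ex)
            {
                var transient = ex is System.IO.IOException
                             || ex is ServiceStack.Redis.RedisException;
                if (!transient || attempt >= maxAttempts)
                    throw;
            }
        }
    }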


Comments

  • Bernardo
    Bernardo almost 2 years

    We have the following setup:

    Redis 2.6 on Ubuntu Linux 12.04 LTS on a Rackspace Cloud 8 GB instance with the following settings:

    daemonize yes
    pidfile /var/run/redis_6379.pid
    
    port 6379
    
    timeout 300
    
    loglevel notice
    logfile /var/log/redis_6379.log
    
    databases 16
    
    save 900 1
    save 300 10
    save 60 10000
    
    rdbcompression yes
    dbfilename dump.rdb
    dir /var/redis/6379
    
    requirepass PASSWORD
    
    maxclients 10000
    
    maxmemory 7gb
    maxmemory-policy allkeys-lru
    maxmemory-samples 3
    
    appendonly no
    
    slowlog-log-slower-than 10000
    slowlog-max-len 128
    
    activerehashing yes
    

    Our app servers are hosted in Rackspace Managed and connect to Redis via its public IP (to avoid having to set up Rackspace Connect, which is a royal PITA); we provide some security by requiring a password for the Redis connection. I manually increased the Unix file descriptor limit to 10240; a maximum of 10k connections should offer enough headroom. As you can see from the settings file above, I also limit memory usage to 7 GB to leave some RAM headroom.

    We use the ServiceStack C# Redis Driver. We use the following web.config settings:

    <RedisConfig suffix="">
      <Primary password="PASSWORD" host="HOST" port="6379"  maxReadPoolSize="50" maxWritePoolSize="50"/>
    </RedisConfig>  
    

    We have a PooledRedisClientManager singleton, created once per AppPool as follows:

    private static PooledRedisClientManager _clientManager;
    public static PooledRedisClientManager ClientManager
    {
        get
        {
            if (_clientManager == null)
            {
                try
                {
                    var poolConfig = new RedisClientManagerConfig
                    {
                        MaxReadPoolSize = RedisConfig.Config.Primary.MaxReadPoolSize,
                        MaxWritePoolSize = RedisConfig.Config.Primary.MaxWritePoolSize,
                    };
    
                    _clientManager = new PooledRedisClientManager(new List<string>() { RedisConfig.Config.Primary.ToHost() }, null, poolConfig);
                }
                catch (Exception e)
                {
                    log.Fatal("Could not spin up Redis", e);
                    CacheFailed = DateTime.Now;
                }
            }
            return _clientManager;
        }
    }
    

    And we acquire a connection and do put/get operations as follows:

        using (var client = ClientManager.GetClient())
        {
            client.Set<T>(region + key, value);
        }
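
    Reads follow the same pattern (this is the Get<T> call that shows up in the stack traces below); a sketch for completeness:

        using (var client = ClientManager.GetClient())
        {
            var value = client.Get<T>(region + key);
        }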
    

    The code mostly seems to work. Given that we have ~20 AppPools and 50-100 read and 50-100 write clients, we expect at most 2,000-4,000 connections to the Redis server. However, we keep seeing the following exception in our error logs: usually a couple of hundred of them bunched together, then nothing for an hour, then it starts over again, ad nauseam.

    System.IO.IOException: Unable to read data from the transport connection:
    An existing connection was forcibly closed by the remote host.
    ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host at
    System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags) at
    System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
    --- End of inner exception stack trace
    - at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) at System.IO.BufferedStream.ReadByte() at
    ServiceStack.Redis.RedisNativeClient.ReadLine() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 85 at
    ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355 at
    ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404 at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185 at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32 at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
    

    We have experimented with a Redis server timeout of 0 (i.e. no connection timeout), a timeout of 24 hours, and values in between, without luck. Googling and Stack Overflowing has brought no real answers; everything seems to point to us doing the right thing with the code, at least.

    Our feeling is that we get regular, sustained network latency issues between Rackspace Hosted and Rackspace Cloud, which cause a block of TCP connections to go stale. We could possibly solve that by implementing client-side connection timeouts, and the question would be whether we'd need server-side timeouts as well. But that's just a feeling, and we're not 100% sure we're on the right track.

    Ideas?

    Edit: I occasionally see the following error as well:

    ServiceStack.Redis.RedisException: Unable to Connect: sPort: 65025 ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host at System.Net.Sockets.Socket.Send(IList`1 buffers, SocketFlags socketFlags) at ServiceStack.Redis.RedisNativeClient.FlushSendBuffer() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 273 at ServiceStack.Redis.RedisNativeClient.SendCommand(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 203 --- End of inner exception stack trace --- at ServiceStack.Redis.RedisNativeClient.CreateConnectionError() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 165 at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355 at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404 at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185 at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32 at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
    

    I imagine this is a direct result of having server-side connection timeouts that aren't handled on the client. It's looking like we really need to be handling client-side connection timeouts.
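
    If we go down that route, the change would be roughly along these lines (untested sketch; it assumes our version of ServiceStack.Redis exposes the ConnectTimeout and IdleTimeOutSecs properties on PooledRedisClientManager):

        // Untested sketch: add client-side timeouts when building the pool.
        // ConnectTimeout / IdleTimeOutSecs are assumptions about our client version.
        _clientManager = new PooledRedisClientManager(
            new List<string>() { RedisConfig.Config.Primary.ToHost() }, null, poolConfig)
        {
            ConnectTimeout = 5000,  // ms to wait when opening a socket
            IdleTimeOutSecs = 240   // recycle pooled clients before the 300s server timeout hits
        };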

    • Jarrod
      Jarrod over 11 years
      Did you ever find a good way to handle this? I see the same problem running an app in Azure with Redis on a separate VM. I think the cloud load balancer is killing the idle connections causing the above error for me.
    • Bernardo
      Bernardo over 11 years
      You can't really use a load balancer with Redis as it is; we're not using one. We haven't really solved this yet: we're seeing fewer errors now that we have dropped the server-side connection timeout to 300 s, but we still see them occasionally and have no solution yet.
    • roryf
      roryf about 11 years
      I'm also seeing this error with a Redis instance running on the same box (using the MSOpenTech build). I'm using BasicRedisClientManager, but our traffic is much lower (a single AppPool with low visitor numbers), so I wouldn't expect more than a handful of concurrent connections. Did you get any further with your investigation?
    • jeffgabhart
      jeffgabhart over 10 years
      Curious if anyone has any new information?
    • joeriks
      joeriks over 10 years
      I just made my code retry (max 5 times) to handle this error; it feels dirty.
    • Steven
      Steven over 9 years
      I have a Redis client running in a Windows service which is set to Delayed Start, and it fails with the exact same issue. However, if I manually start it then everything works fine.
  • sonjz
    sonjz almost 10 years
    Meaning 'save ""' and 'appendonly yes' in the .conf?
  • sonjz
    sonjz almost 10 years
    For me, I see the issue on AWS, even with AOF on and RDB off. The weird part is that only a specific server will have the issue: if I have 6 servers in the pool, maybe 1 or 2 consistently have this issue, while the rest of the servers are fine.
  • Steven
    Steven over 9 years
    The link you posted says that turning RDB off is discouraged, and that AOF and RDB may eventually be merged into a single persistence layer. So this can't be a long-term fix, and changing the server timeout seems hacky. I'm surprised not to find more possible solutions to this issue.
  • Bernardo
    Bernardo over 9 years
    While I can't provide any more factual data, anecdotally we have been running in high-traffic production environments with RDB turned off for close to a year without issues, and without ever seeing the above symptoms again. So far so good.