Redis IOException: "Existing connection forcibly closed by remote host" using ServiceStack C# client


Solution 1

We think we found the root cause after carefully reading through the Redis documentation and finding this beauty (http://redis.io/topics/persistence):

RDB needs to fork() often in order to persist on disk using a child process.
Fork() can be time consuming if the dataset is big, and may result in Redis
to stop serving clients for some millisecond or even for one second if the
dataset is very big and the CPU performance not great. AOF also needs to fork()
but you can tune how often you want to rewrite your logs without any trade-off
on durability.

We turned RDB persistence off, and haven't seen those connection drops since.
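
In concrete terms, turning RDB snapshotting off just means removing the scheduled save points from redis.conf (a comment below asks exactly this). A minimal sketch against the config shown in the question:

    # redis.conf - disable RDB snapshotting entirely
    # Remove or comment out the scheduled snapshot points:
    # save 900 1
    # save 300 10
    # save 60 10000
    # ...or, equivalently, clear them with an empty save directive:
    save ""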

Solution 2

It seems that changing the server-side timeout from 0 to 300 has alleviated the mass connection failures. We still see some bad connections, but that may be because the PooledRedisClientManager doesn't properly check connection state in GetInActiveWriteClient(), which is called from GetClient().
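
As a stop-gap on the client side, a simple retry wrapper around the pooled client also papers over connections that come out of the pool already dead (one of the commenters below does exactly this). This is only a rough sketch, using nothing beyond the GetClient()/Set<T>() calls shown in the question; the helper name and retry count are our own:

    // Sketch only: retry a write when the pooled connection turns out to be dead.
    // A socket the server has already closed fails on first use with an
    // IOException (or RedisException); retrying draws a fresh client from the pool.
    private static void SetWithRetry<T>(string key, T value, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using (var client = ClientManager.GetClient())
                {
                    client.Set<T>(key, value);
                }
                return;
            }
            catch (Exception ex)
            {
                var transient = ex is System.IO.IOException
                             || ex is ServiceStack.Redis.RedisException;
                if (!transient || attempt >= maxAttempts)
                    throw;
            }
        }
    }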


Comments

  • Bernardo
    Bernardo almost 2 years

    We have the following setup:

    Redis 2.6 on Ubuntu Linux 12.04 LTS on a Rackspace Cloud 8 GB instance with the following settings:

    daemonize yes
    pidfile /var/run/redis_6379.pid
    
    port 6379
    
    timeout 300
    
    loglevel notice
    logfile /var/log/redis_6379.log
    
    databases 16
    
    save 900 1
    save 300 10
    save 60 10000
    
    rdbcompression yes
    dbfilename dump.rdb
    dir /var/redis/6379
    
    requirepass PASSWORD
    
    maxclients 10000
    
    maxmemory 7gb
    maxmemory-policy allkeys-lru
    maxmemory-samples 3
    
    appendonly no
    
    slowlog-log-slower-than 10000
    slowlog-max-len 128
    
    activerehashing yes
    

    Our app servers are hosted in Rackspace Managed and connect to Redis via its public IP (to avoid having to set up Rackspace Connect, which is a royal PITA); we provide some security by requiring a password for the Redis connection. I manually increased the Unix file descriptor limit to 10240; a maximum of 10k connections should offer enough headroom. As you can see from the settings file above, I also limit memory usage to 7 GB to leave some RAM headroom.

    We use the ServiceStack C# Redis Driver. We use the following web.config settings:

    <RedisConfig suffix="">
      <Primary password="PASSWORD" host="HOST" port="6379"  maxReadPoolSize="50" maxWritePoolSize="50"/>
    </RedisConfig>  
    

    We have a PooledRedisClientManager singleton, created once per AppPool as follows:

    private static PooledRedisClientManager _clientManager;
    public static PooledRedisClientManager ClientManager
    {
        get
        {
            if (_clientManager == null)
            {
                try
                {
                    var poolConfig = new RedisClientManagerConfig
                    {
                        MaxReadPoolSize = RedisConfig.Config.Primary.MaxReadPoolSize,
                        MaxWritePoolSize = RedisConfig.Config.Primary.MaxWritePoolSize,
                    };
    
                    _clientManager = new PooledRedisClientManager(new List<string>() { RedisConfig.Config.Primary.ToHost() }, null, poolConfig);
                }
                catch (Exception e)
                {
                    log.Fatal("Could not spin up Redis", e);
                    CacheFailed = DateTime.Now;
                }
            }
            return _clientManager;
        }
    }
    

    And we acquire a connection and do put/get operations as follows:

        using (var client = ClientManager.GetClient())
        {
            client.Set<T>(region + key, value);
        }
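
    Reads follow the same pattern (this is the Get<T> call that shows up in the stack traces below); a sketch for completeness:

        using (var client = ClientManager.GetClient())
        {
            var value = client.Get<T>(region + key);
        }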
    

    The code mostly seems to work. Given that we have ~20 AppPools and 50-100 read and 50-100 write clients, we expect at most 2,000-4,000 connections to the Redis server. However, we keep seeing the following exception in our error logs: usually a couple of hundred of them bunched together, then nothing for an hour, then it starts over again, ad nauseam.

    System.IO.IOException: Unable to read data from the transport connection:
    An existing connection was forcibly closed by the remote host.
    ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host at
    System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags) at
    System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
    --- End of inner exception stack trace
    - at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) at System.IO.BufferedStream.ReadByte() at
    ServiceStack.Redis.RedisNativeClient.ReadLine() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 85 at
    ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355 at
    ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404 at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185 at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32 at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
    

    We have experimented with a Redis server timeout of 0 (i.e. no connection timeout), a timeout of 24 hours, and values in between, without luck. Googling and Stack Overflowing has brought no real answers; everything seems to point to us doing the right thing with the code, at least.

    Our feeling is that we get regular, sustained network latency issues between Rackspace Hosted and Rackspace Cloud, which cause a block of TCP connections to go stale. We could possibly solve that by implementing client-side connection timeouts, and the question would be whether we'd need server-side timeouts as well. But that's just a feeling, and we're not 100% sure we're on the right track.

    Ideas?

    Edit: I occasionally see the following error as well:

    ServiceStack.Redis.RedisException: Unable to Connect: sPort: 65025 ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host at System.Net.Sockets.Socket.Send(IList`1 buffers, SocketFlags socketFlags) at ServiceStack.Redis.RedisNativeClient.FlushSendBuffer() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 273 at ServiceStack.Redis.RedisNativeClient.SendCommand(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 203 --- End of inner exception stack trace --- at ServiceStack.Redis.RedisNativeClient.CreateConnectionError() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 165 at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355 at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404 at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185 at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32 at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
    

    I imagine this is a direct result of having server-side connection timeouts that aren't handled on the client. It's looking like we really need to be handling client-side connection timeouts.
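
    If we go down that route, the change would be roughly along these lines (untested sketch; it assumes our version of ServiceStack.Redis exposes the ConnectTimeout and IdleTimeOutSecs properties on PooledRedisClientManager):

        // Untested sketch: add client-side timeouts when building the pool.
        // ConnectTimeout / IdleTimeOutSecs are assumptions about our client version.
        _clientManager = new PooledRedisClientManager(
            new List<string>() { RedisConfig.Config.Primary.ToHost() }, null, poolConfig)
        {
            ConnectTimeout = 5000,  // ms to wait when opening a socket
            IdleTimeOutSecs = 240   // recycle pooled clients before the 300s server timeout hits
        };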

    • Jarrod
      Jarrod over 11 years
      Did you ever find a good way to handle this? I see the same problem running an app in Azure with Redis on a separate VM. I think the cloud load balancer is killing the idle connections causing the above error for me.
    • Bernardo
      Bernardo over 11 years
      You can't really use a load balancer with Redis as it is; we're not using one. We haven't really solved this yet: we're seeing fewer errors now that we have dropped the server-side connection timeout to 300 s, but we still see them occasionally and have no solution yet.
    • roryf
      roryf about 11 years
      I'm also seeing this error with a Redis instance running on the same box (using the MSOpenTech build). I'm using BasicRedisClientManager, but our traffic is much lower (a single AppPool with low visitor numbers), so I wouldn't expect more than a handful of concurrent connections. Did you get any further with your investigation?
    • jeffgabhart
      jeffgabhart over 10 years
      Curious if anyone has any new information?
    • joeriks
      joeriks over 10 years
      I just made my code retry (max 5 times) to handle this error; it feels dirty.
    • Steven
      Steven over 9 years
      I have a Redis client running in a Windows service which is set to Delayed Start, and it fails with the exact same issue. However, if I manually start it then everything works fine.
  • sonjz
    sonjz almost 10 years
    Meaning 'save ""' and 'appendonly yes' in the .conf?
  • sonjz
    sonjz almost 10 years
    For me, I see the issue on AWS, even with AOF on and RDB off. The weird part is that only a specific server will have the issue: if I have 6 servers in the pool, maybe 1 or 2 consistently have this issue, while the rest of the servers are fine.
  • Steven
    Steven over 9 years
    The link you posted says that turning RDB off is discouraged, and that AOF and RDB may eventually be merged into a single persistence layer. So this can't be a long-term fix, and changing the server timeout seems hacky. I'm surprised not to find more possible solutions to this issue.
  • Bernardo
    Bernardo over 9 years
    While I can't provide any more factual data, anecdotally we have been running in high-traffic production environments with RDB turned off for close to a year without issues, and without ever seeing the above symptoms again. So far so good.