Redis IOException: "Existing connection forcibly closed by remote host" using ServiceStack C# client
Solution 1
We think we found the root cause after carefully reading through the Redis documentation and finding this beauty (http://redis.io/topics/persistence):
RDB needs to fork() often in order to persist on disk using a child process.
Fork() can be time consuming if the dataset is big, and may result in Redis
to stop serving clients for some millisecond or even for one second if the
dataset is very big and the CPU performance not great. AOF also needs to fork()
but you can tune how often you want to rewrite your logs without any trade-off
on durability.
We turned RDB persistence off, and haven't seen those connection drops since.
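For reference, turning RDB persistence off amounts to removing the save points from redis.conf (an empty `save ""` clears them all). The lines below are a sketch; whether to keep AOF on instead depends on your durability requirements:

```conf
# Disable RDB snapshots entirely: an empty save string removes all save points
save ""

# Optional: retain durability via the append-only file instead of RDB forks
appendonly yes
appendfsync everysec
```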
Solution 2
It seems that setting the server timeout to 300 (from 0) alleviated the connections failing en masse. We are still seeing some bad connections, but that might be because PooledRedisClientManager doesn't properly check connection state in GetInActiveWriteClient(), which is called from GetClient().
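The corresponding redis.conf change is a single directive (0, the default in many setups, disables idle timeouts entirely):

```conf
# Close a client connection after it has been idle for 300 seconds
timeout 300
```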
Bernardo
Updated on June 04, 2022
Comments
-
Bernardo almost 2 years
We have the following setup:
Redis 2.6 on Ubuntu Linux 12.04 LTS on a Rackspace Cloud 8GB instance with the following settings:
daemonize yes
pidfile /var/run/redis_6379.pid
port 6379
timeout 300
loglevel notice
logfile /var/log/redis_6379.log
databases 16
save 900 1
save 300 10
save 60 10000
rdbcompression yes
dbfilename dump.rdb
dir /var/redis/6379
requirepass PASSWORD
maxclients 10000
maxmemory 7gb
maxmemory-policy allkeys-lru
maxmemory-samples 3
appendonly no
slowlog-log-slower-than 10000
slowlog-max-len 128
activerehashing yes
Our app servers are hosted in Rackspace Managed and connect to Redis via public IP (to avoid having to set up RackSpace Connect, which is a royal PITA), and we provide some security by requiring a password for the Redis connection. I manually increased the Unix file descriptor limit to 10240, so a max of 10k connections should offer enough headroom. As you can see from the settings file above, I limit memory usage to 7GB to leave some RAM headroom as well.
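The descriptor-limit change mentioned above can be checked and applied roughly as follows (the value 10240 is from the setup described here; persist the setting via /etc/security/limits.conf or the Redis init script on your distribution):

```shell
# Show the current per-process open-file limit
ulimit -n

# Raise it for the current shell session
# (requires the hard limit to allow the new value)
ulimit -n 10240
```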
We use the ServiceStack C# Redis Driver. We use the following web.config settings:
<RedisConfig suffix=""> <Primary password="PASSWORD" host="HOST" port="6379" maxReadPoolSize="50" maxWritePoolSize="50"/> </RedisConfig>
We have a PooledRedisClientManager singleton, created once per AppPool as follows:
private static PooledRedisClientManager _clientManager;

public static PooledRedisClientManager ClientManager
{
    get
    {
        // Note: this lazy initialization is not thread-safe; two threads
        // hitting the getter on first access can race and each build a pool.
        if (_clientManager == null)
        {
            try
            {
                var poolConfig = new RedisClientManagerConfig
                {
                    MaxReadPoolSize = RedisConfig.Config.Primary.MaxReadPoolSize,
                    MaxWritePoolSize = RedisConfig.Config.Primary.MaxWritePoolSize,
                };
                _clientManager = new PooledRedisClientManager(
                    new List<string> { RedisConfig.Config.Primary.ToHost() },
                    null, poolConfig);
            }
            catch (Exception e)
            {
                log.Fatal("Could not spin up Redis", e);
                CacheFailed = DateTime.Now;
            }
        }
        return _clientManager;
    }
}
And we acquire a connection and do put/get operations as follows:
using (var client = ClientManager.GetClient())
{
    client.Set<T>(region + key, value);
}
The code seems to mostly work. Given that we have ~20 AppPools and 50-100 read and 50-100 write clients each, we expect at most 2000-4000 connections to the Redis server. However, we keep seeing the following exception in our error logs: usually a couple hundred bunched together, then nothing for an hour, then it starts over, ad nauseam.
System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.ReadByte()
   at ServiceStack.Redis.RedisNativeClient.ReadLine() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 85
   at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355
   at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404
   at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185
   at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32
   at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
We have experimented with a Redis server timeout of 0 (i.e. no connection timeout), a timeout of 24 hours, and values in between, without luck. Googling and searching Stack Overflow have brought no real answers; everything seems to suggest we're doing the right thing with the code, at least.
Our feeling is that we get regular, sustained network latency issues between Rackspace Hosted and Rackspace Cloud, which cause a block of TCP connections to go stale. We could possibly solve that by implementing client-side connection timeouts, and the question would be whether we'd need server-side timeouts as well. But that's just a feeling, and we're not 100% sure we're on the right track.
Ideas?
Edit: I occasionally see the following error as well:
ServiceStack.Redis.RedisException: Unable to Connect: sPort: 65025 ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Send(IList`1 buffers, SocketFlags socketFlags)
   at ServiceStack.Redis.RedisNativeClient.FlushSendBuffer() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 273
   at ServiceStack.Redis.RedisNativeClient.SendCommand(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 203
   --- End of inner exception stack trace ---
   at ServiceStack.Redis.RedisNativeClient.CreateConnectionError() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 165
   at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355
   at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404
   at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185
   at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32
   at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
I imagine this is a direct result of having server-side connection timeouts that aren't handled on the client. It's looking like we really need to be handling client-side connection timeouts.
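One way to experiment with client-side timeouts is via the socket timeout properties that ServiceStack.Redis exposes on its pooled client manager. The property names below (ConnectTimeout, SocketSendTimeout, SocketReceiveTimeout, IdleTimeOutSecs) exist in later releases of the library, but availability varies by version, so verify them against the build you are running; this is a sketch, not a confirmed fix:

```csharp
// Sketch only: configure client-side timeouts so a stale TCP connection
// fails fast instead of blocking until the remote host resets it.
// Property availability depends on your ServiceStack.Redis version.
var poolConfig = new RedisClientManagerConfig
{
    MaxReadPoolSize = 50,
    MaxWritePoolSize = 50,
};

var manager = new PooledRedisClientManager(
    new List<string> { "PASSWORD@HOST:6379" }, null, poolConfig)
{
    ConnectTimeout = 5000,        // ms to wait when opening a connection
    SocketSendTimeout = 5000,     // ms before a blocked Send() throws
    SocketReceiveTimeout = 5000,  // ms before a blocked Receive() throws
    IdleTimeOutSecs = 240,        // drop pooled connections idle longer than this
};
```

Keeping the client-side idle timeout below the server-side `timeout` value should mean the pool discards stale connections before the server forcibly closes them.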
-
Jarrod over 11 years: Did you ever find a good way to handle this? I see the same problem running an app in Azure with Redis on a separate VM. I think the cloud load balancer is killing the idle connections, causing the above error for me.
-
Bernardo over 11 years: Can't really use a load balancer with Redis as it is; we're not using one. We haven't really solved this yet. We're seeing fewer errors now that we have dropped the server-side connection timeout to 300s, but we still see them occasionally and have no solution yet.
-
roryf about 11 years: I'm also seeing this error with a redis instance running on the same box (using the MSOpenTech build). I'm using BasicClientManager, but our traffic is much less (single AppPool with low visitor numbers), so I wouldn't expect more than a handful of concurrent connections. Did you get any further investigating?
-
jeffgabhart over 10 years: Curious if anyone has any new information?
-
joeriks over 10 years: I just made my code retry (max 5 times) to handle this error, feels dirty.
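The retry approach joeriks describes can at least be contained in one helper so the "dirty" part lives in a single place. A sketch (the helper name, exception filter, and backoff values are illustrative, not from the original code):

```csharp
// Illustrative retry wrapper around a pooled Redis operation.
// Retries only on IOException (the failure mode seen above),
// up to maxAttempts, with a short linear backoff between attempts.
public static T WithRetry<T>(Func<T> operation, int maxAttempts = 5)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (System.IO.IOException)
        {
            if (attempt >= maxAttempts)
                throw; // give up and surface the original exception
            System.Threading.Thread.Sleep(100 * attempt); // crude backoff
        }
    }
}

// Usage:
// var value = WithRetry(() =>
// {
//     using (var client = ClientManager.GetClient())
//         return client.Get<string>(key);
// });
```

Acquiring a fresh client from the pool inside the lambda matters: retrying on the same client would reuse the dead connection.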
-
Steven over 9 years: I have a Redis client running in a Windows service which is set to Delayed Start, and it fails with the exact same issue. However, if I manually start it then everything works fine.
-
sonjz almost 10 years: meaning 'save ""' and 'appendonly yes' in the .conf?
-
sonjz almost 10 years: For me, I see the issue on AWS, even with AOF on and RDB off. The weird part is that only specific servers have the issue: if I have 6 servers in the pool, maybe 1 or 2 consistently hit it, and the rest are fine.
-
Steven over 9 years: The link you posted says that turning RDB off is discouraged, and that AOF and RDB may eventually be merged into a single persistence layer. So this can't be a long-term fix, and changing the server timeout seems hacky. I'm surprised to not find more possible solutions on this issue.
-
Bernardo over 9 years: While I can't provide any more factual data, anecdotally we have been running in high-traffic production environments with RDB turned off for close to a year without issues, and without ever seeing the above symptoms again. So far so good.