Spring Boot random "SSLException: Connection reset" in Kubernetes with JDK11


Solution 1

We had a similar problem when migrating to AWS/Kubernetes. I've found out why.

You're using a connection pool. By default, the PoolingHttpClientConnectionManager reuses connections, so a connection is not closed as soon as your request is done. This saves resources because the client does not have to reconnect for every request.

A Kubernetes cluster uses NAT (Network Address Translation) for outgoing connections. When a connection has not been used for a certain amount of time, its entry is removed from the NAT table and the connection is broken. The client only notices this the next time it tries to reuse that pooled connection, which causes the seemingly random SSLExceptions.

On AWS, a connection is removed from the NAT table once it has been idle for 350 seconds. Other Kubernetes environments might have other settings.

See https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html

The solution:

Disable connection-reuse:

// Close each connection after its request instead of keeping it in the pool for reuse.
final CloseableHttpClient closeableHttpClient = HttpClients.custom()
    .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
    .setConnectionManager(poolingHttpClientConnectionManager)
    .build();

Or, let the httpClient evict connections that are idle for too long:

// Read the Javadoc: evictIdleConnections may not be used when the HttpClient instance is created inside an EJB container.
return HttpClients.custom()
    .evictIdleConnections(300, TimeUnit.SECONDS)
    .setConnectionManager(poolingHttpClientConnectionManager)
    .build();
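
As a rough illustration of why 300 seconds is a reasonable value here: it stays below the 350-second AWS NAT timeout with some margin. A minimal sketch of that rule in plain Java follows. The class name, the method name and the 50-second margin are my own assumptions for illustration and are not part of HttpClient.

```java
import java.time.Duration;
import java.time.Instant;

final class IdleConnectionRule {
    // AWS NAT gateway idle timeout mentioned in the answer.
    static final Duration NAT_IDLE_TIMEOUT = Duration.ofSeconds(350);
    // Hypothetical safety margin: 350 s - 50 s gives the 300 s used in the answer.
    static final Duration SAFETY_MARGIN = Duration.ofSeconds(50);

    /**
     * Returns true when a pooled connection has been idle long enough
     * that it should be closed instead of reused.
     */
    static boolean shouldEvict(Instant lastUsed, Instant now) {
        Duration idle = Duration.between(lastUsed, now);
        Duration limit = NAT_IDLE_TIMEOUT.minus(SAFETY_MARGIN);
        return idle.compareTo(limit) >= 0;
    }
}
```

If your cluster sits behind a different NAT or load balancer, the timeout constant would need to match that environment.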
        

Or call setConnectionKeepAliveStrategy(....) with a custom ConnectionKeepAliveStrategy that never returns -1 (keep alive indefinitely) and never returns a duration longer than 300 seconds (the return value is in milliseconds, so at most 300,000).
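
A minimal sketch of the capping logic such a strategy could apply, written in plain Java so that it runs without HttpClient on the classpath. The class and method names are hypothetical. It assumes the duration is in milliseconds, as returned by ConnectionKeepAliveStrategy#getKeepAliveDuration in HttpClient 4.x, and that a non-positive value means "keep alive indefinitely".

```java
import java.util.concurrent.TimeUnit;

final class KeepAliveCap {
    // Upper bound taken from the answer: 300 s, below the 350 s NAT idle timeout.
    static final long MAX_KEEP_ALIVE_MILLIS = TimeUnit.SECONDS.toMillis(300);

    /**
     * Caps a keep-alive duration (milliseconds).
     * Assumption: values <= 0 (such as -1) mean "indefinitely" and are replaced by the cap.
     */
    static long capKeepAliveMillis(long serverKeepAliveMillis) {
        if (serverKeepAliveMillis <= 0) {
            return MAX_KEEP_ALIVE_MILLIS;
        }
        return Math.min(serverKeepAliveMillis, MAX_KEEP_ALIVE_MILLIS);
    }
}
```

Inside a custom strategy you could take the value computed by DefaultConnectionKeepAliveStrategy.INSTANCE.getKeepAliveDuration(response, context), pass it through this method, and return the result.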

Solution 2

I will share my experience with this error. Judging by the stack trace I had, it is probably the same problem you are facing.

"Happening randomly" is the key phrase that makes me suspect this is the same problem.

HTTP connections are made through an HTTP client library (Apache HttpClient).

The HTTP client usually manages a reusable pool of connections. This pool has a limit. In our case, the pool was sometimes (randomly) fully occupied, so there were no free connections left to use.

  1. Increase the pool size.
  2. Implement a back-off retry mechanism that tries again to obtain a connection from the HTTP connection pool when an HTTP request fails.
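
The second option could be sketched like this in plain Java. It is only an illustration under my own assumptions: the class name, method name and parameters are hypothetical, and it retries on IOException only. With RestTemplate you would catch ResourceAccessException instead, since that is what wraps the underlying I/O error.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class BackoffRetry {
    /**
     * Runs the call and retries it on IOException, doubling the wait
     * between attempts (exponential back-off).
     */
    static <T> T callWithBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: give the last error to the caller
                }
                Thread.sleep(delay);
                delay *= 2; // wait twice as long before the next attempt
            }
        }
    }
}
```

Only retry requests that are safe to repeat (for example idempotent GETs), otherwise a retry can apply the same change twice on the upstream service.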

If you wonder how to tune the underlying HTTP client that is used in Spring Boot, check out this post.

Author: Urosh T.
Updated on September 15, 2022

Comments

  • Urosh T. almost 2 years

    Context:

    • We have a Spring Boot (2.3.1.RELEASE) web app
    • It's written in Java 8 but running inside of a container with Java 11 (openjdk:11.0.6-jre-stretch).
    • It has a DB connection and an upstream service that is called via https (simple RestTemplate#exchange method) (this is important!)
    • It is deployed inside of a Kubernetes cluster (not sure if this is important)

    Problem:

    • Every day, I see a small percentage of requests towards the upstream service fail with this error: I/O error on GET request for "https://upstream.xyz/path": Connection reset; nested exception is javax.net.ssl.SSLException: Connection reset
    • The errors are totally random and happen intermittently.
    • We have had a similar error (javax.net.ssl.SSLProtocolException: Connection reset) that was related to JRE 11 and its TLS 1.3 negotiation issue. We updated our Docker image to the one mentioned above and that fixed it.
    • This is the stack trace from the error:
    java.net.SocketException: Connection reset
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketInputRecord.read(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
        at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
        at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:739)
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:674)
        at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:583)
    ....
    

    Configuration:

    public static RestTemplate create(final int maxTotal, final int defaultMaxPerRoute,
                                      final int connectTimeout, final int readTimeout,
                                      final String userAgent) {
        final Registry<ConnectionSocketFactory> schemeRegistry = RegistryBuilder.<ConnectionSocketFactory>create()
                .register("http", PlainConnectionSocketFactory.getSocketFactory())
                .register("https", SSLConnectionSocketFactory.getSocketFactory())
                .build();
    
        final PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager(schemeRegistry);
        connManager.setMaxTotal(maxTotal);
        connManager.setDefaultMaxPerRoute(defaultMaxPerRoute);
    
        final CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(connManager)
                .setUserAgent(userAgent)
                .setDefaultRequestConfig(RequestConfig.custom()
                                                 .setConnectTimeout(connectTimeout)
                                                 .setSocketTimeout(readTimeout)
                                                 .setExpectContinueEnabled(false).build())
                .build();
    
        return new RestTemplateBuilder()
                .requestFactory(() -> new HttpComponentsClientHttpRequestFactory(httpClient))
                .build();
    }
    

    Has anyone experienced this issue? When I turn on debug logs on the http client, it is overflowing with noise and I am unable to discern anything useful...

  • Urosh T. almost 3 years
    Have you noticed that this new connection config impacts performance in any way?
  • vancoeverden almost 3 years
    We are not in production yet. Disabling connection reuse will have an impact if you make a lot of requests. But I think the second and third options will not have a significant impact: when your application makes a lot of calls, its connections are never idle, so the change makes no difference. When your application does not make a lot of calls, it will have to create a new connection every 5 minutes (worst case). That is not that much.