How to prevent hangs on SocketInputStream.socketRead0 in Java?

46,833

Solution 1

For the blocking Apache HTTP Client, the best solution I found is to call getConnectionManager() and shut it down.

So in a high-reliability setup I schedule the shutdown on another thread, and if the request does not complete in time, that thread shuts the connection manager down.
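
The same idea can be sketched with nothing but the JDK. This is a minimal illustration, not the answerer's code: it uses a plain java.net.Socket instead of Apache HttpClient, and the names ReadWatchdog, readWithHardLimit and demo are made up for the example. A scheduled task closes the socket if the read has not returned within a hard limit, and the task is cancelled when the read finishes normally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class ReadWatchdog {
    // Single daemon thread that fires the "kill" tasks (hypothetical helper, not an HttpClient API).
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "read-watchdog");
                t.setDaemon(true); // never keeps the JVM alive
                return t;
            });

    /**
     * Reads one byte, but closes the socket from the watchdog thread if the
     * read has not returned within hardLimitMillis. Closing a socket makes a
     * thread blocked in read() on it throw a SocketException.
     */
    static int readWithHardLimit(Socket socket, long hardLimitMillis) throws IOException {
        ScheduledFuture<?> killer = TIMER.schedule(() -> {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do in a sketch
            }
        }, hardLimitMillis, TimeUnit.MILLISECONDS);
        try {
            InputStream in = socket.getInputStream();
            return in.read();
        } finally {
            killer.cancel(false); // read returned or failed: disarm the watchdog
        }
    }

    /** Connects to a local server that never sends data; returns true if the watchdog unblocked the read. */
    static boolean demo(long hardLimitMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            try {
                readWithHardLimit(client, hardLimitMillis); // no SO_TIMEOUT set: would block forever
                return false;
            } catch (IOException closedByWatchdog) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unblocked by watchdog: " + demo(500));
    }
}
```

With Apache HttpClient the watchdog task would act on the request or the connection manager rather than on a raw socket; that part is left out here because it depends on the client version in use.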

Solution 2

Though this question mentions Windows, I have the same problem on Linux. It appears there is a flaw in the way the JVM implements blocking socket timeouts:

To summarize, timeout for blocking sockets is implemented by calling poll on Linux (and select on Windows) to determine that data is available before calling recv. However, at least on Linux, both methods can spuriously indicate that data is available when it is not, leading to recv blocking indefinitely.

From poll(2) man page BUGS section:

See the discussion of spurious readiness notifications under the BUGS section of select(2).

From select(2) man page BUGS section:

Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
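
The man page's advice can be followed from Java with NIO. The sketch below is an assumption-laden illustration, not something the JDK or HttpClient does for you: DeadlineRead and its methods are hypothetical names. The channel is put in non-blocking mode, so a spurious readiness notification only produces a read of 0 bytes, and the loop enforces its own deadline.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

class DeadlineRead {
    /**
     * Reads at least one byte into buf (which must have space remaining), or
     * throws SocketTimeoutException once timeoutMillis has elapsed.
     * Returns the number of bytes read, or -1 at end of stream.
     */
    static int read(SocketChannel channel, ByteBuffer buf, long timeoutMillis) throws IOException {
        channel.configureBlocking(false);
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_READ);
            while (true) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new SocketTimeoutException("no data within " + timeoutMillis + " ms");
                }
                selector.select(remainingMillis); // wait for readiness, bounded by the deadline
                selector.selectedKeys().clear();
                int n = channel.read(buf);        // cannot block in non-blocking mode
                if (n != 0) {
                    return n;                     // data or end of stream
                }
                // n == 0: readiness was spurious or nothing arrived yet; loop and re-check the deadline
            }
        }
    }

    /** Connects to a local server that never sends data; returns true if the read timed out. */
    static boolean demo(long timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             SocketChannel client = SocketChannel.open(
                     new InetSocketAddress(loopback, silentServer.getLocalPort()));
             Socket accepted = silentServer.accept()) {
            try {
                read(client, ByteBuffer.allocate(1024), timeoutMillis);
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out cleanly: " + demo(500));
    }
}
```

Opening a selector per call keeps the sketch short; a real client would reuse one selector across reads.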

The Apache HTTP Client code is a bit hard to follow, but it appears that connection expiration is only set for HTTP keep-alive connections (which you've disabled) and is indefinite unless the server specifies otherwise. Therefore, as pointed out by oleg, the Connection eviction policy approach won't work in your case and can't be relied upon in general.

Solution 3

As Clint said, you should consider a non-blocking HTTP client or, since you are using Apache HttpClient, implement multithreaded request execution to keep possible hangs away from the main application thread (this does not solve the problem, but it is better than restarting your app because it has frozen). Also, you set the setStaleConnectionCheckEnabled property, but the stale connection check is not 100% reliable. From the Apache HttpClient tutorial:

One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).

HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable and adds 10 to 30 ms overhead to each request execution.

The Apache HttpComponents crew recommends implementing a connection eviction policy:

The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call ClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.
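
A monitor of the kind the tutorial describes might look like the sketch below. To keep it self-contained, EvictablePool is a hypothetical stand-in interface that mirrors the two connection manager methods named in the quote; IdleConnectionEvictor is likewise an illustrative name, and this is not the tutorial's own sample code. Note that such a monitor only touches connections sitting idle in the pool.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the connection manager; only the two methods the monitor needs. */
interface EvictablePool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Periodically asks the pool to drop expired and long-idle connections. */
class IdleConnectionEvictor implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "idle-connection-evictor");
                t.setDaemon(true); // the monitor must not keep the JVM alive
                return t;
            });

    IdleConnectionEvictor(EvictablePool pool, long periodMillis, long maxIdleMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pool.closeExpiredConnections();
                pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
            } catch (RuntimeException e) {
                // swallow so one failed sweep does not cancel all future sweeps
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the monitor thread. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In an application, an adapter around the real connection manager would implement EvictablePool, and the evictor would be closed together with the HTTP client.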

Take a look at the sample code in the Connection eviction policy section and try to implement it in your application along with multithreaded request execution. I think implementing both mechanisms will prevent your undesired hangs.
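
The multithreaded part can be sketched as follows, assuming the blocking request is wrapped in a Callable. BoundedCall is a hypothetical helper name. The caller waits for a bounded time only; a worker that is stuck in a native socket read is not freed by the interrupt, so this protects the calling thread rather than curing the hang.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    private final ExecutorService workers;

    BoundedCall(int threads) {
        workers = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "request-worker");
            t.setDaemon(true); // a stuck worker must not block JVM shutdown
            return t;
        });
    }

    /**
     * Runs the task on a worker thread and waits at most limitMillis for it.
     * On timeout the task is cancelled (best effort) and TimeoutException is
     * rethrown, so the caller is never blocked longer than the limit.
     */
    <T> T call(Callable<T> task, long limitMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = workers.submit(task);
        try {
            return future.get(limitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; a blocked socket read may ignore it
            throw e;
        }
    }

    void shutdown() {
        workers.shutdownNow();
    }
}
```

Because a hung worker stays occupied, the pool can still run dry over time; combining this with closing the underlying connection from another thread addresses that.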

Solution 4

I have more than 50 machines that make about 200k requests/day/machine. They are running Amazon Linux AMI 2017.03. I previously had jdk1.8.0_102, now I have jdk1.8.0_131. I am using both Apache HttpClient and OkHttp as scraping libraries.

Each machine was running 50 threads, and sometimes the threads get lost. After profiling with the YourKit Java profiler I got:

ScraperThread42 State: RUNNABLE CPU usage on sample: 0ms
java.net.SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) SocketInputStream.java (native)
java.net.SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) SocketInputStream.java:116
java.net.SocketInputStream.read(byte[], int, int, int) SocketInputStream.java:171
java.net.SocketInputStream.read(byte[], int, int) SocketInputStream.java:141
okio.Okio$2.read(Buffer, long) Okio.java:139
okio.AsyncTimeout$2.read(Buffer, long) AsyncTimeout.java:211
okio.RealBufferedSource.indexOf(byte, long) RealBufferedSource.java:306
okio.RealBufferedSource.indexOf(byte) RealBufferedSource.java:300
okio.RealBufferedSource.readUtf8LineStrict() RealBufferedSource.java:196
okhttp3.internal.http1.Http1Codec.readResponse() Http1Codec.java:191
okhttp3.internal.connection.RealConnection.createTunnel(int, int, Request, HttpUrl) RealConnection.java:303
okhttp3.internal.connection.RealConnection.buildTunneledConnection(int, int, int, ConnectionSpecSelector) RealConnection.java:156
okhttp3.internal.connection.RealConnection.connect(int, int, int, List, boolean) RealConnection.java:112
okhttp3.internal.connection.StreamAllocation.findConnection(int, int, int, boolean) StreamAllocation.java:193
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(int, int, int, boolean, boolean) StreamAllocation.java:129
okhttp3.internal.connection.StreamAllocation.newStream(OkHttpClient, boolean) StreamAllocation.java:98
okhttp3.internal.connection.ConnectInterceptor.intercept(Interceptor$Chain) ConnectInterceptor.java:42
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.internal.http.BridgeInterceptor.intercept(Interceptor$Chain) BridgeInterceptor.java:93
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(Interceptor$Chain) RetryAndFollowUpInterceptor.java:124
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.RealCall.getResponseWithInterceptorChain() RealCall.java:198
okhttp3.RealCall.execute() RealCall.java:83

I found out that they have a fix for this

https://bugs.openjdk.java.net/browse/JDK-8172578

in JDK 8u152 (early access). I have installed it on one of our machines. Now I am waiting to see some good results.

Solution 5

You should consider a non-blocking HTTP client such as Grizzly or Netty, which has no blocking operations that can hang a thread.

Author by

Piotr Müller

Updated on July 08, 2022

Comments

  • Piotr Müller
    Piotr Müller almost 2 years

    Performing millions of HTTP requests with different Java libraries leaves me with threads hung in:

    java.net.SocketInputStream.socketRead0()

    which is a native function.

    I tried to set up Apache HttpClient and RequestConfig with timeouts on (I hope) everything possible, but I still get (probably infinite) hangs in socketRead0. How can I get rid of them?

    The hang ratio is about 1 per 10,000 requests (to 10,000 different hosts), and a hang can probably last forever (I've confirmed a thread still hung after 10 hours).

    JDK 1.8 on Windows 7.

    My HttpClient factory:

        SocketConfig socketConfig = SocketConfig.custom()
                .setSoKeepAlive(false)
                .setSoLinger(1)
                .setSoReuseAddress(true)
                .setSoTimeout(5000)
                .setTcpNoDelay(true).build();

        HttpClientBuilder builder = HttpClientBuilder.create();
        builder.disableAutomaticRetries();
        builder.disableContentCompression();
        builder.disableCookieManagement();
        builder.disableRedirectHandling();
        builder.setConnectionReuseStrategy(new NoConnectionReuseStrategy());
        builder.setDefaultSocketConfig(socketConfig);

        // Build from the configured builder; a fresh HttpClientBuilder.create() would drop every setting above.
        return builder.build();
    

    My RequestConfig factory:

        HttpGet request = new HttpGet(url);

        RequestConfig config = RequestConfig.custom()
                .setCircularRedirectsAllowed(false)
                .setConnectionRequestTimeout(8000)
                .setConnectTimeout(4000)
                .setMaxRedirects(1)
                .setRedirectsEnabled(true)
                .setSocketTimeout(5000)
                .setStaleConnectionCheckEnabled(true).build();
        request.setConfig(config);

        // Return the configured request; a new HttpGet(url) would carry none of the timeouts.
        return request;
    

    OpenJDK socketRead0 source

    Note: I do have a "trick": I can schedule .getConnectionManager().shutdown() in another thread and cancel the Future if the request finishes properly, but that method is deprecated and it kills the whole HttpClient, not just that single request.

  • user207421
    user207421 about 9 years
    Socket read timeout defines the maximum interval between entering the recv() method and the arrival of data. It has nothing to do with the interval between read operations, or between packets.
  • user207421
    user207421 about 9 years
    Correct. Doesn't change the error in your answer. The timer starts when you enter recv(), or read(), and stops when it expires, or data arrives, or EOS or an error occurs. Nothing to do with the interval between two reads or two packets. What you've written above doesn't begin to make sense. It implies that you can't get a timeout on the first read, for example. And the time between two reads isn't the same thing as the time between two packets in the first place.
  • ok2c
    ok2c about 9 years
    Marvelous. The problem with your wonderful argument is that Java attempts to abstract away low level TCP/IP machinery and provides a different contract based on I/O stream APIs. Consumer of the API has no control over timers, buffers or #recv() method. The consumer can see how long the current execution thread stays blocked in a read operation. For long streams of data such as an HTTP content body what matters is how long it takes for an operation to unblock or in other words how long one read operation stays inactive before the next one can begin and reset the timer.
  • user207421
    user207421 about 9 years
    The problem with your 'marvelous' answer is that it is false, as you could easily have determined by experiment, rather than just arguing about it, and posting yet more unsubstantiated nonsense about the intervals between two reads, or two packets, or whatever else you're trying to contort this into. I suggest you try it before you debate this further.
  • ok2c
    ok2c about 9 years
    And that was it? Suggestion to try it out actually works both ways.
  • user207421
    user207421 about 9 years
    It's your answer: it's your assertion: it's been challenged. It's up to you to prove it. Or rather, them, as you've asserted two mutually inconsistent positions. Let us know when you have some evidence, or an acceptable source to cite, for whichever of them you decide to maintain. But they can't both be right. I will just hint that I'm not guessing about this.
  • user207421
    user207421 about 9 years
    Depressing it may be, but that doesn't absolve you of the responsibility of backing up your claims. In this case you can't, as you're embedded in self-contradiction. Suppose you explain this: if it's the interval between packets, or reads, whichever you like, how come you can get a timeout on the first read? Or the first packet? And cut out the personal remarks. All they accomplish is to underline your lack of proof and logical argument.
  • ok2c
    ok2c about 9 years
    Allow me to ignore most of your blathering. As far as the question goes: read operation starts, no packet arrives, read unblocks with exception, no more reads attempted, socket timeout equals the maximum period of inactivity between consecutive reads.
  • Piotr Müller
    Piotr Müller about 9 years
    Thanks for the detailed answer. The link about the eviction policy was what I was looking for. I have done a similar thing with the whole connection manager; now I know how to do it on individual connections. Thanks. But in the end I will probably switch to a non-blocking client.
  • Piotr Müller
    Piotr Müller about 9 years
    Good idea and probably I will finish with that, but I just wanted to clarify how to achieve that with blocking Http (to get socketRead0 called, but not hang). So other response accepted. Thanks. I would only add that Apache Http Client also has async non blocking version.
  • ok2c
    ok2c about 9 years
    Eviction policy is intended to remove stale idle connections. It will have no effect whatsoever on connections leased from the pool and being used to execute requests (and blocked in a read operation).
  • Piotr Müller
    Piotr Müller about 9 years
    @oleg If so, I've unaccepted the answer. Maybe something new will come up.
  • ok2c
    ok2c about 9 years
    If you want to figure out what is going on, please get me the wire log of the hanging sessions, as I requested in my answer.
  • Arya
    Arya over 7 years
    It looks like the bug has been fixed in September. Have you stopped experiencing the problem?
  • Piotr Müller
    Piotr Müller almost 7 years
    Thanks for the update, please notify about the results.
  • Stefan Matei
    Stefan Matei almost 7 years
    No luck. It got stuck overnight. I will try to contact Oracle about the bug, which was marked as resolved. I will also look for a workaround (aborting the connection from another thread), as I got tired of restarting the machines every day.
  • buzz3791
    buzz3791 over 6 years
    @Stefan Thanks for the info. If you get a bug filed against the Windows JDK please post the bug number on this stackoverflow question.
  • stolsvik
    stolsvik over 6 years
    Of course @oleg is right: If the server you're connected to is extremely slow, sending a 1TB file via one byte per 4.9 seconds, you will spend very much time blocked on that socketRead0(), without being kicked out by the timeout. Once you have lots of threads in this situation, you've depleted your thread pool, and the system is "down". This is one of the reasons why HTTP/REST is a shitty solution for comms between "Micro Services".
  • chiperortiz
    chiperortiz almost 2 years
    Still happening in Java 8 U181 on Windows.
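
The slow-server scenario raised in the comments above can be reproduced locally. The sketch below is only an illustration with hypothetical names (SlowDripDemo, run): a local server sends one byte at a fixed interval, and the client reads with SO_TIMEOUT set. As long as each individual read() receives data before the timeout, no SocketTimeoutException is thrown, even when the whole transfer takes longer than the timeout value.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SlowDripDemo {
    /**
     * Starts a local server that sends `count` bytes, one every intervalMillis,
     * and reads them with SO_TIMEOUT = soTimeoutMillis.
     * Returns {bytesRead, elapsedMillis}; a timed-out read surfaces as an IOException.
     */
    static long[] run(int count, long intervalMillis, int soTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    for (int i = 0; i < count; i++) {
                        Thread.sleep(intervalMillis); // drip: one byte per interval
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException e) {
                    // demo only: the client will see end of stream or a timeout
                }
            });
            sender.setDaemon(true);
            sender.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);
                InputStream in = client.getInputStream();
                long start = System.nanoTime();
                long bytesRead = 0;
                while (in.read() != -1) { // the timeout applies to each read() call separately
                    bytesRead++;
                }
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
                return new long[] {bytesRead, elapsedMillis};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] result = run(12, 50, 300);
        System.out.println(result[0] + " bytes in " + result[1] + " ms");
    }
}
```

A total deadline therefore has to be enforced separately, for example by tracking elapsed time across reads or by aborting the connection from another thread.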