Problem with gRPC setup. Getting an intermittent RPC unavailable error
Solution 1
"One other thing I'm doing is checking the connection to see if it's either open, idle or connecting, and reusing the connection if so. Otherwise, redialing."
gRPC will manage your connections for you, reconnecting when needed, so you should never need to monitor a connection after creating it unless you have very specific needs.
"transport is closing" has many different reasons for happening; please see the relevant question in our FAQ and let us know if you still have questions: https://github.com/grpc/grpc-go#the-rpc-failed-with-error-code--unavailable-desc--transport-is-closing
Solution 2
After much searching, I have finally arrived at an acceptable and logical solution to this problem.
The root cause is this: the underlying TCP connection is closed abruptly, but neither the gRPC client nor the server is 'notified' of this event.
The challenge is at multiple levels:
- Kernel's management of TCP sockets
- Any intermediary load-balancers/reverse-proxies (by Cloud Providers or otherwise) and how they manage TCP sockets
- Your application layer itself and its networking requirements - whether it can reuse the same connection for future requests or not
My solution turned out to be fairly simple:
server = grpc.NewServer(
    grpc.KeepaliveParams(keepalive.ServerParameters{
        MaxConnectionIdle: 5 * time.Minute, // <--- This fixes it!
    }),
)
This ensures that the gRPC server itself gracefully closes the underlying TCP socket before the kernel or any intermediary server kills it abruptly (AWS and Google Cloud load balancers both have idle timeouts longer than 5 minutes).
An added bonus: anywhere you're using multiple connections, leaks introduced by clients that forget to Close the connection will no longer affect your server either.
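If you also control the client, a complementary option is client-side keepalive pings, which stop the connection from idling out in the first place. This is not part of the fix above, just a sketch of the related grpc-go knobs; note the server must explicitly permit pings this frequent via an EnforcementPolicy, or it will reply with GOAWAY ("too_many_pings") and close the connection.

import (
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

// Client side: ping an idle connection and fail fast if the peer is gone.
conn, err := grpc.Dial(address,
    grpc.WithInsecure(),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                2 * time.Minute,  // ping after this much idle time
        Timeout:             20 * time.Second, // wait this long for the ping ack
        PermitWithoutStream: true,             // ping even with no active RPCs
    }),
)

// Server side: allow those pings, alongside the MaxConnectionIdle fix above.
server := grpc.NewServer(
    grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
        MinTime:             1 * time.Minute, // reject pings more frequent than this
        PermitWithoutStream: true,
    }),
    grpc.KeepaliveParams(keepalive.ServerParameters{
        MaxConnectionIdle: 5 * time.Minute,
    }),
)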
My $0.02: Don't blindly trust any organisation's (even Google's) ability to design and maintain APIs. This is a classic case of defaults-gone-wrong.
Solution 3
I had about the same issue earlier this year. After about 15 minutes, the servers closed the connection.
My working solution was to create the connection with grpc.Dial once in my main function, then create pb.NewAppClient(connection) on each request. Since the connection was already established, latency wasn't an issue. After the request was done, I closed the client.
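A sketch of that pattern; the handler name, request type, and address are placeholders, not from the answer. Note that in grpc-go the generated stub has no Close method, only the ClientConn does, so "closing the client" here amounts to letting the per-request stub go out of scope while the shared connection stays open.

var connection *grpc.ClientConn

func main() {
    var err error
    // Dial once; every request shares this connection.
    connection, err = grpc.Dial(address, grpc.WithInsecure())
    if err != nil {
        log.Fatalf("dial failed: %v", err)
    }
    defer connection.Close()
    // ... start accepting requests ...
}

func handleRequest(ctx context.Context, params *pb.Request) {
    // Stubs are thin wrappers over the shared connection, so creating
    // one per request adds negligible latency.
    client := pb.NewAppClient(connection)
    ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
    defer cancel()
    if _, err := client.MyGRPCMethod(ctx, params); err != nil {
        log.Printf("call failed: %v", err)
    }
}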
harumphfrog
Updated on June 05, 2022

Comments
- harumphfrog almost 2 years: I have a gRPC server and client that work as expected most of the time, but I occasionally get a "transport is closing" error:
rpc error: code = Unavailable desc = transport is closing
I'm wondering if it's a problem with my setup. The client is pretty basic:
connection, err := grpc.Dial(address, grpc.WithInsecure(), grpc.WithBlock())
client := pb.NewAppClient(connection)
defer connection.Close()
and calls are made with a timeout like
ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
defer cancel()
client.MyGRPCMethod(ctx, params)
One other thing I'm doing is checking the connection to see if it's either open, idle or connecting, and reusing the connection if so; otherwise, redialing (see the sketch at the end of this question).
No special configuration is happening on the server:
grpc.NewServer()
Are there any common mistakes setting up a grpc client/server that I might be making?
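For reference, the state check mentioned above might look like the sketch below, assuming grpc-go's connectivity package (as Solution 1 notes, this monitoring shouldn't normally be necessary).

import "google.golang.org/grpc/connectivity"

var connection *grpc.ClientConn

func getConnection(address string) (*grpc.ClientConn, error) {
    if connection != nil {
        switch connection.GetState() {
        case connectivity.Ready, connectivity.Idle, connectivity.Connecting:
            // The connection looks usable; reuse it.
            return connection, nil
        }
    }
    // Shutdown or transient failure: redial.
    var err error
    connection, err = grpc.Dial(address, grpc.WithInsecure())
    return connection, err
}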
- Angad about 5 years: This is helpful, but isn't a great solution - the whole point should be to be able to share the ClientConn. Unable to find an acceptable reasoning as to why this is happening :(
- Trevor V about 5 years: I agree it's not the best solution, but it resolves the "transport is closing" error caused by a connection timeout. I did some real tests after I had this same issue, and it averaged out to about 1ms more per call. If your app is a web application, it's not noticeable, especially compared to other languages like PHP or Python that are 10x slower, etc.
- Angad about 5 years: Just posted an answer with what may be a better solution. Do let me know what you think.
- Trevor V about 5 years: I did a 20 min test on my localhost and it kept the connection. I will go ahead and try this afternoon in Kubernetes and Docker to see if the connections are kept. If that works, I agree Google would need to update the documentation.