Kubernetes connection refused during deployment


I ran into the same problem and tried to dig a bit deeper into the GKE network setup for this kind of load balancing.

My suspicion is that the iptables rules on the node that runs the container are updated too early. I increased the timeouts a bit in your example to make it easier to find the stage at which the requests time out.

My changes on your deployment:

  replicas: 1         # easier to track the state of the system
  minReadySeconds: 30 # give the load-balancer time to pick up the new node
        command: ["sh", "-c", "./hello-app"] # ignore SIGTERM and keep serving requests for 30s

Everything works well until the old pod switches from state Running to Terminating. I tested with a kubectl port-forward on the terminating pod and my requests were served without timeouts.

The following things happen during the change from Running to Terminating:

  • Pod-IP is removed from the service
  • Health check on the node returns 503 with "localEndpoints": 0
  • iptables rules are changed on that node and traffic for this service is dropped (--comment "default/myapp-lb: has no local endpoints" -j KUBE-MARK-DROP)
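
The second bullet can be checked by hand. A minimal sketch, assuming the node health check answers with a JSON body that contains a `localEndpoints` field as quoted above; the helper name `local_endpoints` and the sample payload are illustrative and not taken from the original setup:

```shell
# Hypothetical helper: read a health check response body on stdin and
# print the number that follows "localEndpoints" (prints nothing if absent).
local_endpoints() {
  sed -n 's/.*"localEndpoints"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Illustrative payload; the exact body may differ between versions.
sample='{"service":{"namespace":"default","name":"myapp-lb"},"localEndpoints":0}'
printf '%s\n' "$sample" | local_endpoints
```

On a node, the body could be fetched with curl against the service's health check port (readable with `kubectl get service myapp-lb -o jsonpath='{.spec.healthCheckNodePort}'`) and piped into the helper; the request path to use is an assumption to verify on your cluster.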

With its default settings, the load-balancer checks every 2 seconds and needs 5 failures to remove the node. This means packets are dropped for at least 10 seconds. After I changed the interval to 1 second and the threshold to 1 failure, the number of dropped packets decreased.
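
The arithmetic behind that window, as a small sketch. It assumes the load-balancer stops sending traffic to a node only after the given number of consecutive failed checks, spaced one interval apart, and it ignores the check timeout; the function name `drop_window` is illustrative:

```shell
# Approximate time in seconds a node keeps receiving traffic after its
# health check starts failing: check interval x failure threshold.
drop_window() {
  interval=$1
  failures=$2
  echo $(( interval * failures ))
}

drop_window 2 5   # default settings quoted above, prints 10
drop_window 1 1   # tightened settings, prints 1
```

Even with the tightened settings the window does not reach zero, which matches the observation that the number of dropped packets decreased rather than disappeared.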

If you are not interested in the source IP of the client, you could remove the line:

externalTrafficPolicy: Local

from your service definition; deployments then complete without connection timeouts.
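
One way to try this without editing the manifest is to patch the live service. A sketch, assuming the service is named `myapp-lb` as in the question; leaving `externalTrafficPolicy` out is equivalent to setting it to `Cluster`, which is the default. The helper name `traffic_policy_patch` is illustrative:

```shell
# Build the patch body that sets spec.externalTrafficPolicy.
traffic_policy_patch() {
  printf '{"spec":{"externalTrafficPolicy":"%s"}}\n' "$1"
}

# Print the patch; it could be applied with:
#   kubectl patch service myapp-lb -p "$(traffic_policy_patch Cluster)"
traffic_policy_patch Cluster
```

Keep in mind the trade-off stated above: with `Cluster` the pods no longer see the client's source IP.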

Tested on GKE Cluster with 4 nodes and version v1.9.7-gke.1.

Updated on September 18, 2022


  • thoas
    thoas over 1 year

    I'm trying to achieve a zero-downtime deployment using Kubernetes, and during my test the service doesn't load balance well.

    My kubernetes manifest is:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: myapp-deployment
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0
          maxSurge: 1
      template:
        metadata:
          labels:
            app: myapp
            version: "0.2"
        spec:
          containers:
          - name: myapp-container
            image: gcr.io/google-samples/hello-app:1.0
            imagePullPolicy: Always
            ports:
              - containerPort: 8080
                protocol: TCP
            readinessProbe:
              httpGet:
                path: /
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              successThreshold: 1
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp-lb
      labels:
        app: myapp
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local
      ports:
        - port: 80
          targetPort: 8080
      selector:
        app: myapp

    If I loop over the service with the external IP, let's say:

    $ kubectl get services
    NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
    kubernetes   ClusterIP    <none>           443/TCP        1h
    myapp-lb     LoadBalancer   80:30549/TCP   22m

    using the bash script:

    while true; do
            curl "http://$EXTERNAL_IP"  # EXTERNAL_IP: placeholder for the service's external IP
            sleep 0.2s
    done

    I get some connection refused errors during the deployment:

    curl: (7) Failed to connect to port 80: Connection refused

    The application is the default hello-app provided by Google Cloud Platform, listening on port 8080.

    Cluster information:

    • Kubernetes version: 1.8.8
    • Google Cloud Platform
    • Machine type: g1-small
    • DevopsTux
      DevopsTux about 6 years
      how frequently are you getting those connection refused errors? I'm running the same deployment as you right now, with the sleep removed to stress test the service, and I'm at around 2000 requests and 0 failures.
    • DevopsTux
      DevopsTux about 6 years
      0 errors on 20,000 requests now.
    • thoas
      thoas about 6 years
      it occurs only during a deployment; try changing the version and restarting the script. If I siege the service's internal IP or external IP, I get some connection refused errors
    • DevopsTux
      DevopsTux almost 6 years
      where are you launching the siege from exactly?
    • thoas
      thoas almost 6 years
      the siege is launched locally and was also tested in a busybox directly in the cluster using the Cluster IP
  • thoas
    thoas almost 6 years
    thank you for your answer, siege is not the issue here since we have tested on multiple servers and even with a dead simple curl loop.
  • thoas
    thoas almost 6 years
    same issue with the minReadySeconds, I get a curl: (56) Recv failure: Connection reset by peer during a deployment
  • DevopsTux
    DevopsTux almost 6 years
    What siege does is, in fact, a bit like a curl loop. You will end up running out of sockets with either; Kubernetes is not the problem here. Did you try my answer?
  • thoas
    thoas almost 6 years
    yes, I tried your answer. It's not related to the HTTP client (we are also testing it in pure Python with only one connection), and the ingress is returning some 502 status codes.
  • DevopsTux
    DevopsTux over 5 years
    Is it possible you are getting this error while the public IP for the load balancer is provisioned?