kube-dns cannot resolve 'kubernetes.default.svc.cluster.local'


Solution 1

Can you take a look at the output of ps auxf | grep dockerd?

Kargo adds the setting iptables=false to the Docker daemon. As far as I can see, this causes issues with container-to-host networking, because a connection to 10.233.0.1:443 has to follow the iptables rules that forward the request to one of the master nodes' API servers.
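For reference, on an affected node the dockerd line in that output would contain the flag; a hypothetical example (binary path and other flags will vary):

    $ ps auxf | grep dockerd
    root      1234  ...  /usr/bin/dockerd --insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false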

The other Kubernetes services have their networking bound to the host, so you will not see the issue with them.

I'm not sure whether this is the root cause, but removing iptables=false from the Docker daemon settings has fixed every issue we were experiencing. Docker's iptables integration is not disabled by default, and it does not need to be disabled to use network overlays like flannel.

Removing the iptables option from the Docker daemon can be done in /etc/systemd/system/docker.service.d/docker-options.conf, which should look something like this:

    [root@k8s-joy-g2eqd2 ~]# cat /etc/systemd/system/docker.service.d/docker-options.conf
    [Service]
    Environment="DOCKER_OPTS=--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false"

Once this is updated, run systemctl daemon-reload to register the change and then systemctl restart docker to apply it.
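For example, and to verify that the flag is really gone afterwards:

    # register the changed unit file and restart Docker
    systemctl daemon-reload
    systemctl restart docker

    # the daemon command line should no longer contain --iptables=false
    ps auxf | grep dockerd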

This will let you test whether it fixes your issue. Once you have confirmed the fix, you can override the docker_options variable in the Kargo deployment to exclude that flag:

docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"

Solution 2

According to the error you posted, kubedns cannot communicate with the API server:

dial tcp 10.233.0.1:443: i/o timeout

This can mean three things:


Your network fabric for containers is not configured properly

  • Look for errors in the logs of the network solution you're using
  • Make sure every Docker daemon is using its own IP range
  • Verify that the container network does not overlap with the host network (a quick sketch of these checks follows)
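A sketch of those checks, assuming flannel as the overlay (the flannel file path is typical but may differ in your setup, and flannel may also run as a pod rather than a systemd unit):

    # logs of the network solution (flannel shown as an example)
    journalctl -u flanneld

    # the per-node subnet handed out by the overlay; it must differ on every node
    cat /run/flannel/subnet.env

    # the Docker bridge should sit inside that per-node subnet
    ip addr show docker0

    # container routes must not overlap with the host/node network
    ip route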

You have a problem with kube-proxy, and traffic to the kubernetes internal Service (10.233.0.1) is not being forwarded to the API server

  • Check the kube-proxy logs on your nodes (kubeminion{1,2}) and update your question with any error you may find (see the sketch below)
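Depending on how kube-proxy runs in your cluster (systemd unit or static pod is an assumption about your setup), something like:

    # kube-proxy logs, pick whichever matches your deployment
    journalctl -u kube-proxy
    kubectl -n kube-system logs <kube-proxy-pod-name>

    # kube-proxy should have installed NAT rules for the kubernetes Service IP
    iptables-save -t nat | grep 10.233.0.1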

If you are also seeing authentication errors:

kube-controller-manager does not produce valid Service Account tokens

  • Check that the --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set to a valid key/cert and restart the service

  • Delete the default-token-xxxx secret in the kube-system namespace and recreate the kube-dns Deployment (see the sketch below)
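In this cluster kube-dns is actually deployed as a ReplicationController (see the manifest below), so the last step looks roughly like this; the secret suffix is a placeholder and the manifest path is taken from the comments:

    # the flags can be checked on the running process
    ps aux | grep kube-controller-manager

    # delete the possibly invalid token; it is recreated automatically
    kubectl -n kube-system delete secret default-token-xxxx

    # recreate kube-dns so its pods pick up the fresh token
    kubectl -n kube-system delete rc kubedns
    kubectl create -f /etc/kubernetes/kubedns-rc.yml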


Comments

  • mootez
    mootez over 1 year

    After deploying the Kubernetes cluster using Kargo, I found out that the kubedns pod is not working properly:

    $ kcsys get pods -o wide
    
    NAME          READY STATUS           RESTARTS AGE  IP           NODE
    dnsmasq-alv8k 1/1   Running          2        1d   10.233.86.2  kubemaster
    dnsmasq-c9y52 1/1   Running          2        1d   10.233.82.2  kubeminion1
    dnsmasq-sjouh 1/1   Running          2        1d   10.233.76.6  kubeminion2
    kubedns-hxaj7 2/3   CrashLoopBackOff 339      22h  10.233.76.3  kubeminion2
    

    PS: kcsys is an alias for kubectl --namespace=kube-system

    The logs of each container (kubedns, dnsmasq) seem OK, except for the healthz container, which shows the following:

    2017/03/01 07:24:32 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local' error exit status 1
    

    Update

    kubedns rc description

    apiVersion: v1
    kind: ReplicationController
    metadata:
      creationTimestamp: 2017-02-28T08:31:57Z
      generation: 1
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        version: v19
      name: kubedns
      namespace: kube-system
      resourceVersion: "130982"
      selfLink: /api/v1/namespaces/kube-system/replicationcontrollers/kubedns
      uid: 5dc9f9f2-fd90-11e6-850d-005056a020b4
    spec:
      replicas: 1
      selector:
        k8s-app: kubedns
        version: v19
      template:
        metadata:
          creationTimestamp: null
          labels:
            k8s-app: kubedns
            kubernetes.io/cluster-service: "true"
            version: v19
        spec:
          containers:
          - args:
            - --domain=cluster.local.
            - --dns-port=10053
            - --v=2
            image: gcr.io/google_containers/kubedns-amd64:1.9
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 5
              httpGet:
                path: /healthz
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 60
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            name: kubedns
            ports:
            - containerPort: 10053
              name: dns-local
              protocol: UDP
            - containerPort: 10053
              name: dns-tcp-local
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /readiness
                port: 8081
                scheme: HTTP          
              initialDelaySeconds: 30
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 100m
                memory: 170Mi
              requests:
                cpu: 70m
                memory: 70Mi
            terminationMessagePath: /dev/termination-log
          - args:
            - --log-facility=-
            - --cache-size=1000
            - --no-resolv
            - --server=127.0.0.1#10053
            image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
            imagePullPolicy: IfNotPresent
            name: dnsmasq
            ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            resources:
              limits:
                cpu: 100m
                memory: 170Mi
              requests:
                cpu: 70m
                memory: 70Mi
            terminationMessagePath: /dev/termination-log
          - args:
            - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
              && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
            - -port=8080
            - -quiet
            image: gcr.io/google_containers/exechealthz-amd64:1.1
            imagePullPolicy: IfNotPresent
            name: healthz
            ports:
            - containerPort: 8080
              protocol: TCP
            resources:
              limits:
                cpu: 10m
                memory: 50Mi
              requests:
                cpu: 10m
                memory: 50Mi
            terminationMessagePath: /dev/termination-log
          dnsPolicy: Default
          restartPolicy: Always
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      fullyLabeledReplicas: 1
      observedGeneration: 1
      replicas: 1
    

    kubedns svc description:

    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: 2017-02-28T08:31:58Z
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        kubernetes.io/name: kubedns
      name: kubedns
      namespace: kube-system
      resourceVersion: "10736"
      selfLink: /api/v1/namespaces/kube-system/services/kubedns
      uid: 5ed4dd78-fd90-11e6-850d-005056a020b4
    spec:
      clusterIP: 10.233.0.3
      ports:
      - name: dns
        port: 53
        protocol: UDP
        targetPort: 53
      - name: dns-tcp
        port: 53
        protocol: TCP
        targetPort: 53
      selector:
        k8s-app: kubedns
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {}
    

    I caught some errors in the kubedns container:

    1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
    1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.233.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
    

    UPDATE 2

    1. iptables rules created by kube-proxy when creating a hostnames service with 3 pods: (screenshot)

    2. flags of the controller-manager pod: (screenshot)

    3. pods status: (screenshot)

  • mootez
    mootez about 7 years
    I have updated my post according to your request. PS: during provisioning with Kargo, each time the playbook crashes, I resolve the corresponding problem and rerun the playbook from the beginning. 1. kube-proxy was working fine. I even tried to create a "hostnames" service with 3 pods (see the example in the "Debugging Services" guide on kubernetes.io), and when I requested the service with curl, it worked like a charm. 2. The --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set as shown in the screenshot above.
  • mootez
    mootez about 7 years
    3. I deleted both the default-token-xxxx secret with kcsys delete secret default-token-xxxx and the kubedns ReplicationController with kcsys delete rc kubedns. The default token was recreated automatically, and I recreated kubedns with kubectl create -f /etc/kubernetes/kubedns-rc.yml.
  • mootez
    mootez about 7 years
    Unfortunately the kubedns container is still crashing, as shown in the last screenshot.
  • mootez
    mootez about 7 years
    Hey @Antoine. I found out that I cannot ping the kubemaster from a pod running on kubeminion1/2, but I can ping the kubemaster IP from the kubeminion1/2 hosts themselves. What could be the problem?
  • Antoine Cotten
    Antoine Cotten about 7 years
    In that case your network fabric is not working properly. I added some hints for troubleshooting. What are you using: Flannel, Weave Net, ...? With what kind of backend: udp, VXLAN, ...?
  • mootez
    mootez about 7 years
    Thank you so much. This is what was really causing my issue.