kube-dns cannot resolve 'kubernetes.default.svc.cluster.local'


Solution 1

Can you take a look at the output of ps auxf | grep dockerd?

Kargo adds the setting iptables=false to the Docker daemon. As far as I can see, this causes issues with container-to-host networking, because a connection to 10.233.0.1:443 has to follow the iptables rules that forward the request to one of the master nodes' API servers.
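For reference, on an affected node the dockerd line in that output would contain the flag; a hypothetical example (binary path and other flags will vary):

    $ ps auxf | grep dockerd
    root      1234  ...  /usr/bin/dockerd --insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false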

The other Kubernetes services have their networking bound to the host, so you will not see the issue with them.

I'm not sure whether this is the root cause, but removing iptables=false from the Docker daemon settings has fixed every issue we were experiencing. Docker's iptables integration is not disabled by default, and it does not need to be disabled to use network overlays like flannel.

Removing the iptables option from the Docker daemon can be done in /etc/systemd/system/docker.service.d/docker-options.conf, which should look something like this:

    [root@k8s-joy-g2eqd2 ~]# cat /etc/systemd/system/docker.service.d/docker-options.conf
    [Service]
    Environment="DOCKER_OPTS=--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false"

Once this is updated, run systemctl daemon-reload to register the change and then systemctl restart docker to apply it.
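For example, and to verify that the flag is really gone afterwards:

    # register the changed unit file and restart Docker
    systemctl daemon-reload
    systemctl restart docker

    # the daemon command line should no longer contain --iptables=false
    ps auxf | grep dockerd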

This will let you test whether it fixes your issue. Once you have confirmed the fix, you can override the docker_options variable in the Kargo deployment to exclude that flag:

docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"

Solution 2

According to the error you posted, kubedns cannot communicate with the API server:

dial tcp 10.233.0.1:443: i/o timeout

This can mean three things:


Your network fabric for containers is not configured properly

  • Look for errors in the logs of the network solution you're using
  • Make sure every Docker daemon is using its own IP range
  • Verify that the container network does not overlap with the host network (a quick sketch of these checks follows)
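A sketch of those checks, assuming flannel as the overlay (the flannel file path is typical but may differ in your setup, and flannel may also run as a pod rather than a systemd unit):

    # logs of the network solution (flannel shown as an example)
    journalctl -u flanneld

    # the per-node subnet handed out by the overlay; it must differ on every node
    cat /run/flannel/subnet.env

    # the Docker bridge should sit inside that per-node subnet
    ip addr show docker0

    # container routes must not overlap with the host/node network
    ip route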

You have a problem with kube-proxy, and traffic to the kubernetes internal Service (10.233.0.1) is not being forwarded to the API server

  • Check the kube-proxy logs on your nodes (kubeminion{1,2}) and update your question with any error you may find (see the sketch below)
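Depending on how kube-proxy runs in your cluster (systemd unit or static pod is an assumption about your setup), something like:

    # kube-proxy logs, pick whichever matches your deployment
    journalctl -u kube-proxy
    kubectl -n kube-system logs <kube-proxy-pod-name>

    # kube-proxy should have installed NAT rules for the kubernetes Service IP
    iptables-save -t nat | grep 10.233.0.1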

If you are also seeing authentication errors:

kube-controller-manager does not produce valid Service Account tokens

  • Check that the --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set to a valid key/cert and restart the service

  • Delete the default-token-xxxx secret in the kube-system namespace and recreate the kube-dns Deployment (see the sketch below)
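In this cluster kube-dns is actually deployed as a ReplicationController (see the manifest below), so the last step looks roughly like this; the secret suffix is a placeholder and the manifest path is taken from the comments:

    # the flags can be checked on the running process
    ps aux | grep kube-controller-manager

    # delete the possibly invalid token; it is recreated automatically
    kubectl -n kube-system delete secret default-token-xxxx

    # recreate kube-dns so its pods pick up the fresh token
    kubectl -n kube-system delete rc kubedns
    kubectl create -f /etc/kubernetes/kubedns-rc.yml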


Comments

  • mootez
    mootez over 1 year

    After deploying the Kubernetes cluster using Kargo, I found out that the kubedns pod is not working properly:

    $ kcsys get pods -o wide
    
    NAME          READY STATUS           RESTARTS AGE  IP           NODE
    dnsmasq-alv8k 1/1   Running          2        1d   10.233.86.2  kubemaster
    dnsmasq-c9y52 1/1   Running          2        1d   10.233.82.2  kubeminion1
    dnsmasq-sjouh 1/1   Running          2        1d   10.233.76.6  kubeminion2
    kubedns-hxaj7 2/3   CrashLoopBackOff 339      22h  10.233.76.3  kubeminion2
    

    PS: kcsys is an alias for kubectl --namespace=kube-system

    The logs of each container (kubedns, dnsmasq) seem OK, except for the healthz container, which shows the following:

    2017/03/01 07:24:32 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local' error exit status 1
    

    Update

    kubedns rc description

    apiVersion: v1
    kind: ReplicationController
    metadata:
      creationTimestamp: 2017-02-28T08:31:57Z
      generation: 1
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        version: v19
      name: kubedns
      namespace: kube-system
      resourceVersion: "130982"
      selfLink: /api/v1/namespaces/kube-system/replicationcontrollers/kubedns
      uid: 5dc9f9f2-fd90-11e6-850d-005056a020b4
    spec:
      replicas: 1
      selector:
        k8s-app: kubedns
        version: v19
      template:
        metadata:
          creationTimestamp: null
          labels:
            k8s-app: kubedns
            kubernetes.io/cluster-service: "true"
            version: v19
        spec:
          containers:
          - args:
            - --domain=cluster.local.
            - --dns-port=10053
            - --v=2
            image: gcr.io/google_containers/kubedns-amd64:1.9
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 5
              httpGet:
                path: /healthz
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 60
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            name: kubedns
            ports:
            - containerPort: 10053
              name: dns-local
              protocol: UDP
            - containerPort: 10053
              name: dns-tcp-local
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /readiness
                port: 8081
                scheme: HTTP          
              initialDelaySeconds: 30
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 100m
                memory: 170Mi
              requests:
                cpu: 70m
                memory: 70Mi
            terminationMessagePath: /dev/termination-log
          - args:
            - --log-facility=-
            - --cache-size=1000
            - --no-resolv
            - --server=127.0.0.1#10053
            image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
            imagePullPolicy: IfNotPresent
            name: dnsmasq
            ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            resources:
              limits:
                cpu: 100m
                memory: 170Mi
              requests:
                cpu: 70m
                memory: 70Mi
            terminationMessagePath: /dev/termination-log
          - args:
            - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
              && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
            - -port=8080
            - -quiet
            image: gcr.io/google_containers/exechealthz-amd64:1.1
            imagePullPolicy: IfNotPresent
            name: healthz
            ports:
            - containerPort: 8080
              protocol: TCP
            resources:
              limits:
                cpu: 10m
                memory: 50Mi
              requests:
                cpu: 10m
                memory: 50Mi
            terminationMessagePath: /dev/termination-log
          dnsPolicy: Default
          restartPolicy: Always
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      fullyLabeledReplicas: 1
      observedGeneration: 1
      replicas: 1
    

    kubedns svc description:

    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: 2017-02-28T08:31:58Z
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        kubernetes.io/name: kubedns
      name: kubedns
      namespace: kube-system
      resourceVersion: "10736"
      selfLink: /api/v1/namespaces/kube-system/services/kubedns
      uid: 5ed4dd78-fd90-11e6-850d-005056a020b4
    spec:
      clusterIP: 10.233.0.3
      ports:
      - name: dns
        port: 53
        protocol: UDP
        targetPort: 53
      - name: dns-tcp
        port: 53
        protocol: TCP
        targetPort: 53
      selector:
        k8s-app: kubedns
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {}
    

    I caught some errors in the kubedns container:

    1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
    1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.233.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
    

    UPDATE 2

    1. iptables rules created by kube-proxy when creating a hostnames service with 3 pods: (screenshot)

    2. flags of the controller-manager pod: (screenshot)

    3. pods status: (screenshot)

  • mootez
    mootez about 7 years
    I have updated my post according to your request. PS: during provisioning with Kargo, each time the playbook crashes, I resolve the corresponding problem and rerun the playbook from the beginning. 1. kube-proxy was working fine. I even tried to create a "hostnames" service with 3 pods (see the example in the "Debugging Services" guide on kubernetes.io), and when I requested the service with curl, it worked like a charm. 2. The --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set as shown in the screenshot above.
  • mootez
    mootez about 7 years
    3. I deleted both the default-token-xxxx secret with kcsys delete secret default-token-xxxx and the kubedns ReplicationController with kcsys delete rc kubedns. The default token was recreated automatically, and I recreated kubedns with kubectl create -f /etc/kubernetes/kubedns-rc.yml.
  • mootez
    mootez about 7 years
    Unfortunately the kubedns container is still crashing, as shown in the last screenshot.
  • mootez
    mootez about 7 years
    Hey @Antoine. I found out that I cannot ping the kubemaster from a pod running on kubeminion1/2, but I can ping the kubemaster IP from the kubeminion1/2 hosts themselves. What could be the problem?
  • Antoine Cotten
    Antoine Cotten about 7 years
    In that case your network fabric is not working properly. I added some hints for troubleshooting. What are you using: Flannel, Weave Net, ...? With what kind of backend: udp, VXLAN, ...?
  • mootez
    mootez about 7 years
    Thank you so much. This is what was really causing my issue.