kube-dns cannot resolve 'kubernetes.default.svc.cluster.local'
Solution 1
Can you take a look at the output of ps auxf | grep dockerd?
Kargo is adding the setting iptables=false to the docker daemon. As far as I can see, this causes issues with container-to-host networking, because connecting to 10.233.0.1:443 relies on iptables rules that forward the request to the API server on one of the master nodes.
The other kubernetes services have their networking bound to the host, so you will not experience the issue with them.
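A quick way to confirm this from a node (a sketch, not from the original answer; it assumes the busybox image is available and that its nc applet supports the -w timeout flag):

# Run a throwaway container and try a TCP connect to the kubernetes Service
# ClusterIP (10.233.0.1:443 in this cluster). With --iptables=false the NAT
# rules are never applied to container traffic, so the connect times out.
docker run --rm busybox sh -c 'echo | nc -w 5 10.233.0.1 443 && echo reachable || echo unreachable'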
I'm not sure whether this is the root issue, but removing iptables=false from the docker daemon settings fixed every problem we were experiencing. Docker does not disable iptables by default, and disabling it is not expected to be necessary for network overlays like flannel.
Removing the iptables option for the docker daemon can be done in /etc/systemd/system/docker.service.d/docker-options.conf, which should look something like this:
[root@k8s-joy-g2eqd2 ~]# cat /etc/systemd/system/docker.service.d/docker-options.conf
[Service]
Environment="DOCKER_OPTS=--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false"
Once this is updated, run systemctl daemon-reload to register the change and then systemctl restart docker.
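Put together:

systemctl daemon-reload
systemctl restart docker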
This will let you test whether it fixes your issue. Once you have confirmed the fix, you can override the docker_options variable in the kargo deployment to exclude that flag:
docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"
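In a kargo deployment this override typically goes in your inventory group variables; the exact file below is an assumption and varies between kargo versions:

# e.g. inventory/group_vars/k8s-cluster.yml (path is an assumption)
docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"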
Solution 2
According to the error you posted, kubedns cannot communicate with the API server:
dial tcp 10.233.0.1:443: i/o timeout
This can mean three things (a quick checklist for each is sketched after this list):

1. Your network fabric for containers is not configured properly
- Look for errors in the logs of the network solution you are using
- Make sure every Docker daemon is using its own IP range
- Verify that the container network does not overlap with the host network

2. You have a problem with kube-proxy, and network traffic is not forwarded to the API server when using the kubernetes internal Service (10.233.0.1)
- Check the kube-proxy logs on your nodes (kubeminion{1,2}) and update your question with any errors you find

3. If you are also seeing authentication errors: kube-controller-manager does not produce valid Service Account tokens
- Check that the --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set to a valid key/cert, then restart the service
- Delete the default-token-xxxx secret in the kube-system namespace and recreate the kube-dns Deployment
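A rough command-line checklist for the three causes above (a sketch, not from the original answer; the label, unit name, and Service IP follow the cluster described in the question and may differ on yours):

# 1. Network fabric: check the plugin logs and per-node address ranges
ip addr show docker0                             # container bridge subnet on this node
ip route                                         # look for overlaps with the host network

# 2. kube-proxy: check its logs and that it installed rules for 10.233.0.1
journalctl -u kube-proxy --no-pager | tail -50   # if kube-proxy runs as a systemd unit
iptables-save | grep 10.233.0.1                  # NAT rules for the kubernetes Service

# 3. Service Account tokens: verify the flags, then recreate the token and kube-dns
ps aux | grep kube-controller-manager            # look for the two flags mentioned above
kubectl --namespace=kube-system delete secret default-token-xxxx   # substitute the real suffix
kubectl --namespace=kube-system delete pod -l k8s-app=kubedns      # the RC recreates the pod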
mootez
Updated on September 02, 2022

Comments
-
mootez over 1 year
After deploying the kubernetes cluster using kargo, I found out that the kubedns pod is not working properly:
$ kcsys get pods -o wide
NAME             READY   STATUS             RESTARTS   AGE   IP            NODE
dnsmasq-alv8k    1/1     Running            2          1d    10.233.86.2   kubemaster
dnsmasq-c9y52    1/1     Running            2          1d    10.233.82.2   kubeminion1
dnsmasq-sjouh    1/1     Running            2          1d    10.233.76.6   kubeminion2
kubedns-hxaj7    2/3     CrashLoopBackOff   339        22h   10.233.76.3   kubeminion2
PS: kcsys is an alias of kubectl --namespace=kube-system
Logs for each container (kubedns, dnsmasq) seem OK, except for the healthz container, which shows the following:
2017/03/01 07:24:32 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local' error exit status 1
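You can reproduce what the probe does by hand (a sketch; it assumes nslookup is present in the healthz container image, which runs the same command for its probe):

kubectl --namespace=kube-system exec kubedns-hxaj7 -c healthz -- \
  nslookup kubernetes.default.svc.cluster.local 127.0.0.1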
Update
kubedns rc description:
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2017-02-28T08:31:57Z
  generation: 1
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    version: v19
  name: kubedns
  namespace: kube-system
  resourceVersion: "130982"
  selfLink: /api/v1/namespaces/kube-system/replicationcontrollers/kubedns
  uid: 5dc9f9f2-fd90-11e6-850d-005056a020b4
spec:
  replicas: 1
  selector:
    k8s-app: kubedns
    version: v19
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        version: v19
    spec:
      containers:
      - args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --v=2
        image: gcr.io/google_containers/kubedns-amd64:1.9
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kubedns
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - --log-facility=-
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
        imagePullPolicy: IfNotPresent
        name: dnsmasq
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - -port=8080
        - -quiet
        image: gcr.io/google_containers/exechealthz-amd64:1.1
        imagePullPolicy: IfNotPresent
        name: healthz
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
      dnsPolicy: Default
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1
kubedns svc description:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-02-28T08:31:58Z
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: kubedns
  name: kubedns
  namespace: kube-system
  resourceVersion: "10736"
  selfLink: /api/v1/namespaces/kube-system/services/kubedns
  uid: 5ed4dd78-fd90-11e6-850d-005056a020b4
spec:
  clusterIP: 10.233.0.3
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kubedns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
I caught some errors in the kubedns container:
1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.233.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
UPDATE 2
- iptables rules created by kube-proxy when creating the hostnames service with 3 pods (screenshot):
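Such rules can be listed on a node with, for example (hostnames being the example Service name):

iptables-save | grep hostnames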
-
mootez about 7 years
I have updated my post according to your request. PS: during provisioning with kargo, each time the playbook crashes, I resolve the corresponding problem and rerun the playbook from the beginning. 1. kube-proxy was working fine. I even tried to create a "hostnames" service with 3 pods (see the example in the "Debugging Services" page on kubernetes.io), and when I requested the service with curl it worked like a charm. 2. The --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set as shown in the above screenshot.
-
mootez about 7 years
3. I deleted both the default-token-xxxx secret with kcsys delete secret default-token-xxxx and the kubedns deployment with kcsys delete rc kubedns. The default token was recreated automatically, and I reran kubedns with kubectl create -f /etc/kubernetes/kubedns-rc.yml
-
mootez about 7 years
Unfortunately, the kubedns container is still crashing, as shown in the last screenshot.
-
mootez about 7 years
Hey @Antoine, I found out that I cannot reach the kubemaster from a pod on kubeminion(1/2) with ping, but I can ping the kubemaster IP from the kubeminion(1/2) hosts. What could the problem be?
-
Antoine Cotten about 7 years
In that case your network fabric is not working properly. I added some hints for troubleshooting. What are you using: Flannel, Weave Net, ...? With what kind of backend: UDP, VXLAN, ...?
-
mootez about 7 years
Thank you so much, this is what was really causing my issue.