Debugging a DNS Resolution Issue in Kubernetes

I suggested you follow these steps so we could isolate possible problems in your CoreDNS, and as you can see it's working fine.

You wrote: "Reaching an individual pod if it's not a StatefulSet seems not very important, at least in my k8s usage (could be for everyone)."

It's possible to reach a pod using a DNS record, but as you stated, it's not very important in regular Kubernetes implementations.

When enabled, pods are assigned a DNS A record in the form of pod-ip-address.my-namespace.pod.cluster.local.

For example, a pod with IP 1.2.3.4 in the namespace default with a DNS name of cluster.local would have an entry: 1-2-3-4.default.pod.cluster.local (source: the Kubernetes DNS documentation).

EXAMPLE

$ kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP          NODE                                 NOMINATED NODE   READINESS GATES
dnsutils     1/1     Running   20         20h     10.28.2.3   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>
sample-pod   1/1     Running   0          2m11s   10.28.2.4   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>

$ kubectl exec -ti dnsutils -- nslookup 10-28-2-4.default.pod.cluster.local
Server:     10.31.240.10
Address:    10.31.240.10#53

Name:   10-28-2-4.default.pod.cluster.local
Address: 10.28.2.4
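
For completeness: these ip-dashed pod records only resolve when the cluster DNS is configured to serve them; with CoreDNS that is the pods option of the kubernetes plugin, which kubeadm- and Kubespray-style Corefiles typically set to pods insecure. A rough sketch of the relevant stanza (the configmap name and exact contents may differ on your cluster):

$ kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # ... other plugins ...
}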

You asked: "Funny fact, but maybe this is how Kubernetes should work?"

Yes, your CoreDNS is working as intended and everything you described is expected.
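
One side note: if you ever do need a stable, name-based DNS record for an individual pod that is not part of a StatefulSet, the documented mechanism is the pod's hostname and subdomain fields combined with a headless Service whose name matches the subdomain. A minimal sketch, with illustrative names (busybox-1, sub-demo):

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: sub-demo
spec:
  clusterIP: None        # headless Service
  selector:
    app: sub-demo
  ports:
  - name: http
    port: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox-1
  labels:
    app: sub-demo
spec:
  hostname: busybox-1
  subdomain: sub-demo    # resolvable as busybox-1.sub-demo.default.svc.cluster.local
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sleep", "3600"]
EOF

After that, an nslookup of busybox-1.sub-demo.default.svc.cluster.local from dnsutils should return the pod IP.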

Comments

  • laimison (over 1 year ago)

    I have built a Kubernetes cluster using Kubespray on Ubuntu 18.04 and I'm facing a DNS issue: containers cannot communicate with each other through their hostnames.

    Things that are working:

    • containers can communicate through IP addresses
    • internet access works from inside the containers
    • kubernetes.default resolves correctly

    Kubernetes master:

    root@k8s-1:~# cat /etc/resolv.conf | grep -v '^#'
    nameserver 127.0.0.53
    search home
    root@k8s-1:~# 
    

    Pod:

    root@k8s-1:~# kubectl exec dnsutils cat /etc/resolv.conf
    nameserver 169.254.25.10
    search default.svc.cluster.local svc.cluster.local cluster.local home
    options ndots:5
    root@k8s-1:~# 
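
    (For reference, ndots:5 here means that any name with fewer than five dots is first tried with each search suffix appended, which is why a short name like kubernetes.default resolves at all. A quick illustration, assuming the dnsutils image ships dig:)

    root@k8s-1:~# kubectl exec -ti dnsutils -- dig +search +short kubernetes.default   # expect 10.233.0.1, via kubernetes.default.svc.cluster.local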
    

    CoreDNS pods are healthy:

    root@k8s-1:~# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns        
    NAME                       READY   STATUS    RESTARTS   AGE
    coredns-58687784f9-8rmlw   1/1     Running   0          35m
    coredns-58687784f9-hp8hp   1/1     Running   0          35m
    root@k8s-1:~#
    

    Logs for CoreDNS pods:

    root@k8s-1:~# kubectl describe pods --namespace=kube-system -l k8s-app=kube-dns | tail -n 2
      Normal   Started           35m                 kubelet, k8s-2     Started container coredns
      Warning  DNSConfigForming  12s (x33 over 35m)  kubelet, k8s-2     Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 4.2.2.1 4.2.2.2 208.67.220.220
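
    (That DNSConfigForming warning is worth a closer look: glibc only supports three nameservers per resolv.conf, so kubelet trims the list, and pods with dnsPolicy Default inherit the trimmed result. A way to check which file kubelet reads, assuming it either uses /etc/resolv.conf or was started with an explicit --resolv-conf flag:)

    root@k8s-1:~# ps -ef | grep -o 'resolv-conf=[^ ]*'
    root@k8s-1:~# grep -c nameserver /etc/resolv.conf   # more than 3 triggers the warning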
    
    root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-8rmlw
    .:53
    2020-02-09T22:56:14.390Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
    2020-02-09T22:56:14.391Z [INFO] CoreDNS-1.6.0
    2020-02-09T22:56:14.391Z [INFO] linux/amd64, go1.12.7, 0a218d3
    CoreDNS-1.6.0
    linux/amd64, go1.12.7, 0a218d3
    root@k8s-1:~#
    
    root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-hp8hp
    .:53
    2020-02-09T22:56:20.388Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
    2020-02-09T22:56:20.388Z [INFO] CoreDNS-1.6.0
    2020-02-09T22:56:20.388Z [INFO] linux/amd64, go1.12.7, 0a218d3
    CoreDNS-1.6.0
    linux/amd64, go1.12.7, 0a218d3
    root@k8s-1:~#
    

    The CoreDNS Service seems to be exposed:

    root@k8s-1:~# kubectl get svc --namespace=kube-system | grep coredns
    coredns                ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   37m
    root@k8s-1:~#
    
    root@k8s-1:~# kubectl get ep coredns --namespace=kube-system
    NAME      ENDPOINTS                                                  AGE
    coredns   10.233.64.2:53,10.233.65.3:53,10.233.64.2:53 + 3 more...   37m
    root@k8s-1:~#
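
    (Note that the nameserver in the pod's resolv.conf is 169.254.25.10 rather than the coredns ClusterIP 10.233.0.3; that link-local address is presumably Kubespray's nodelocaldns cache, which forwards to CoreDNS. Something like this should confirm it:)

    root@k8s-1:~# kubectl get ds --namespace=kube-system | grep -i nodelocal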
    

    These are my problematic pods; the whole cluster is affected by this issue:

    root@k8s-1:~# kubectl get pods -o wide -n default
    NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
    busybox                  1/1     Running   0          17m   10.233.66.7   k8s-3   <none>           <none>
    dnsutils                 1/1     Running   0          50m   10.233.66.5   k8s-3   <none>           <none>
    nginx-86c57db685-p8zhc   1/1     Running   0          43m   10.233.64.3   k8s-1   <none>           <none>
    nginx-86c57db685-st7rw   1/1     Running   0          47m   10.233.66.6   k8s-3   <none>           <none>
    root@k8s-1:~# 
    

    I'm able to reach the internet using DNS and to reach another container through its IP address:

    root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping 10.233.64.3"
    PING 10.233.64.3 (10.233.64.3) 56(84) bytes of data.
    64 bytes from 10.233.64.3: icmp_seq=1 ttl=62 time=0.481 ms
    64 bytes from 10.233.64.3: icmp_seq=2 ttl=62 time=0.551 ms
    ...
    
    root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping google.com"
    PING google.com (172.217.21.174) 56(84) bytes of data.
    64 bytes from fra07s64-in-f174.1e100.net (172.217.21.174): icmp_seq=1 ttl=61 time=77.9 ms
    ...
    
    root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping kubernetes.default"
    PING kubernetes.default.svc.cluster.local (10.233.0.1) 56(84) bytes of data.
    64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=1 ttl=64 time=0.030 ms
    64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=2 ttl=64 time=0.069 ms
    ...
    

    Actual issue:

    root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping nginx-86c57db685-p8zhc"
    ping: nginx-86c57db685-p8zhc: Name or service not known
    command terminated with exit code 2
    root@k8s-1:~#
    
    root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping dnsutils"
    ping: dnsutils: Name or service not known
    command terminated with exit code 2
    root@k8s-1:~#
    
    root@k8s-1:~# kubectl exec -ti busybox -- nslookup nginx-86c57db685-p8zhc
    Server:     169.254.25.10
    Address:    169.254.25.10:53
    
    ** server can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: NXDOMAIN
    
    *** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
    *** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
    *** Can't find nginx-86c57db685-p8zhc.home: No answer
    *** Can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: No answer
    *** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
    *** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
    *** Can't find nginx-86c57db685-p8zhc.home: No answer
    
    command terminated with exit code 1
    root@k8s-1:~#
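
    (As it turns out, this NXDOMAIN is expected: bare pod names from a Deployment never get DNS records; only Services do, plus per-pod records in ip-dashed form. For example, using the nginx pod IP from above, and assuming pod records are enabled in the Corefile:)

    root@k8s-1:~# kubectl exec -ti dnsutils -- nslookup 10-233-64-3.default.pod.cluster.local   # should return 10.233.64.3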
    

    Am I missing something, or how can I fix communication between containers using their hostnames?

    Many thanks

    Update:

    More checks:

    root@k8s-1:~# kubectl exec -ti dnsutils -- nslookup kubernetes.default
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    Name:   kubernetes.default.svc.cluster.local
    Address: 10.233.0.1
    

    I have created StatefulSet:

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/web/web.yaml
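
    (For context, that manifest creates a headless Service named nginx (clusterIP: None) plus a StatefulSet named web with two replicas; the headless Service is what gives each pod its own DNS record below. A quick check, where CLUSTER-IP should show None:)

    root@k8s-1:~/kplay# kubectl get svc nginx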
    

    And I'm able to resolve the service "nginx":

    root@k8s-1:~/kplay# k exec dnsutils -it nslookup nginx
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    Name:   nginx.default.svc.cluster.local
    Address: 10.233.66.8
    Name:   nginx.default.svc.cluster.local
    Address: 10.233.64.3
    Name:   nginx.default.svc.cluster.local
    Address: 10.233.65.5
    Name:   nginx.default.svc.cluster.local
    Address: 10.233.66.6
    

    I'm also able to contact StatefulSet members when using their FQDNs:

    root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0.nginx.default.svc.cluster.local
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    Name:   web-0.nginx.default.svc.cluster.local
    Address: 10.233.65.5
    
    root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1.nginx.default.svc.cluster.local
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    Name:   web-1.nginx.default.svc.cluster.local
    Address: 10.233.66.8
    

    But not when using just their hostnames:

    root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    ** server can't find web-0: NXDOMAIN
    
    command terminated with exit code 1
    root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    ** server can't find web-1: NXDOMAIN
    
    command terminated with exit code 1
    root@k8s-1:~/kplay#
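
    (There is a middle ground that should also work: since the search list includes default.svc.cluster.local, the hostname.service form expands to the full FQDN:)

    root@k8s-1:~/kplay# kubectl exec -ti dnsutils -- nslookup web-0.nginx   # expands to web-0.nginx.default.svc.cluster.local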
    

    All of them live in the same namespace:

    root@k8s-1:~/kplay# k get pods -n default
    NAME                     READY   STATUS    RESTARTS   AGE
    busybox                  1/1     Running   22         22h
    dnsutils                 1/1     Running   22         22h
    nginx-86c57db685-p8zhc   1/1     Running   0          22h
    nginx-86c57db685-st7rw   1/1     Running   0          22h
    web-0                    1/1     Running   0          11m
    web-1                    1/1     Running   0          10m
    

    Another test confirming that I'm able to resolve services:

    kubectl create deployment --image nginx some-nginx
    kubectl scale deployment --replicas 2 some-nginx
    kubectl expose deployment some-nginx --port=12345 --type=NodePort
    
    root@k8s-1:~/kplay# k exec dnsutils -it nslookup some-nginx
    Server:     169.254.25.10
    Address:    169.254.25.10#53
    
    Name:   some-nginx.default.svc.cluster.local
    Address: 10.233.63.137
    

    Final thoughts

    Funny fact, but maybe this is how Kubernetes should work? I'm able to reach the service hostname, and StatefulSet members if I want to reach a specific pod individually. Reaching an individual pod that isn't part of a StatefulSet seems not very important, at least in my k8s usage (though that may not hold for everyone).

  • laimison (about 4 years ago)
    Thanks @mWatney for your answer. That is an interesting story of how I spent two days on an issue that doesn't exist. Welcome to the Kubernetes world :)
  • Mark Watney (about 4 years ago)
    Glad to be able to confirm your assumptions. Welcome to the Kubernetes world!