kubeadm join fails with http://localhost:10248/healthz connection refused


It seems that your kubeadm token has expired, as the attached kubelet logs indicate.

Sep 02 21:19:56 k8s-worker1 kubelet[3082]: F0902 21:19:56.814469
3082 server.go:262] failed to run Kubelet: cannot create certificate signing request: Unauthorized

The TTL for this token is 24 hours, counted from when kubeadm init is run; check this link for more information.
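
If it is only the token that has expired, it can be refreshed without rebuilding; a minimal sketch, assuming a kubeadm v1.11-era master:

# list the existing bootstrap tokens and their expiry
kubeadm token list

# create a fresh token and print a complete join command for the workers
kubeadm token create --print-join-command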

However, the master node's system runtime components also look unhealthy, so I am not sure the cluster is running correctly. Since the CoreDNS pods are stuck in Pending state, take a look at the kubeadm troubleshooting document and check whether any Pod network provider has been installed on your cluster.
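
For example, the system pods and the pod network can be checked and deployed like this (a sketch; the flannel manifest is the one used later in the question and expects the cluster to be initialized with --pod-network-cidr=10.244.0.0/16):

# see whether CoreDNS is Pending because no pod network add-on is running
kubectl get pods --all-namespaces -o wide

# deploy flannel as the pod network provider
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml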

Given that, I recommend rebuilding the cluster in order to refresh the kubeadm token and bootstrap the cluster system components from scratch.


Comments

  • StefanSchubert
    StefanSchubert almost 2 years

    I'm trying to set up Kubernetes (following the tutorials for CentOS 7) on three VMs, but unfortunately joining the worker fails. I hope someone has already had this problem (I found it twice on the web with no answers) or has a guess at what's going wrong.

    Here is what I get from kubeadm join:

    [preflight] running pre-flight checks
            [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
    you can solve this problem with following methods:
     1. Run 'modprobe -- ' to load missing kernel modules;
    2. Provide the missing builtin kernel ipvs support
    
    I0902 20:31:15.401693    2032 kernel_validator.go:81] Validating kernel version
    I0902 20:31:15.401768    2032 kernel_validator.go:96] Validating kernel config
            [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.06.1-ce. Max validated version: 17.03
    [discovery] Trying to connect to API Server "192.168.1.30:6443"
    [discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.30:6443"
    [discovery] Requesting info from "https://192.168.1.30:6443" again to validate TLS against the pinned public key
    [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.1.30:6443"
    [discovery] Successfully established connection with API Server "192.168.1.30:6443"
    [kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
    [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [preflight] Activating the kubelet service
    [tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    

    Though kubelet is running:

    [root@k8s-worker1 nodesetup]# systemctl status kubelet -l
    ● kubelet.service - kubelet: The Kubernetes Node Agent
       Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/kubelet.service.d
               └─10-kubeadm.conf
       Active: active (running) since So 2018-09-02 20:31:15 CEST; 19min ago
         Docs: https://kubernetes.io/docs/
     Main PID: 2093 (kubelet)
        Tasks: 7
       Memory: 12.1M
       CGroup: /system.slice/kubelet.service
               └─2093 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
    
    Sep 02 20:31:15 k8s-worker1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
    Sep 02 20:31:15 k8s-worker1 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
    Sep 02 20:31:15 k8s-worker1 kubelet[2093]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
    Sep 02 20:31:15 k8s-worker1 kubelet[2093]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
    Sep 02 20:31:16 k8s-worker1 kubelet[2093]: I0902 20:31:16.440010    2093 server.go:408] Version: v1.11.2
    Sep 02 20:31:16 k8s-worker1 kubelet[2093]: I0902 20:31:16.440314    2093 plugins.go:97] No cloud provider specified.
    [root@k8s-worker1 nodesetup]# 
    

    As far as I can see, the worker can connect to the master, but it then tries to run a health check against a local endpoint that has not come up. Any ideas?
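
    For reference, the endpoint and the kubelet logs can be checked by hand (assuming the default kubelet health port 10248):

    # probe the health endpoint that kubeadm join is polling
    curl -sSL http://localhost:10248/healthz

    # see why the kubelet is not answering
    journalctl -xeu kubelet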

    Here is what I did to configure my worker:

    exec bash
    setenforce 0
    sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
    
    
    echo "Setting Firewallrules"
    firewall-cmd --permanent --add-port=10250/tcp
    firewall-cmd --permanent --add-port=10255/tcp
    firewall-cmd --permanent --add-port=30000-32767/tcp
    firewall-cmd --permanent --add-port=6783/tcp
    firewall-cmd --reload
    
    
    echo "And enable br filtering"
    modprobe br_netfilter
    echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
    
    
    echo "disable swap"
    swapoff -a
    echo "### You need to edit /etc/fstab and comment the swapline!! ###"
    
    
    echo "Adding kubernetes repo for download"
    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
            https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF
    
    
    echo "install the Docker-ce dependencies"
    yum install -y yum-utils device-mapper-persistent-data lvm2
    
    echo "add docker-ce repository"
    yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    
    echo "install docker ce"
    yum install -y docker-ce
    
    echo "Install kubeadm kubelet kubectl"
    yum install kubelet kubeadm kubectl -y
    
    echo "start and enable kubectl"
    systemctl restart docker && systemctl enable docker
    systemctl restart kubelet && systemctl enable kubelet
    
    echo "Now we need to ensure that both Docker-ce and Kubernetes belong to the same control group (cgroup)"
    
    echo "We assume that docker is using cgroupfs ... assuming kubelet does so too"
    docker info | grep -i cgroup
    grep -i cgroup /var/lib/kubelet/kubeadm-flags.env
    #  old style
    # sed -i 's/cgroup-driver=systemd/cgroup-driver=cgroupfs/g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    
    systemctl daemon-reload
    systemctl restart kubelet
    
    # There has been an issue reported that traffic in iptables is being routed incorrectly.
    # The settings below make sure iptables is configured correctly.
    #
    sudo bash -c 'cat <<EOF >  /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    EOF'
    
    # Make changes effective
    sudo sysctl --system
    

    Thanks for any help in advance.

    Update I

    Journalctl Output from the worker:

    [root@k8s-worker1 ~]# journalctl -xeu kubelet
    Sep 02 21:19:56 k8s-worker1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
    -- Subject: Unit kubelet.service has finished start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    -- 
    -- Unit kubelet.service has finished starting up.
    -- 
    -- The start-up result is done.
    Sep 02 21:19:56 k8s-worker1 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
    -- Subject: Unit kubelet.service has begun start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    -- 
    -- Unit kubelet.service has begun starting up.
    Sep 02 21:19:56 k8s-worker1 kubelet[3082]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
    Sep 02 21:19:56 k8s-worker1 kubelet[3082]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
    Sep 02 21:19:56 k8s-worker1 kubelet[3082]: I0902 21:19:56.788059    3082 server.go:408] Version: v1.11.2
    Sep 02 21:19:56 k8s-worker1 kubelet[3082]: I0902 21:19:56.788214    3082 plugins.go:97] No cloud provider specified.
    Sep 02 21:19:56 k8s-worker1 kubelet[3082]: F0902 21:19:56.814469    3082 server.go:262] failed to run Kubelet: cannot create certificate signing request: Unauthorized
    Sep 02 21:19:56 k8s-worker1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
    Sep 02 21:19:56 k8s-worker1 systemd[1]: Unit kubelet.service entered failed state.
    Sep 02 21:19:56 k8s-worker1 systemd[1]: kubelet.service failed.
    

    And the get pods on the master side results in:

    [root@k8s-master ~]# kubectl get pods --all-namespaces=true
    NAMESPACE     NAME                                 READY     STATUS    RESTARTS   AGE
    kube-system   coredns-78fcdf6894-79n2m             0/1       Pending   0          1d
    kube-system   coredns-78fcdf6894-tlngr             0/1       Pending   0          1d
    kube-system   etcd-k8s-master                      1/1       Running   3          1d
    kube-system   kube-apiserver-k8s-master            1/1       Running   0          1d
    kube-system   kube-controller-manager-k8s-master   0/1       Evicted   0          1d
    kube-system   kube-proxy-2x8cx                     1/1       Running   3          1d
    kube-system   kube-scheduler-k8s-master            1/1       Running   0          1d
    [root@k8s-master ~]# 
    

    Update II

    As a next step I generated a new token on the master side and used it in the join command. Although the master's token list showed the token as valid, the worker node insisted that the master does not know about this token or that it has expired... stop! Time to start all over, beginning with the master setup.
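
    (For anyone following along, a sketch of the relevant commands per the kubeadm docs: a new token plus its join command can be printed directly, and the discovery hash can be recomputed from the master's CA certificate.)

    # on the master: create a fresh bootstrap token and print the full join command
    kubeadm token create --print-join-command

    # recompute the discovery-token-ca-cert-hash by hand
    openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
        openssl rsa -pubin -outform der 2>/dev/null | \
        openssl dgst -sha256 -hex | sed 's/^.* //'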

    So here's what I did:

    1) Re-set up the master VM, meaning a fresh CentOS 7 (CentOS-7-x86_64-Minimal-1804.iso) installation on VirtualBox. Configured networking in VirtualBox: adapter1 as NAT to the host system (for being able to install the packages) and adapter2 as an internal network (same name on master and worker nodes, for the Kubernetes network).

    2) With the fresh image installed, the base interface enp0s3 was not configured to come up at boot time (so I ran ifup enp0s3 and reconfigured it in /etc/sysconfig/network-scripts to start at boot).
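
    Roughly like this (assuming the stock ifcfg file name for enp0s3):

    # bring the interface up now and have it start at boot
    ifup enp0s3
    sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-enp0s3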

    3) Configuring the second interface for the internal kubernetes network:

    /etc/hosts:

    #!/bin/sh
    echo '192.168.1.30 k8s-master' >> /etc/hosts
    echo '192.168.1.40 k8s-worker1' >> /etc/hosts
    echo '192.168.1.50 k8s-worker2' >> /etc/hosts
    

    Identified my second interface via "ip -color -human addr", which showed me enp0s8 in my case, so:

    #!/bin/sh
    echo "Setting up internal Interface"
    cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-enp0s8
    DEVICE=enp0s8
    IPADDR=192.168.1.30
    NETMASK=255.255.255.0
    NETWORK=192.168.1.0
    BROADCAST=192.168.1.255
    ONBOOT=yes
    NAME=enp0s8
    EOF
    
    echo "Activate interface"
    ifup enp0s8
    

    4) Hostname, swap, disabling SELinux

    #!/bin/sh
    echo "Setting hostname und deactivate SELinux"
    hostnamectl set-hostname 'k8s-master'
    exec bash
    setenforce 0
    sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
    
    echo "disable swap"
    swapoff -a
    
    echo "### You need to edit /etc/fstab and comment the swapline!! ###"
    

    Some remarks here: I rebooted because the later preflight checks seem to parse /etc/fstab to verify that swap does not exist. It also seems that CentOS reactivates SELinux (I need to check this later); as a workaround I disabled it again after each reboot.
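
    For completeness, both can be scripted; a sketch:

    # comment out the swap entry so it stays off after reboots
    sed -i '/\sswap\s/ s/^/#/' /etc/fstab

    # verify the SELinux state after a reboot
    sestatus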

    5) Establish the required firewall settings

    #!/bin/sh
    echo "Setting Firewallrules"
    firewall-cmd --permanent --add-port=6443/tcp
    firewall-cmd --permanent --add-port=2379-2380/tcp
    firewall-cmd --permanent --add-port=10250/tcp
    firewall-cmd --permanent --add-port=10251/tcp
    firewall-cmd --permanent --add-port=10252/tcp
    firewall-cmd --permanent --add-port=10255/tcp
    firewall-cmd --reload
    
    echo "And enable br filtering"
    modprobe br_netfilter
    echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
    

    6) Adding the kubernetes repository

    #!/bin/sh
    echo "Adding kubernetes repo for download"
    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
            https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF
    

    7) Install the required packages and configure the services

    #!/bin/sh
    
    echo "install the Docker-ce dependencies"
    yum install -y yum-utils device-mapper-persistent-data lvm2
    
    echo "add docker-ce repository"
    yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    
    echo "install docker ce"
    yum install -y docker-ce
    
    echo "Install kubeadm kubelet kubectl"
    yum install kubelet kubeadm kubectl -y
    
    echo "start and enable kubectl"
    systemctl restart docker && systemctl enable docker
    systemctl restart kubelet && systemctl enable kubelet
    
    echo "Now we need to ensure that both Docker-ce and Kubernetes belong to the same control group (cgroup)"
    echo "We assume that docker is using cgroupfs ... assuming kubelet does so too"
    docker info | grep -i cgroup
    grep -i cgroup /var/lib/kubelet/kubeadm-flags.env
    #  old style
    # sed -i 's/cgroup-driver=systemd/cgroup-driver=cgroupfs/g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    
    systemctl daemon-reload
    systemctl restart kubelet
    
    # There has been an issue reported that traffic in iptables is being routed incorrectly.
    # The settings below make sure iptables is configured correctly.
    #
    sudo bash -c 'cat <<EOF >  /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    EOF'
    
    # Make changes effective
    sudo sysctl --system
    

    8) Init the cluster

    #!/bin/sh
    echo "Init kubernetes. Check join cmd in initProtocol.txt"
    kubeadm init --apiserver-advertise-address=192.168.1.30 --pod-network-cidr=192.168.1.0/16 | tee initProtocol.txt
    

    To verify here is the result of this command:

    Init kubernetes. Check join cmd in initProtocol.txt
    [init] using Kubernetes version: v1.11.2
    [preflight] running pre-flight checks
            [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
    I0904 21:53:15.271999    1526 kernel_validator.go:81] Validating kernel version
    I0904 21:53:15.272165    1526 kernel_validator.go:96] Validating kernel config
            [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.06.1-ce. Max validated version: 17.03
    [preflight/images] Pulling images required for setting up a Kubernetes cluster
    [preflight/images] This might take a minute or two, depending on the speed of your internet connection
    [preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
    [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [preflight] Activating the kubelet service
    [certificates] Generated ca certificate and key.
    [certificates] Generated apiserver certificate and key.
    [certificates] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.30]
    [certificates] Generated apiserver-kubelet-client certificate and key.
    [certificates] Generated sa key and public key.
    [certificates] Generated front-proxy-ca certificate and key.
    [certificates] Generated front-proxy-client certificate and key.
    [certificates] Generated etcd/ca certificate and key.
    [certificates] Generated etcd/server certificate and key.
    [certificates] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [127.0.0.1 ::1]
    [certificates] Generated etcd/peer certificate and key.
    [certificates] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.1.30 127.0.0.1 ::1]
    [certificates] Generated etcd/healthcheck-client certificate and key.
    [certificates] Generated apiserver-etcd-client certificate and key.
    [certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
    [controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
    [controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
    [controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
    [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
    [init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" 
    [init] this might take a minute or longer if the control plane images have to be pulled
    [apiclient] All control plane components are healthy after 43.504792 seconds
    [uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
    [markmaster] Marking the node k8s-master as master by adding the label "node-role.kubernetes.io/master=''"
    [markmaster] Marking the node k8s-master as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
    [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-master" as an annotation
    [bootstraptoken] using token: n4yt3r.3c8tuj11nwszts2d
    [bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
    [bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
    [bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
    [bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
    [addons] Applied essential addon: CoreDNS
    [addons] Applied essential addon: kube-proxy
    
    Your Kubernetes master has initialized successfully!
    
    To start using your cluster, you need to run the following as a regular user:
    
      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/
    
    You can now join any number of machines by running the following on each node
    as root:
    
      kubeadm join 192.168.1.30:6443 --token n4yt3r.3c8tuj11nwszts2d --discovery-token-ca-cert-hash sha256:466e7972a4b6997651ac1197fdde68d325a7bc41f2fccc2b1efc17515af61172
    

    Remark: looks fine to me so far, though I'm a bit worried that the latest docker-ce version might cause trouble here...

    9) Deploying the pod network

    #!/bin/bash
    
    echo "Configure demo cluster usage as root"
    mkdir -p $HOME/.kube
    cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    chown $(id -u):$(id -g) $HOME/.kube/config
    
    # Deploy pod network using flannel
    # Taken from the first two matching tutorials found on the web
    # kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    # kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
    
    # taken from https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network
    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/c5d10c8/Documentation/kube-flannel.yml
    
    echo "Try to run kubectl get pods --all-namespaces"
    echo "After joining nodes: try to run kubectl get nodes to verify the status"
    

    And here's the output of this command:

    Configure demo cluster usage as root
    clusterrole.rbac.authorization.k8s.io/flannel created
    clusterrolebinding.rbac.authorization.k8s.io/flannel created
    serviceaccount/flannel created
    configmap/kube-flannel-cfg created
    daemonset.extensions/kube-flannel-ds created
    clusterrole.rbac.authorization.k8s.io/flannel configured
    clusterrolebinding.rbac.authorization.k8s.io/flannel configured
    serviceaccount/flannel unchanged
    configmap/kube-flannel-cfg unchanged
    daemonset.extensions/kube-flannel-ds-amd64 created
    daemonset.extensions/kube-flannel-ds-arm64 created
    daemonset.extensions/kube-flannel-ds-arm created
    daemonset.extensions/kube-flannel-ds-ppc64le created
    daemonset.extensions/kube-flannel-ds-s390x created
    Try to run kubectl get pods --all-namespaces
    After joining nodes: try to run kubectl get nodes to verify the status
    

    So I tried kubectl get pods --all-namespaces and I get

    [root@k8s-master nodesetup]# kubectl get pods --all-namespaces
    NAMESPACE     NAME                                 READY     STATUS    RESTARTS   AGE
    kube-system   coredns-78fcdf6894-pflhc             0/1       Pending   0          33m
    kube-system   coredns-78fcdf6894-w7dxg             0/1       Pending   0          33m
    kube-system   etcd-k8s-master                      1/1       Running   0          27m
    kube-system   kube-apiserver-k8s-master            1/1       Running   0          27m
    kube-system   kube-controller-manager-k8s-master   0/1       Evicted   0          27m
    kube-system   kube-proxy-stfxm                     1/1       Running   0          28m
    kube-system   kube-scheduler-k8s-master            1/1       Running   0          27m
    

    and

    [root@k8s-master nodesetup]# kubectl get nodes
    NAME         STATUS     ROLES     AGE       VERSION
    k8s-master   NotReady   master    35m       v1.11.2
    

    Hm...what's wrong with my master?
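
    One way to dig into the NotReady state (a sketch):

    # the node conditions and events usually explain NotReady
    kubectl describe node k8s-master

    # check where the kube-system pods are (or are not) scheduled
    kubectl get pods -n kube-system -o wide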

    Some observations:

    Sometimes I got connection refused when running kubectl right after startup; I found out that it takes a few minutes before the service is established. But because of this I was looking in /var/log/firewalld and found a lot of these:

    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -D PREROUTING' failed: iptables: Bad rule (does a matching rule exist in that chain?).
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -D OUTPUT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -F DOCKER' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -X DOCKER' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -F DOCKER' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -X DOCKER' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -F DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -X DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -F DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -X DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -F DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -X DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -n -L DOCKER' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -n -L DOCKER' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -n -L DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -n -L DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/target/match by that name.
    
    2018-09-04 21:52:09 WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C DOCKER-ISOLATION-STAGE-1 -j RETURN' failed: iptables: Bad rule (does a matching rule exist in that chain?).
    

    Wrong docker version? The docker installation seems to be broken.
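
    One workaround that is often suggested for the missing DOCKER chains: restart docker after any firewall-cmd --reload so that it re-creates its iptables rules (a sketch, untested here):

    systemctl restart docker
    # the DOCKER chains should be listed again
    iptables -L -n | grep -i '^Chain DOCKER'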

    Anything else I can check on the master side? It's getting late - tomorrow I'll try to join my worker again (within the 24h window of the initial token).

    Update III (After solving the docker issue)

    [root@k8s-master ~]# kubectl get pods --all-namespaces=true
    NAMESPACE     NAME                                 READY     STATUS    RESTARTS   AGE
    kube-system   coredns-78fcdf6894-pflhc             0/1       Pending   0          10h
    kube-system   coredns-78fcdf6894-w7dxg             0/1       Pending   0          10h
    kube-system   etcd-k8s-master                      1/1       Running   0          10h
    kube-system   kube-apiserver-k8s-master            1/1       Running   0          10h
    kube-system   kube-controller-manager-k8s-master   1/1       Running   0          10h
    kube-system   kube-flannel-ds-amd64-crljm          0/1       Pending   0          1s
    kube-system   kube-flannel-ds-v6gcx                0/1       Pending   0          0s
    kube-system   kube-proxy-l2dck                     0/1       Pending   0          0s
    kube-system   kube-scheduler-k8s-master            1/1       Running   0          10h
    [root@k8s-master ~]# 
    

    And the master looks happy now:

    [root@k8s-master ~]# kubectl get nodes
    NAME         STATUS    ROLES     AGE       VERSION
    k8s-master   Ready     master    10h       v1.11.2
    [root@k8s-master ~]# 
    

    Stay tuned... after work I'm fixing docker/firewalld on the worker too, and will try to join the cluster again (now knowing how to issue a new token if required). So Update IV will follow in about 10 hours.

  • StefanSchubert
    StefanSchubert over 5 years
    OK - token expired... I should have seen this. Anyway, I will start from scratch here in a few hours. I will also append the steps I have used to set up the master and provide the "get nodes" / "get pods" output again...
  • StefanSchubert
    StefanSchubert over 5 years
    Added the Update II section - as the master was looking unhealthy I logged the complete master installation procedure; maybe you can identify something obvious I have been missing. Sadly the master still looks unhealthy.
  • StefanSchubert
    StefanSchubert over 5 years
    About the docker firewall problem I found the following hint: sanenthusiast.com/docker-and-firewalld-mess-in-centos-7. So I chose to disable firewalld once the rules had been set up.
  • StefanSchubert
    StefanSchubert over 5 years
    And indeed iptables -L shows the required chains after a reboot. And while this was the only thing changed so far, I can suddenly see the flannel network (see Update III above).
  • StefanSchubert
    StefanSchubert over 5 years
    Joining the worker did not work. I observed that after rebooting the server I got a healthy state (as shown in Update III), but then after trying to join the client I get: The connection to the server 192.168.1.30:6443 was refused - did you specify the right host or port? Hm, it seems that the server crashes after a couple of minutes.
  • StefanSchubert
    StefanSchubert over 5 years
    I tested around a little, but Kubernetes on the master still crashes after a while and I get the connection refused... Anyway, I found one mistake in my setup with flannel: kubeadm init requires --pod-network-cidr=10.244.0.0/16. For some reason I thought this had to be based on my private network, but it is only for the pod-internal network, not the VM network. So I fixed this (see the sketch after these comments). Unfortunately the master still crashes after a couple of minutes. @mk_sta do you know which log to look at in that case?
  • Nick_Kh
    Nick_Kh over 5 years
    As an experiment, try disabling the firewall; flannel can use its own ports, more info here. You can check the logs of the runtime cluster components with the command: kubectl logs --namespace=kube-system Pod_name.
  • StefanSchubert
    StefanSchubert over 5 years
    Huh - this seems to be a challenge with CentOS 7. After disabling firewalld, iptables is still intact (and the chains survive the reboot). However, there is no way to "systemctl stop iptables" as the service scripts are not available; I also found that there is something like ebtables running on top. I googled quite a bit on this, with no clear result. Even more confusing, some suggest "yum install iptables" (which of course creates the service scripts), but I don't dare to, as I fear the docker chains might be dropped by it - and iptables is already active.
  • StefanSchubert
    StefanSchubert over 5 years
    The next thing I'm going to try is a downgrade of Kubernetes. All the Kubernetes/CentOS 7 articles I found via Google related to Kubernetes 1.7, however yum installs the current version, which is 1.11.2. I guess that this version is buggy.
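
Putting the fixes from these comments together, a corrected run on the master would look roughly like this (a sketch: the pod CIDR is the one flannel expects, the manifest is the one from step 9 above, and disabling firewalld is the workaround chosen in the comments):

# workaround for the docker/firewalld chain problem
systemctl disable --now firewalld

# re-init with the pod CIDR flannel expects (after a kubeadm reset)
kubeadm init --apiserver-advertise-address=192.168.1.30 --pod-network-cidr=10.244.0.0/16

# deploy the pod network
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml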