Zookeeper: Hostname resolution fails


I found a working solution for this issue. ZooKeeper reads the list of servers in the ensemble on startup and looks for its "own" entry. It then uses this entry to determine which port and interface to listen on.

server.1=zookeeper-0.zookeeper-headless:2888:3888
server.2=zookeeper-1.zookeeper-headless:2888:3888
server.3=zookeeper-2.zookeeper-headless:2888:3888

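On each server, its own hostname resolves to 127.0.0.1, so ZooKeeper listens only on the local loopback interface and does not accept connections from the other ZooKeeper servers. The fix is to replace the server's own entry with 0.0.0.0 so that it listens on all interfaces. On the first server, for example: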

server.1=0.0.0.0:2888:3888
server.2=zookeeper-1.zookeeper-headless:2888:3888
server.3=zookeeper-2.zookeeper-headless:2888:3888
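To check whether a pod is affected by the loopback resolution described above, something like the following could be run inside the container. This is a minimal sketch: the helper name `is_loopback` is hypothetical, and it assumes `getent` and `awk` are available in the image.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given address is a loopback address.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Name to check; defaults to this machine's own hostname.
name="${1:-$(hostname)}"

# getent prints "ADDRESS NAME ..."; keep the address of the first result.
own_addr="$(getent hosts "${name}" | awk 'NR==1 {print $1}')"

if is_loopback "${own_addr}"; then
  echo "${name} resolves to loopback (${own_addr}); the 0.0.0.0 workaround applies"
else
  echo "${name} resolves to ${own_addr:-<nothing>}"
fi
```

Passing the full name from zoo.cfg as the first argument (for example `zookeeper-0.zookeeper-headless`) checks exactly the name ZooKeeper will look up.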

To automate things in the cluster, I wrote a bash script that will replace the one "own" entry on container startup.

EDIT: As asked in the comments, here is my ENTRYPOINT script that takes care of placing the myid file and setting the appropriate hostname for each zoo.cfg:

#!/bin/bash
# This script extracts the number from the pod's hostname and uses it as ZooKeeper's id.

# Exact paths may vary according to your setup
MYID_FILE="/var/lib/zookeeper/data/myid"
ZOOCFG_FILE="/conf/zoo.cfg"

# Create the myid file
# Extract only the digits from the hostname (e.g. zookeeper-0 -> 0)
# Note: the id must match the N of this pod's server.N line in zoo.cfg;
# adjust here if your entries are numbered differently from the pod ordinals.
id=$(hostname | tr -d -c 0-9)
echo "${id}" > "${MYID_FILE}"

# Replace this pod's own hostname with 0.0.0.0;
# otherwise, the own hostname will resolve to 127.0.0.1
# https://stackoverflow.com/a/40750900/5764665
fullHostname="$(hostname).zookeeper-headless"
sed -i -e "s/${fullHostname}/0.0.0.0/g" "${ZOOCFG_FILE}"

echo "Executing $*"
exec "$@"
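As a variation on the script above, the own entry can also be located by its server id rather than by hostname, which avoids treating the dots in the hostname as regex wildcards. This is only a sketch under stated assumptions: the function name `bind_own_entry` is hypothetical, the id passed in must be the N of the pod's own `server.N` line, and the file is rewritten through a temporary copy instead of `sed -i`.

```shell
#!/bin/bash
# Hypothetical helper: set the host of the "server.<id>=" line to 0.0.0.0.
# Usage: bind_own_entry <id> <path-to-zoo.cfg>
bind_own_entry() {
  local id="$1" cfg="$2"
  # Replace everything between "server.<id>=" and the first ":";
  # the ":2888:3888" tail and all other lines stay untouched.
  sed "s/^server\.${id}=[^:]*:/server.${id}=0.0.0.0:/" "${cfg}" > "${cfg}.tmp" \
    && mv "${cfg}.tmp" "${cfg}"
}

# Demonstration on a throwaway copy of the ensemble list.
demo_cfg="$(mktemp)"
printf '%s\n' \
  'server.1=zookeeper-0.zookeeper-headless:2888:3888' \
  'server.2=zookeeper-1.zookeeper-headless:2888:3888' \
  'server.3=zookeeper-2.zookeeper-headless:2888:3888' > "${demo_cfg}"

bind_own_entry 1 "${demo_cfg}"
cat "${demo_cfg}"
rm -f "${demo_cfg}"
```

Because the pattern requires `=` directly after the id, an entry such as `server.10` is not touched when the id is `1`.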
Author: Franz Wimmer

Student, C# developer and code enthusiast. Blog: https://codefoundry.de/blog.html

Updated on June 25, 2022

Comments

  • Franz Wimmer (almost 2 years)

    I am running ZooKeeper in an OpenShift/Kubernetes environment. I have set up ZooKeeper as a StatefulSet in order to reliably persist config data.

    I configured three servers in my zoo.cfg by hostname, but on startup, hostname resolution fails. I verified hostnames are indeed resolvable using nslookup inside my cluster.

    zoo.cfg:

    clientPort=2181
    dataDir=/var/lib/zookeeper/data
    dataLogDir=/var/lib/zookeeper/log
    tickTime=2000
    initLimit=10
    syncLimit=2000
    maxClientCnxns=60
    minSessionTimeout=4000
    maxSessionTimeout=40000
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=0
    server.1=zookeeper-0.zookeeper-headless:2888:3888
    server.2=zookeeper-1.zookeeper-headless:2888:3888
    server.3=zookeeper-2.zookeeper-headless:2888:3888
    

    Relevant parts of my OpenShift / Kubernetes configuration:

      # StatefulSet
      - apiVersion: apps/v1beta1
        kind: StatefulSet
        metadata:
          labels:
            app: zookeeper
          name: zookeeper
        spec:
          serviceName: zookeeper-headless
          replicas: 3
          template:
            metadata:
              labels:
                app: zookeeper
            spec:
              containers:
                - image: 172.30.158.156:5000/os-cloud-platform/zookeeper:latest
                  name: zookeeper
                  ports:
                    - containerPort: 2181
                      protocol: TCP
                      name: client
                    - containerPort: 2888
                      protocol: TCP
                      name: server
                    - containerPort: 3888
                      protocol: TCP
                      name: leader-election
              dnsPolicy: ClusterFirst
              schedulerName: default-scheduler
    
      # Service
      - apiVersion: v1
        kind: Service
        metadata:
          labels:
            app: zookeeper
          name: zookeeper
        spec:
          ports:
            - name: client
              port: 2181
              protocol: TCP
              targetPort: 2181
          selector:
            app: zookeeper
          sessionAffinity: None
          type: ClusterIP
    
      - apiVersion: v1
        kind: Service
        metadata:
          name: zookeeper-headless
          labels:
            app: zookeeper
        spec:
          ports:
            - port: 2888
              name: server
            - port: 3888
              name: leader-election
          clusterIP: None
          selector:
            app: zookeeper
    

    OpenShift logs show UnknownHostExceptions, though:

    2017-10-06 10:59:18,289 [myid:] - WARN  [main:QuorumPeer$QuorumServer@155] - Failed to resolve address: zookeeper-2.zookeeper-headless
    java.net.UnknownHostException: zookeeper-2.zookeeper-headless: No address associated with hostname
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
        at java.net.InetAddress.getAllByName(InetAddress.java:1192)
        at java.net.InetAddress.getAllByName(InetAddress.java:1126)
        at java.net.InetAddress.getByName(InetAddress.java:1076)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:148)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.<init>(QuorumPeer.java:133)
        at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:228)
        at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:140)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    ...
    

    What could be the cause? I verified that the hostname (e.g. zookeeper-2.zookeeper-headless) is available from other pods through nslookup.
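The nslookup check mentioned above was run from other pods; to repeat it from inside the ZooKeeper container itself, for every entry in zoo.cfg, a sketch like the following might help. The helper name `list_quorum_hosts` is hypothetical, the default path `/conf/zoo.cfg` is an assumption, and `getent` is assumed to be present in the image.

```shell
#!/bin/bash
# Hypothetical helper: print the host part of every "server.N=host:port:port" line.
list_quorum_hosts() {
  sed -n 's/^[[:space:]]*server\.[0-9]\{1,\}=\([^:]*\):.*/\1/p' "$1"
}

# Path is an assumption; pass your own zoo.cfg as the first argument.
cfg="${1:-/conf/zoo.cfg}"

if [ -r "${cfg}" ]; then
  for host in $(list_quorum_hosts "${cfg}"); do
    # getent exits non-zero when the name cannot be resolved.
    if addr="$(getent hosts "${host}")"; then
      echo "OK   ${host} -> ${addr%% *}"
    else
      echo "FAIL ${host} does not resolve from this pod"
    fi
  done
fi
```

Running it at container startup, before ZooKeeper is launched, shows whether the names are resolvable at the moment the config is parsed.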