zookeeper server not running
Your election port is being bound at sgX.imatiasl.lan/127.0.0.1:3888 on every node, so when the other nodes try to connect to sgY.imatiasl.lan/10.7.0.93:3888 the connection fails.
The election ports should bind to 0.0.0.0:3888 or to the real IP of each node, but for some reason the hostnames are being resolved to 127.0.0.1. You can check the IP:port on each node with netstat -patun to confirm this.
Most likely you have an issue with /etc/hosts.
Take a look at: https://unix.stackexchange.com/questions/240506/zookeeper-dns-name-problems-with-leader-elections-when-migrating-from-windows-to
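To check /etc/hosts for this kind of entry, something like the following may help. It is only a sketch: the function name loopback_mappings and the sample file contents are made up for illustration (the thread never shows the actual /etc/hosts), and it only flags names other than localhost that are mapped to a loopback address.

```python
import ipaddress


def loopback_mappings(hosts_text):
    """Return (hostname, address) pairs for non-localhost names mapped to loopback."""
    found = []
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            # Not a parseable address; skip the line.
            continue
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            # Names such as localhost or ip6-loopback are expected on loopback.
            if name.startswith(("localhost", "ip6-")):
                continue
            found.append((name, str(addr)))
    return found


# Hypothetical /etc/hosts contents, NOT taken from the thread.
SAMPLE = """\
127.0.0.1   localhost
127.0.0.1   sg1.imatiasl.lan sg1   # entry of the kind that breaks the quorum
10.7.0.93   sg2.imatiasl.lan sg2
10.7.0.94   sg3.imatiasl.lan sg3
"""

print(loopback_mappings(SAMPLE))
```

On a real node you would pass the contents of /etc/hosts instead of SAMPLE. An empty result does not prove that name resolution is fine (DNS or other resolver settings can also be involved), so netstat is still the way to confirm what the port is actually bound to.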
NotGaeL
Updated on June 05, 2022
Comments
-
NotGaeL almost 2 years
I'm trying to start an HBase master from Ambari.
It can't start because it can't connect to the ZooKeeper server.
Ambari marks all the ZooKeeper servers (3 nodes) as running.
The application server (Tomcat?) that runs the ZooKeeper server application seems to be running fine; at least there is a service listening on the specified port.
But the application is not able to connect to the other nodes, and it seems like it doesn't start.
All the connections are closed with the error message
ZooKeeperServer not running
in the ZooKeeper server log, and
zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket
on the client.
This is the ZooKeeper server log output for those nodes (same log for all of them, only the node names change):
2016-03-31 16:15:34,550 - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /usr/hdp/current/zookeeper-server/conf/zoo.cfg
2016-03-31 16:15:34,553 - INFO [main:QuorumPeerConfig@338] - Defaulting to majority quorums
2016-03-31 16:15:34,557 - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 30
2016-03-31 16:15:34,557 - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
2016-03-31 16:15:34,558 - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2016-03-31 16:15:34,565 - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2016-03-31 16:15:34,566 - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2016-03-31 16:15:34,573 - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2016-03-31 16:15:34,582 - INFO [main:QuorumPeer@992] - tickTime set to 2000
2016-03-31 16:15:34,582 - INFO [main:QuorumPeer@1012] - minSessionTimeout set to -1
2016-03-31 16:15:34,582 - INFO [main:QuorumPeer@1023] - maxSessionTimeout set to -1
2016-03-31 16:15:34,582 - INFO [main:QuorumPeer@1038] - initLimit set to 10
2016-03-31 16:15:34,598 - INFO [Thread-2:QuorumCnxManager$Listener@506] - My election bind port: sg1.imatiasl.lan/127.0.0.1:3888
2016-03-31 16:15:34,607 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@747] - LOOKING
2016-03-31 16:15:34,608 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x0
2016-03-31 16:15:34,609 - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2016-03-31 16:15:34,612 - WARN [WorkerSender[myid=1]:QuorumCnxManager@383] - Cannot open channel to 2 at election address sg2.imatiasl.lan/10.7.0.93:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)
2016-03-31 16:15:34,614 - WARN [WorkerSender[myid=1]:QuorumCnxManager@383] - Cannot open channel to 3 at election address sg3.imatiasl.lan/10.7.0.94:3888
java.net.ConnectException: Conexión rehusada
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)
2016-03-31 16:15:34,812 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 2 at election address sg2.imatiasl.lan/10.7.0.93:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-03-31 16:15:34,813 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address sg3.imatiasl.lan/10.7.0.94:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:795)
2016-03-31 16:15:34,813 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
When the client tries to connect:
2016-03-31 16:15:35,086 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.7.0.93:55914
2016-03-31 16:15:35,130 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2016-03-31 16:15:35,130 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.7.0.93:55914 (no session established for client)
And so on...
Any ideas on how to fix this?
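The line in the server log that gives the problem away is "My election bind port". A minimal sketch for pulling it out of a log and checking the address, assuming the log format shown above and IPv4 addresses only; the function name election_bind and the regular expression are illustrative and not part of ZooKeeper:

```python
import ipaddress
import re

# Matches e.g. "My election bind port: sg1.imatiasl.lan/127.0.0.1:3888".
# Assumption: IPv4 only; an IPv6 address would need a different pattern.
BIND_RE = re.compile(
    r"My election bind port: (?P<host>[^/\s]*)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def election_bind(log_text):
    """Return (host, ip, port, is_loopback) for the first bind line, or None."""
    match = BIND_RE.search(log_text)
    if match is None:
        return None
    ip = ipaddress.ip_address(match.group("ip"))
    return match.group("host"), str(ip), int(match.group("port")), ip.is_loopback


# Line copied from the server log above.
LOG_LINE = (
    "2016-03-31 16:15:34,598 - INFO [Thread-2:QuorumCnxManager$Listener@506] "
    "- My election bind port: sg1.imatiasl.lan/127.0.0.1:3888"
)

print(election_bind(LOG_LINE))
```

If the last element is True, the election port is only reachable from the node itself, which would match the "Connection refused" warnings the other nodes log.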
-
NotGaeL about 8 years
All of my nodes are refusing connections. This one cannot connect to the other two, and those two cannot connect to this one or to each other. It happens when I try to start ZooKeeper. Ambari says it's starting right but, as you can see in the logs, it's not. Do you know why?
-
NotGaeL about 8 years
I ran jps -l as the user zookeeper and got 612 org.apache.zookeeper.server.quorum.QuorumPeerMain, so I guess it is. What can I do now?
-
NotGaeL about 8 years
Thank you! I was binding sgX to 127.0.0.1 in the /etc/hosts of each sgX (which, incidentally, I did to solve some problem during the cluster setup that I don't even remember), and that was the problem.
-
NotGaeL about 8 years
Now I can start ZooKeeper and the HBase RegionServer without problems, but the HBase master is still resisting. I get
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 58: ordinal not in range(128)
on
checked_call['curl -sS -L -w '%{http_code}' -X GET 'http://sg1.imatiasl.lan:50070/webhdfs/v1/apps/hbase/data?op=GETFILESTATUS&user.name=hdfs''] {'logoutput': None, 'user': 'hdfs', 'stderr': -1, 'quiet': False}
Do you know how I can solve this?
-
NotGaeL about 8 years
(Posted another question here: stackoverflow.com/questions/36409105/… )
-
15412s about 8 years
Can you reboot, or kill the processes? I think you wanted to run just one ZooKeeper service; do you run any script to init ZK that maybe has a bad loop inside?
-
Alfonso Nishikawa about 8 years
I don't use Ambari and I don't know Python, but that seems to be related to 'ñ' and accented characters. Try removing them wherever they are, or use unicode u'..' strings. I can't deduce much more :( salvatorelab.es/2013/12/…
-
quazardous almost 6 years
As NotGaeL said, be aware that if you have bound a hostname to an IP like 127.0.1.1 in /etc/hosts, even with multiple servers, ZooKeeper will fail to establish server-to-server connections.
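On the UnicodeDecodeError mentioned in the comments: byte 0xc3 is the first byte of many accented Latin letters in UTF-8, which fits the guess about 'ñ' and accents. The sketch below only reproduces the same class of error in Python 3. The thread does not show which text Ambari was actually decoding, so the Spanish message from the server log is used purely as a convenient sample, and the helper name first_non_ascii is made up for illustration.

```python
def first_non_ascii(data):
    """Return (position, byte value) of the first non-ASCII byte, or None."""
    for position, value in enumerate(data):
        if value > 0x7F:
            return position, value
    return None


# Sample text only; "\u00f3" is the accented "o" in "Conexión".
raw = "Conexi\u00f3n rehusada".encode("utf-8")

print(first_non_ascii(raw))

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # Same kind of message as in the comment, with a different position.
    print(exc)

# Decoding with the encoding the bytes were written in works.
print(raw.decode("utf-8"))
```

Running the failing command by hand and passing its output through first_non_ascii would show which character is involved. Whether the fix is to remove the accents or to change the locale or encoding depends on where the text comes from, which the thread leaves open.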