zookeeper does not run?
Solution 1
There must have been some sort of connectivity problem. I see you have it resolved now. Next time you run into a situation like this, you should log onto the box that is having problems connecting and use telnet to see if you can connect.
For example, from your Solr box:
telnet ec2-54-247-144-120.eu-west-1.compute.amazonaws.com 2181
and then try from the zk box too. It should start to illuminate where your issues are.
That eliminates any application-layer issues and will tell you quite reliably whether or not you can connect. If you can't connect, then it's almost always some sort of security issue: either a firewall running somewhere (try: service iptables stop) or an issue with the security group configuration in Amazon.
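If telnet is not installed on the box, the same raw TCP check can be scripted. The following is only a minimal sketch using Python's standard socket module; the function name can_connect and the 5-second default timeout are illustrative choices, not something from the original setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout.

    This only checks network reachability (like telnet); it says nothing about
    whether the service behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Calling can_connect("ec2-54-247-144-120.eu-west-1.compute.amazonaws.com", 2181) from the Solr box and again from the ZooKeeper box gives the same yes/no answer as the telnet test.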
The last potential problem is network availability. Despite what people think, the network is NOT reliable and should never be considered so. Anyone working in SOA/distributed systems will know this well :) http://aphyr.com/posts/288-the-network-is-reliable
"A team from the University of Toronto and Microsoft Research studied the behavior of network failures in several of Microsoft’s datacenters. They found an average failure rate of 5.2 devices per day and 40.8 links per day with a median time to repair of approximately five minutes (and up to one week). "
Solution 2
You have your answer: your ZooKeeper is inaccessible! Check your firewall configuration.
You can also check it with
zkCli.sh -server localhost:2181
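A lighter check than a full client session is one of ZooKeeper's four-letter-word commands: sending ruok to the client port should get the reply imok from a healthy server. The sketch below assumes those commands are enabled (newer ZooKeeper releases may require whitelisting them); the function name four_letter_word is just illustrative.

```python
import socket


def four_letter_word(host: str, port: int, command: bytes = b"ruok",
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        # The server replies and then closes the connection, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, four_letter_word("localhost", 2181) run on the ZooKeeper box itself separates "the server is not answering at all" from "the server answers locally but not from other machines".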
andre
Updated on June 04, 2022
Comments
-
andre almost 2 years
I wanted to run a solr cloud with solr 4.3.0.
(I am using aws ubuntu-12.04-lts micro instances)
So I followed this tutorial:
which basically says, start the zookeeper and connect the solr instances to it.
Here's how I start the zookeeper.
-
First I copied the config like described in the tutorial
sudo cp zookeeper-3.4.5/conf/zoo_sample.cfg zookeeper-3.4.5/conf/zoo.cfg
-
Then I started the zookeeper
ubuntu@ip-10-48-159-36:/opt$ sudo zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Looks fine so far.
-
I checked the status:
ubuntu@ip-10-48-159-36:/opt$ sudo zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
Which seems a bit weird already.
-
If I try to connect with the client (remotely as well as locally), it seems to work
ubuntu@ip-10-234-223-69:/opt$ zookeeper-3.4.5/bin/zkCli.sh -server ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181
Connecting to ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181
2013-06-07 11:07:01,996 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-06-07 11:07:02,000 [myid:] - INFO [main:Environment@100] - Client environment:host.name=ip-10-234-223-69.eu-west-1.compute.internal
2013-06-07 11:07:02,000 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.6.0_27
2013-06-07 11:07:02,002 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Sun Microsystems Inc.
2013-06-07 11:07:02,003 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
2013-06-07 11:07:02,003 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.5/bin/../build/classes:/opt/zookeeper-3.4.5/bin/../build/lib/*.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/opt/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/opt/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/opt/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.5/bin/../conf:
2013-06-07 11:07:02,004 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2013-06-07 11:07:02,008 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2013-06-07 11:07:02,009 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2013-06-07 11:07:02,018 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2013-06-07 11:07:02,019 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2013-06-07 11:07:02,019 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.2.0-40-virtual
2013-06-07 11:07:02,020 [myid:] - INFO [main:Environment@100] - Client environment:user.name=ubuntu
2013-06-07 11:07:02,020 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/ubuntu
2013-06-07 11:07:02,021 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/opt
2013-06-07 11:07:02,029 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@182d9c06
Welcome to ZooKeeper!
2013-06-07 11:07:02,074 [myid:] - INFO [main-SendThread(ip-10-48-159-36.eu-west-1.compute.internal:2181):ClientCnxn$SendThread@966] - Opening socket connection to server ip-10-48-159-36.eu-west-1.compute.internal/10.48.159.36:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
[zk: ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181(CONNECTING) 0]
2013-06-07 11:07:32,100 [myid:] - INFO [main-SendThread(ip-10-48-159-36.eu-west-1.compute.internal:2181):ClientCnxn$SendThread@1083] - Client session timed out, have not heard from server in 30038ms for sessionid 0x0, closing socket connection and attempting reconnect
2013-06-07 11:07:33,204 [myid:] - INFO [main-SendThread(ip-10-48-159-36.eu-west-1.compute.internal:2181):ClientCnxn$SendThread@966] - Opening socket connection to server ip-10-48-159-36.eu-west-1.compute.internal/10.48.159.36:2181. Will not attempt to authenticate using SASL (unknown error)
-
Now I tried to connect a solr instance to it. In the web interface of tomcat7 it only tells me "503 - Server is shutting down", so I checked the solr logs
2013-06-07 11:16:36,065 [pool-2-thread-1] INFO org.apache.solr.servlet.SolrDispatchFilter . SolrDispatchFilter.init()
2013-06-07 11:16:36,100 [pool-2-thread-1] INFO org.apache.solr.core.SolrResourceLoader . Using JNDI solr.home: /opt/solr-4.3.0/example/solr
2013-06-07 11:16:36,132 [pool-2-thread-1] INFO org.apache.solr.core.CoreContainer . looking for solr config file: /opt/solr-4.3.0/example/solr/solr.xml
2013-06-07 11:16:36,138 [pool-2-thread-1] INFO org.apache.solr.core.CoreContainer . New CoreContainer 1285984216
2013-06-07 11:16:36,146 [pool-2-thread-1] INFO org.apache.solr.core.CoreContainer . Loading CoreContainer using Solr Home: '/opt/solr-4.3.0/example/solr/'
2013-06-07 11:16:36,152 [pool-2-thread-1] INFO org.apache.solr.core.SolrResourceLoader . new SolrResourceLoader for directory: '/opt/solr-4.3.0/example/solr/'
2013-06-07 11:16:36,567 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting socketTimeout to: 0
2013-06-07 11:16:36,568 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting urlScheme to: http://
2013-06-07 11:16:36,568 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting connTimeout to: 0
2013-06-07 11:16:36,568 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting maxConnectionsPerHost to: 20
2013-06-07 11:16:36,568 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting corePoolSize to: 0
2013-06-07 11:16:36,568 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting maximumPoolSize to: 2147483647
2013-06-07 11:16:36,568 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting maxThreadIdleTime to: 5
2013-06-07 11:16:36,569 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting sizeOfQueue to: -1
2013-06-07 11:16:36,569 [pool-2-thread-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory . Setting fairnessPolicy to: false
2013-06-07 11:16:36,578 [pool-2-thread-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil . Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&socketTimeout=0&connTimeout=0&retry=false
2013-06-07 11:16:36,879 [pool-2-thread-1] INFO org.apache.solr.core.CoreContainer . Registering Log Listener
2013-06-07 11:16:36,881 [pool-2-thread-1] INFO org.apache.solr.core.CoreContainer . Zookeeper client=ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181
2013-06-07 11:16:36,888 [pool-2-thread-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil . Creating new http client, config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
2013-06-07 11:16:37,040 [pool-2-thread-1] INFO org.apache.solr.common.cloud.ConnectionManager . Waiting for client to connect to ZooKeeper
2013-06-07 11:16:52,046 [pool-2-thread-1] ERROR org.apache.solr.servlet.SolrDispatchFilter . Could not start Solr. Check solr/home property and the logs
2013-06-07 11:16:52,103 [pool-2-thread-1] ERROR org.apache.solr.core.SolrCore . null:java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181 within 15000 ms
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88)
    at org.apache.solr.cloud.ZkController.<init>(ZkController.java:170)
    at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:242)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:495)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:358)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:326)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:124)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
    at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181 within 15000 ms
    at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:127)
    ... 25 more
2013-06-07 11:16:52,104 [pool-2-thread-1] INFO org.apache.solr.servlet.SolrDispatchFilter . SolrDispatchFilter.init() done
What does it tell me? On the same instance I just connected with the client successfully... :(
So where is the problem?
[Edit:] Instead of using Amazon's ec**.amazon.* address, I used the network addresses 10.X.X.X to tell Solr where the ZooKeeper is. It seems to work.
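The client log above shows the public EC2 name being opened as ip-10-48-159-36.eu-west-1.compute.internal/10.48.159.36, so it can help to see which address a name actually resolves to from a given box. This is only a rough sketch; the helper name resolve_ipv4 is made up for illustration.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Running resolve_ipv4 on the ec2-... name from both the Solr box and the ZooKeeper box shows whether the two machines end up talking to the public or the 10.X.X.X address.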
-
andre almost 11 years
No firewall problems in the aws network. Just restarted everything and it looks fine now. Thanks anyway.