zookeeper does not run?

11,939

Solution 1

There must have been some sort of connectivity problem. I see you have it resolved now. Next time you run into a situation like this, you should log onto the box that is having problems connecting and use telnet to see if you can connect.

E.g., from your Solr box:

telnet ec2-54-247-144-120.eu-west-1.compute.amazonaws.com 2181

and then try from the zk box too. It should start to illuminate where your issues are.

That eliminates any application-layer issues and will tell you quite reliably whether or not you can connect. If you can't connect, then it's almost always some sort of security issue: either a firewall running somewhere (try: $ service iptables stop) or a problem with the security group configuration in Amazon.
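If telnet isn't installed on the box, the same check can be scripted. This is only a minimal sketch: `can_connect` is a name made up for illustration, and all it does is try to open a plain TCP connection, which is roughly what the telnet test tells you.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of ZooKeeper or Solr.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake,
        # which is the part the telnet test exercises.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run something like `can_connect("ec2-54-247-144-120.eu-west-1.compute.amazonaws.com", 2181)` from the Solr box and again from the zk box. A False from one box and True from the other points at a firewall or security group rule rather than at ZooKeeper itself.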

The last potential problem is network availability. Despite what people think, the network is NOT reliable and should never be considered so. Anyone working in SOA/distributed systems will know this well :) http://aphyr.com/posts/288-the-network-is-reliable

"A team from the University of Toronto and Microsoft Research studied the behavior of network failures in several of Microsoft’s datacenters. They found an average failure rate of 5.2 devices per day and 40.8 links per day with a median time to repair of approximately five minutes (and up to one week)."

Solution 2

You have your answer: your ZooKeeper is inaccessible! Check your firewall configuration.

You can also check it with

zkCli.sh -server localhost:2181
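An open port only shows that something is listening. To see whether ZooKeeper is actually answering, you can send it the four-letter command `ruok`; a healthy server replies `imok` and closes the connection. A rough sketch follows, with helper names that are hypothetical. It assumes four-letter commands are enabled on the server; newer ZooKeeper releases may require whitelisting them via `4lw.commands.whitelist`.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(host="localhost", port=2181, timeout=5.0):
    """Return True only if the server answers 'ruok' with 'imok'."""
    try:
        return four_letter_word(host, port, "ruok", timeout).strip() == "imok"
    except OSError:
        # Connection refused, timed out, or the host name did not resolve.
        return False
```

`four_letter_word(host, 2181, "stat")` returns a longer report that includes a `Mode:` line, which is handy when `zkServer.sh status` claims the service is not running even though clients can connect.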
Author by

andre

Updated on June 04, 2022

Comments

  • andre
    andre almost 2 years

    I wanted to run a solr cloud with solr 4.3.0.

    (I am using aws ubuntu-12.04-lts micro instances)

    So I followed this tutorial:

    which basically says, start the zookeeper and connect the solr instances to it.

    Here's how I start the zookeeper.

    • First I copied the config as described in the tutorial

      sudo cp zookeeper-3.4.5/conf/zoo_sample.cfg zookeeper-3.4.5/conf/zoo.cfg
      
    • Then I started the zookeeper

      ubuntu@ip-10-48-159-36:/opt$ sudo zookeeper-3.4.5/bin/zkServer.sh start
      JMX enabled by default
      Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
      Starting zookeeper ... STARTED
      

      Looks fine so far.

    • I checked the status:

      ubuntu@ip-10-48-159-36:/opt$ sudo zookeeper-3.4.5/bin/zkServer.sh status
      JMX enabled by default
      Using config: /opt/zookeeper-3.4.5/bin/../conf/zoo.cfg
      Error contacting service. It is probably not running.
      

      Which seems a bit weird already.

    • If I try to connect with the client (remote as well as local), it seems to work

      ubuntu@ip-10-234-223-69:/opt$ zookeeper-3.4.5/bin/zkCli.sh -server ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181
      Connecting to ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181
      2013-06-07 11:07:01,996 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
      2013-06-07 11:07:02,000 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=ip-10-234-223-69.eu-west-1.compute.internal
      2013-06-07 11:07:02,000 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_27
      2013-06-07 11:07:02,002 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Sun Microsystems Inc.
      2013-06-07 11:07:02,003 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
      2013-06-07 11:07:02,003 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.5/bin/../build/classes:/opt/zookeeper-3.4.5/bin/../build/lib/*.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/opt/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/opt/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/opt/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.5/bin/../conf:
      2013-06-07 11:07:02,004 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
      2013-06-07 11:07:02,008 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
      2013-06-07 11:07:02,009 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
      2013-06-07 11:07:02,018 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
      2013-06-07 11:07:02,019 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
      2013-06-07 11:07:02,019 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.2.0-40-virtual
      2013-06-07 11:07:02,020 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=ubuntu
      2013-06-07 11:07:02,020 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/ubuntu
      2013-06-07 11:07:02,021 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/opt
      2013-06-07 11:07:02,029 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@182d9c06
      Welcome to ZooKeeper!
      2013-06-07 11:07:02,074 [myid:] - INFO  [main-SendThread(ip-10-48-159-36.eu-west-1.compute.internal:2181):ClientCnxn$SendThread@966] - Opening socket connection to server ip-10-48-159-36.eu-west-1.compute.internal/10.48.159.36:2181. Will not attempt to authenticate using SASL (unknown error)
      JLine support is enabled
      [zk: ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181(CONNECTING) 0] 2013-06-07 11:07:32,100 [myid:] - INFO  [main-SendThread(ip-10-48-159-36.eu-west-1.compute.internal:2181):ClientCnxn$SendThread@1083] - Client session timed out, have not heard from server in 30038ms for sessionid 0x0, closing socket connection and attempting reconnect
      2013-06-07 11:07:33,204 [myid:] - INFO  [main-SendThread(ip-10-48-159-36.eu-west-1.compute.internal:2181):ClientCnxn$SendThread@966] - Opening socket connection to server ip-10-48-159-36.eu-west-1.compute.internal/10.48.159.36:2181. Will not attempt to authenticate using SASL (unknown error)
      
    • Now I tried to connect a solr instance to it. In the web interface of tomcat7 it only tells me "503 - Server is shutting down", so I checked the solr logs

      2013-06-07 11:16:36,065 [pool-2-thread-1] INFO  org.apache.solr.servlet.SolrDispatchFilter . SolrDispatchFilter.init()
      2013-06-07 11:16:36,100 [pool-2-thread-1] INFO  org.apache.solr.core.SolrResourceLoader . Using JNDI solr.home: /opt/solr-4.3.0/example/solr
      2013-06-07 11:16:36,132 [pool-2-thread-1] INFO  org.apache.solr.core.CoreContainer . looking for solr config file: /opt/solr-4.3.0/example/solr/solr.xml
      2013-06-07 11:16:36,138 [pool-2-thread-1] INFO  org.apache.solr.core.CoreContainer . New CoreContainer 1285984216
      2013-06-07 11:16:36,146 [pool-2-thread-1] INFO  org.apache.solr.core.CoreContainer . Loading CoreContainer using Solr Home: '/opt/solr-4.3.0/example/solr/'
      2013-06-07 11:16:36,152 [pool-2-thread-1] INFO  org.apache.solr.core.SolrResourceLoader . new SolrResourceLoader for directory: '/opt/solr-4.3.0/example/solr/'
      2013-06-07 11:16:36,567 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting socketTimeout to: 0
      2013-06-07 11:16:36,568 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting urlScheme to: http://
      2013-06-07 11:16:36,568 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting connTimeout to: 0
      2013-06-07 11:16:36,568 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting maxConnectionsPerHost to: 20
      2013-06-07 11:16:36,568 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting corePoolSize to: 0
      2013-06-07 11:16:36,568 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting maximumPoolSize to: 2147483647
      2013-06-07 11:16:36,568 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting maxThreadIdleTime to: 5
      2013-06-07 11:16:36,569 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting sizeOfQueue to: -1
      2013-06-07 11:16:36,569 [pool-2-thread-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory . Setting fairnessPolicy to: false
      2013-06-07 11:16:36,578 [pool-2-thread-1] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil . Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&socketTimeout=0&connTimeout=0&retry=false
      2013-06-07 11:16:36,879 [pool-2-thread-1] INFO  org.apache.solr.core.CoreContainer . Registering Log Listener
      2013-06-07 11:16:36,881 [pool-2-thread-1] INFO  org.apache.solr.core.CoreContainer . Zookeeper client=ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181
      2013-06-07 11:16:36,888 [pool-2-thread-1] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil . Creating new http client, config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
      2013-06-07 11:16:37,040 [pool-2-thread-1] INFO  org.apache.solr.common.cloud.ConnectionManager . Waiting for client to connect to ZooKeeper
      2013-06-07 11:16:52,046 [pool-2-thread-1] ERROR org.apache.solr.servlet.SolrDispatchFilter . Could not start Solr. Check solr/home property and the logs
      2013-06-07 11:16:52,103 [pool-2-thread-1] ERROR org.apache.solr.core.SolrCore . null:java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181 within 15000 ms
          at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
          at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88)
          at org.apache.solr.cloud.ZkController.<init>(ZkController.java:170)
          at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:242)
          at org.apache.solr.core.CoreContainer.load(CoreContainer.java:495)
          at org.apache.solr.core.CoreContainer.load(CoreContainer.java:358)
          at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:326)
          at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:124)
          at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
          at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
          at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
          at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
          at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
          at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
          at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
          at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
          at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
          at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
          at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
          at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          at java.util.concurrent.FutureTask.run(FutureTask.java:166)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:679)
      Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper ec2-54-247-144-120.eu-west-1.compute.amazonaws.com:2181 within 15000 ms
          at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173)
          at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:127)
          ... 25 more
      
      2013-06-07 11:16:52,104 [pool-2-thread-1] INFO  org.apache.solr.servlet.SolrDispatchFilter . SolrDispatchFilter.init() done
      

    What does it tell me? On the same instance I just connected with the client successfully... :(

    So where is the problem?

    [Edit:] Instead of using Amazon's ec**.amazon.* address, I used the internal network addresses (10.X.X.X) to tell Solr where the ZooKeeper is. It seems to work.

  • andre
    andre almost 11 years
    No firewall problems in the aws network. Just restarted everything and it looks fine now. Thanks anyway.
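Regarding the [Edit:] note about switching to the 10.X.X.X addresses: the client log in the question shows the public ec2-... name being opened as ip-10-48-159-36.eu-west-1.compute.internal/10.48.159.36, so it can help to check what a name resolves to on each box before deciding which address to give Solr. A small sketch (`resolve_ipv4` is a hypothetical helper, and it only looks at IPv4):

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If the name resolves to different addresses on the Solr box and on the zk box, or to an address the security group does not allow on port 2181, that would be the next thing to look at.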