Is there a way to add nodes to a running Hadoop cluster?


Solution 1

Here is the documentation for adding a node to Hadoop and for HBase. Looking at the documentation, there is no need to restart the cluster. A node can be added dynamically.
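The linked pages have since gone stale (see the comments below), so as a rough illustration of what "adding a node dynamically" looks like in practice: assuming the new host already has the Hadoop/HBase binaries installed and the cluster configuration copied over, you start the worker daemons on it and they register themselves with the running masters. Script names and locations vary between versions and distributions, so treat this as a sketch only.

```bash
# On the new worker host, after copying the cluster configuration to it.
# Paths and script names vary by Hadoop version/distribution; adjust as needed.

# Start the HDFS DataNode daemon so it registers with the NameNode.
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode

# On Hadoop 2.x (YARN), start the NodeManager; older releases use a TaskTracker instead.
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager

# If the node should also serve HBase, start a RegionServer; it registers
# itself with the HMaster via ZooKeeper.
$HBASE_HOME/bin/hbase-daemon.sh start regionserver
```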

Solution 2

The following steps should help you bring the new node into the running cluster (a shell sketch follows the list):

1. Add the new node's hostname to /etc/hadoop/conf/slaves.
2. Sync the full configuration directory /etc/hadoop/conf from the NameNode to the new DataNode (not needed if the configuration lives on a shared filesystem).
3. Restart the Hadoop services on the NameNode/JobTracker and start all the services on the new DataNode.
4. Verify the new DataNode from the NameNode web UI: http://namenode:50070
5. Run the balancer script to redistribute data across the nodes.
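Here is a rough shell sketch of these five steps, assuming an unmanaged install with configuration under /etc/hadoop/conf; the hostname, paths, and balancer threshold are placeholders.

```bash
# Run on the NameNode. Hostname, config path and threshold are placeholders;
# adjust for your installation.

NEW_NODE=newnode.example.com   # hypothetical hostname of the new worker

# 1. Add the new node to the slaves file.
echo "$NEW_NODE" >> /etc/hadoop/conf/slaves

# 2. Sync the configuration to the new DataNode (only needed when the
#    config directory is not on a shared filesystem).
rsync -av /etc/hadoop/conf/ "$NEW_NODE":/etc/hadoop/conf/

# 3. Restart the Hadoop services on the master, then start the DataNode
#    (and NodeManager/TaskTracker) daemons on the new node, as in the
#    sketch under Solution 1.

# 4. Verify the new DataNode in the NameNode web UI: http://namenode:50070

# 5. Rebalance data across the DataNodes (the threshold is a percentage).
hdfs balancer -threshold 10
```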

If you don't want to restart the services on the NameNode every time you add a node, you can add the hostnames to the slaves configuration file ahead of time. They will show up as decommissioned/dead nodes until the hosts actually come online, at which point only the DataNode-side steps above are needed. Again, this is not best practice.
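A related option, not covered in the answer above, is HDFS's include-file mechanism: if dfs.hosts is configured, the running NameNode can be told about new hosts with refreshNodes instead of a restart. A minimal sketch, assuming the include-file path below is what dfs.hosts points to:

```bash
# On the NameNode. Assumes hdfs-site.xml sets dfs.hosts to this include file;
# the path and hostname are placeholders.
echo "newnode.example.com" >> /etc/hadoop/conf/dfs.hosts.include

# Tell the running NameNode to re-read its host lists -- no restart needed.
hdfs dfsadmin -refreshNodes

# On YARN clusters, refresh the ResourceManager's node list as well
# (yarn.resourcemanager.nodes.include-path).
yarn rmadmin -refreshNodes
```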

Solution 3

Updated answer for Cloudera CDH 5.8.5 (Hadoop 2.6):

To add a new node to your cluster, follow these steps in the Cloudera Manager UI:

  1. Click on your cluster name.
  2. Go to the Hosts list.
  3. On the hosts page, click 'Add New Hosts to Cluster'.
  4. Enter the IP of your host and search.
  5. Follow the wizard through the next steps.
  6. Assign roles to your new node; for example, if it is a data node, assign only DataNode-related roles and continue.
  7. Your new node is now part of the cluster; click Finish.
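Since the question asks about doing this programmatically: Cloudera Manager also exposes a REST API that mirrors these UI steps. The sketch below only outlines the shape of the calls; the API version, endpoint paths, and JSON payloads are assumptions that should be checked against the API documentation for your Cloudera Manager release.

```bash
# Hedged sketch only: the API version (v13), endpoint paths and payload shapes
# are assumptions -- verify them against your Cloudera Manager API docs.
# Assumes the CM agent is already installed and heartbeating on the new host.
CM=http://cm-server.example.com:7180   # hypothetical CM server
AUTH=admin:admin                       # hypothetical credentials
CLUSTER="Cluster%201"                  # URL-encoded cluster name

# Attach the (already-registered) host to the cluster.
curl -u "$AUTH" -X POST "$CM/api/v13/clusters/$CLUSTER/hosts" \
  -H 'Content-Type: application/json' \
  -d '{"items":[{"hostId":"newnode.example.com"}]}'

# Create a DataNode role for that host under the HDFS service.
curl -u "$AUTH" -X POST "$CM/api/v13/clusters/$CLUSTER/services/hdfs/roles" \
  -H 'Content-Type: application/json' \
  -d '{"items":[{"type":"DATANODE","hostRef":{"hostId":"newnode.example.com"}}]}'
```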

Comments

  • user1735075
    user1735075 about 4 years

    I have been playing with Cloudera: I define the size of the cluster before I start my job, then use Cloudera Manager to make sure everything is running.

    I'm working on a new project that uses message queues instead of Hadoop to distribute the work, but the results of the work are stored in HBase. I might launch 10 servers to process the job and store to HBase, but I'm wondering: if I later decide to add a few more worker nodes, can I easily (read: programmatically) make them automatically connect to the running cluster so they can locally add to the cluster's HBase/HDFS?

    Is this possible and what would I need to learn in order to do it?

  • Tariq
    Tariq over 9 years
    Thanks for your answer. Could you please update your answer for Hadoop 2.5.2, as there is no conf folder in 2.5.2?
  • Tariq
    Tariq over 9 years
    Do I need to update the slaves file on all the nodes or only on the NameNode?
  • Tariq
    Tariq over 9 years
    Do I need to update the /etc/hosts file on all the nodes as well, or only on the NameNode?
  • Hans Deragon
    Hans Deragon over 4 years
    Links are broken now.