Is there a way to add nodes to a running Hadoop cluster?
Solution 1
Here is the documentation for adding a node to Hadoop and to HBase. According to the documentation, there is no need to restart the cluster; a node can be added dynamically.
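To illustrate what "dynamically" means in practice, here is a minimal sketch of bringing up the worker daemons on a freshly provisioned node so it registers with the running master. The daemon script names are the classic Hadoop 1.x ones, and the dry-run wrapper is just for illustration; nothing here is specific to your deployment.

```shell
#!/bin/sh
# Sketch: start the worker daemons on a new node so it joins the live
# cluster. DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# On the new node itself, with the cluster configuration already in place:
run hadoop-daemon.sh start datanode     # joins HDFS, no cluster restart
run hadoop-daemon.sh start tasktracker  # joins MapReduce the same way
```

Because the daemons report in to the NameNode/JobTracker on startup, nothing needs to be restarted on the master for the node to appear.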
Solution 2
The following steps should help you launch the new node into the running cluster.
1> Add the new node's name to the /etc/hadoop/conf/slaves list.
2> Sync the full configuration directory /etc/hadoop/conf from the NameNode to the new DataNode, if the file system isn't shared.
3> Restart the Hadoop services on the NameNode/JobTracker and start all the services on the new DataNode.
4> Verify the new DataNode from the browser at http://namenode:50070.
5> Run the balancer script to redistribute the data between the nodes.
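The steps above can be sketched as a script. To keep it safe to run anywhere, this version edits a throwaway copy of the slaves file and only prints the commands that would touch live services; the node name "datanode4" and all paths are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the numbered steps against a throwaway copy of the conf dir.
CONF=$(mktemp -d)
printf 'datanode1\ndatanode2\ndatanode3\n' > "$CONF/slaves"

# Step 1: append the new node to the slaves list (idempotent).
NEW_NODE=datanode4
grep -qx "$NEW_NODE" "$CONF/slaves" || echo "$NEW_NODE" >> "$CONF/slaves"

# Steps 2-5 touch live services, so they are only printed here:
echo "rsync -a /etc/hadoop/conf/ $NEW_NODE:/etc/hadoop/conf/"  # sync config
echo "ssh $NEW_NODE hadoop-daemon.sh start datanode"           # start daemon
echo "hadoop balancer -threshold 10"                           # rebalance
```

The balancer's threshold argument (percent of disk-usage deviation to tolerate) is optional; 10 is the usual default.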
If you don't want to restart the services on the NameNode each time you add a node, you can add the names to the slaves configuration file ahead of time. They will show up as decommissioned/dead nodes until they come online, at which point only the DataNode-side steps above are needed. Again, this is not best practice.
Solution 3
Updated answer for Cloudera, using CDH 5.8.5 (Hadoop 2.6):
To add a new node to your cluster, follow these steps in the Cloudera Manager UI:
- Click on your cluster name.
- Go to the Hosts list.
- On the Hosts page, click 'Add New Hosts to Cluster'.
- Enter the IP of your host and search.
- Follow the instructions and continue through the next steps.
- Assign roles to your new node; for example, if it's a data node, assign only DataNode-related roles and continue.
- Click Finish. Your new node is now part of the cluster.
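Since the original question asks for a programmatic route, the same UI flow can in principle be driven through the Cloudera Manager REST API. This is only a sketch: the host, credentials, API version, cluster name, and hostId below are all assumptions (check `/api/version` on your own deployment), and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: the Cloudera Manager UI steps expressed as REST API calls.
# Every value here is an assumption for illustration.
CM_URL=http://cm-host:7180   # Cloudera Manager host (assumption)
AUTH=admin:admin             # credentials (assumption)
API=v13                      # API version for CM 5.8; verify via /api/version
CLUSTER=Cluster%201          # URL-encoded cluster name (assumption)

# List all hosts known to Cloudera Manager:
LIST_CMD="curl -u $AUTH $CM_URL/api/$API/hosts"
echo "$LIST_CMD"

# Attach an already-provisioned host to the cluster by its hostId:
ADD_CMD="curl -u $AUTH -X POST -H 'Content-Type: application/json'"
ADD_CMD="$ADD_CMD -d '{\"items\":[{\"hostId\":\"datanode4\"}]}'"
ADD_CMD="$ADD_CMD $CM_URL/api/$API/clusters/$CLUSTER/hosts"
echo "$ADD_CMD"
```

After attaching the host you would still assign roles to it (DataNode, etc.), which the API also supports; the UI remains the simpler route for a one-off addition.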
user1735075
Updated on March 22, 2020
Comments
-
user1735075 about 4 years
I have been playing with Cloudera, and I define the number of clusters before I start my job, then use Cloudera Manager to make sure everything is running.
I'm working on a new project that, instead of using Hadoop, uses message queues to distribute the work, but the results of the work are stored in HBase. I might launch 10 servers to process the job and store to HBase, but I'm wondering: if I later decide to add a few more worker nodes, can I easily (read: programmatically) make them automatically connect to the running cluster so they can locally add to the cluster's HBase/HDFS?
Is this possible and what would I need to learn in order to do it?
-
Tariq over 9 years
Thanks for your answer. Could you please update your answer for Hadoop 2.5.2, as there is no conf folder in 2.5.2?
-
Tariq over 9 years
Do I need to update the slaves file on all the nodes or only on the NameNode?
-
Tariq over 9 years
Do I need to update the /etc/hosts files on all the nodes as well, or only on the NameNode?
-
Hans Deragon over 4 years
Links are broken now.