could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation

hadoop hdfs hadoop-yarn hadoop2 apache-tez

This error is raised in BlockManager.chooseTarget4NewBlock() (I am referring to the latest code). The specific piece of code that causes it is:

final DatanodeStorageInfo[] targets = blockplacement.chooseTarget(src,
    numOfReplicas, client, excludedNodes, blocksize, 
    favoredDatanodeDescriptors, storagePolicy);

if (targets.length < minReplication) {
  throw new IOException("File " + src + " could only be replicated to "
      + targets.length + " nodes instead of minReplication (="
      + minReplication + ").  There are "
      + getDatanodeManager().getNetworkTopology().getNumOfLeaves()
      + " datanode(s) running and "
      + (excludedNodes == null? "no": excludedNodes.size())
      + " node(s) are excluded in this operation.");
}

This occurs when the BlockManager tries to choose target hosts for storing a new block of data and cannot find even a single host (targets.length < minReplication). minReplication is 1 here (configuration parameter dfs.namenode.replication.min in the hdfs-site.xml file).

This could occur due to one of the following reasons:

Data Node instances are not running
Data Node instances are unable to contact the Name Node
Data Nodes have run out of space, hence no new block of data can be allocated to them
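
To make the check concrete, here is a toy model of it in plain Java. It is only a sketch under my own assumptions: the class MinReplicationSketch, the Node type and the "has room for one block" rule are hypothetical simplifications, not the real HDFS block placement policy, which considers more than free space.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: all names here are hypothetical, and the selection rule is
// a simplification of what the NameNode's block placement policy really does.
class MinReplicationSketch {

    // A live datanode as seen by this sketch: a host name and its free DFS space.
    static final class Node {
        final String host;
        final long remainingBytes;

        Node(String host, long remainingBytes) {
            this.host = host;
            this.remainingBytes = remainingBytes;
        }
    }

    // Pick up to numOfReplicas nodes that still have room for one block.
    static List<Node> chooseTargets(List<Node> liveNodes, int numOfReplicas, long blockSize) {
        List<Node> targets = new ArrayList<>();
        for (Node node : liveNodes) {
            if (targets.size() == numOfReplicas) {
                break;
            }
            if (node.remainingBytes >= blockSize) {
                targets.add(node);
            }
        }
        return targets;
    }

    // Mirrors the shape of the check quoted above: fail when fewer than
    // minReplication targets could be chosen, even if every node is running.
    static List<Node> allocate(String src, List<Node> liveNodes, int numOfReplicas,
                               long blockSize, int minReplication) throws IOException {
        List<Node> targets = chooseTargets(liveNodes, numOfReplicas, blockSize);
        if (targets.size() < minReplication) {
            throw new IOException("File " + src + " could only be replicated to "
                + targets.size() + " nodes instead of minReplication (="
                + minReplication + "). There are " + liveNodes.size()
                + " datanode(s) running.");
        }
        return targets;
    }

    public static void main(String[] args) {
        // Four running nodes, none with free space: the allocation still fails.
        List<Node> fullNodes = Arrays.asList(
            new Node("dn1", 0L), new Node("dn2", 0L),
            new Node("dn3", 0L), new Node("dn4", 0L));
        try {
            allocate("/tmp/example", fullNodes, 3, 128L * 1024 * 1024, 1);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the number of running Data Nodes and the number of usable targets are two different things: all nodes can be up and still none of them qualifies.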

But in your case, the error message also contains the following information:

There are 4 datanode(s) running and no node(s) are excluded in this operation.

This means that 4 Data Nodes are running and all 4 were considered for placement of the data in this operation.

So the likely suspect is disk space on the Data Nodes. You can check the disk space on your Data Nodes with the following command:

hdfs dfsadmin -report

It prints a report for each of your live Data Nodes. For example, in my case I got the following:

Live datanodes (1):

Name: 192.168.56.1:50010 (192.168.56.1)
Hostname: 192.168.56.1
Decommission Status : Normal
Configured Capacity: 648690003968 (604.14 GB)
DFS Used: 193849055737 (180.54 GB)
Non DFS Used: 186164975111 (173.38 GB)
DFS Remaining: 268675973120 (250.22 GB)
DFS Used%: 29.88%
DFS Remaining%: 41.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Dec 13 17:17:34 IST 2015

Check "DFS Remaining" and "DFS Remaining%". They should give you an idea of the remaining space on your Data Nodes.
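
If you have many Data Nodes, reading the report by eye gets tedious. Below is a minimal sketch in Java that scans report text in the format shown above and lists nodes whose "DFS Remaining%" is below a threshold. The class DfsReportCheck and its methods are hypothetical helpers of mine, not part of Hadoop, and the sketch assumes the report layout matches the sample above (a "Name:" line followed later by a "DFS Remaining%:" line).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: scans text in the format printed by
// `hdfs dfsadmin -report` and collects the "DFS Remaining%" value per datanode.
class DfsReportCheck {

    // One datanode entry: the value of its "Name:" line and its remaining percentage.
    static final class NodeSpace {
        final String name;
        final double remainingPercent;

        NodeSpace(String name, double remainingPercent) {
            this.name = name;
            this.remainingPercent = remainingPercent;
        }
    }

    private static final Pattern NAME = Pattern.compile("^Name:\\s*(.+)$");
    private static final Pattern REMAINING_PCT =
        Pattern.compile("^DFS Remaining%:\\s*([0-9.]+)%$");

    // Pair each "Name:" line with the next "DFS Remaining%:" line after it.
    // A "DFS Remaining%:" line seen before any "Name:" line is skipped.
    static List<NodeSpace> parse(String report) {
        List<NodeSpace> nodes = new ArrayList<>();
        String currentName = null;
        for (String rawLine : report.split("\\R")) {
            String line = rawLine.trim();
            Matcher name = NAME.matcher(line);
            if (name.matches()) {
                currentName = name.group(1);
                continue;
            }
            Matcher pct = REMAINING_PCT.matcher(line);
            if (pct.matches() && currentName != null) {
                nodes.add(new NodeSpace(currentName, Double.parseDouble(pct.group(1))));
                currentName = null;
            }
        }
        return nodes;
    }

    // Keep only the nodes whose remaining percentage is below minPercent.
    static List<NodeSpace> belowThreshold(List<NodeSpace> nodes, double minPercent) {
        List<NodeSpace> low = new ArrayList<>();
        for (NodeSpace node : nodes) {
            if (node.remainingPercent < minPercent) {
                low.add(node);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Sample lines copied from the report above.
        String sample = String.join("\n",
            "Name: 192.168.56.1:50010 (192.168.56.1)",
            "DFS Remaining: 268675973120 (250.22 GB)",
            "DFS Remaining%: 41.42%");
        for (NodeSpace node : parse(sample)) {
            System.out.println(node.name + " has " + node.remainingPercent + "% DFS remaining");
        }
    }
}
```

You could feed it the saved output of the command and call belowThreshold with whatever limit you consider too low; the threshold is your choice, not something HDFS defines.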

You can also refer to the wiki here: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo, which describes the reasons for this error and ways to mitigate it.

Author by

Mona Jalal

Updated on July 26, 2022

Comments

Mona Jalal almost 2 years

I don't know how to fix this error:

Vertex failed, vertexName=initialmap, vertexId=vertex_1449805139484_0001_1_00, diagnostics=[Task failed, taskId=task_1449805139484_0001_1_00_000003, diagnostics=[AttemptID:attempt_1449805139484_0001_1_00_000003_0 Info:Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/gridmix-kon/input/_temporary/1/_temporary/attempt_14498051394840_0001_m_000003_0/part-m-00003/segment-121 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)

Any idea what the cause is?

earl almost 7 years

I am facing a similar issue. The admin report shows DFS Remaining: 127352772317 (118.61 GB). All other three datanodes show approximately the same DFS remaining as above. Live datanodes is also reported as 3, which is my actual number of datanodes.

earl almost 7 years

Error is: java.io.IOException: File /topics/+tmp/testTopic/year=2017/month=07/day=03/hour=03/380f83e3-adaa-4c4a-b195-ccb9ad0c8ddf_tmp could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.