Unable to close file because the last block does not have enough number of replicas


We had a similar issue. It was primarily caused by dfs.namenode.handler.count being set too low. Increasing it may help on small clusters, but the root cause is a DoS-like situation where the NameNode cannot keep up with the number of connections and RPC calls, and the pending-deletion block count grows huge. Check the HDFS audit logs for mass deletions or other heavy HDFS activity, match them against the jobs that might be overwhelming the NameNode, and stop those jobs so HDFS can recover. A rough sketch of those checks follows.
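This is only a sketch of the checks described above, assuming a stock Apache Hadoop layout; the audit-log path and the example property value are assumptions and will differ per cluster and distribution:

    # Inspect the current NameNode handler count (Hadoop's default is 10)
    hdfs getconf -confKey dfs.namenode.handler.count

    # Count delete operations per user in the NameNode audit log to spot mass deletions
    # (the log path below is an assumption; use your distribution's log directory)
    grep 'cmd=delete' /var/log/hadoop-hdfs/hdfs-audit.log \
      | grep -o 'ugi=[^ ]*' | sort | uniq -c | sort -rn | head

    # Raising the handler count means setting dfs.namenode.handler.count in
    # hdfs-site.xml on the NameNode (e.g. to 100) and restarting the NameNode.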


Author: naveenkumarbv

Updated on June 04, 2022

Comments

  • naveenkumarbv, almost 2 years ago

    From the error message it is quite obvious that there was a problem saving a replica of a particular block of a file. The reason might be a problem accessing a DataNode to store a particular block (a replica of the block). Please refer below for the complete log; a quick name-resolution check is sketched after it.

    I found that another user, "huasanyelao" (https://stackoverflow.com/users/987275/huasanyelao), also hit a similar exception, but the use case was different.

    Now, how do we solve these kinds of problems? I understand that there is no fixed solution that handles every scenario.
    1. What is the immediate step I should take to fix errors of this kind?
    2. For jobs whose logs I am not monitoring at the time, what approach should I take to fix such issues?

    P.S.: Apart from fixing network or access issues, what other approaches should I follow?

    Error Log:

    15/04/10 11:21:13 INFO impl.TimelineClientImpl: Timeline service address: http://your-name-node/ws/v1/timeline/
    15/04/10 11:21:14 INFO client.RMProxy: Connecting to ResourceManager at your-name-node/xxx.xx.xxx.xx:0000
    15/04/10 11:21:34 WARN hdfs.DFSClient: DataStreamer Exception
    java.nio.channels.UnresolvedAddressException
            at sun.nio.ch.Net.checkAddress(Net.java:29)
            at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:512)
            at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
            at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1516)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
    15/04/10 11:21:40 INFO hdfs.DFSClient: Could not complete /user/xxxxx/.staging/job_11111111111_1212/job.jar retrying...
    15/04/10 11:21:46 INFO hdfs.DFSClient: Could not complete /user/xxxxx/.staging/job_11111111111_1212/job.jar retrying...
    15/04/10 11:21:59 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/xxxxx/.staging/job_11111111111_1212
    Error occured in MapReduce process:
    java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
            at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
            at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
            at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
            at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
            at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1903)
            at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1871)
            at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1836)
            at org.apache.hadoop.mapreduce.JobSubmitter.copyJar(JobSubmitter.java:286)
            at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254)
            at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
            at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
            at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
            at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
            at com.xxx.xxx.xxxx.driver.GenerateMyFormat.runMyExtract(GenerateMyFormat.java:222)
            at com.xxx.xxx.xxxx.driver.GenerateMyFormat.run(GenerateMyFormat.java:101)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at com.xxx.xxx.xxxx.driver.GenerateMyFormat.main(GenerateMyFormat.java:42)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
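    Given the java.nio.channels.UnresolvedAddressException at the top of the DataStreamer trace, a minimal first check (only a sketch; it assumes the client can run dfsadmin and that the report prints one "Hostname:" line per DataNode, as Hadoop 2.x does) is whether every DataNode hostname resolves from the machine submitting the job:

        # List the DataNodes the NameNode reports, then verify each hostname
        # resolves from this client; an unresolvable name is what raises
        # UnresolvedAddressException when the DataStreamer opens the write pipeline.
        hdfs dfsadmin -report | awk -F': ' '/^Hostname:/ {print $2}' | while read dn; do
            getent hosts "$dn" > /dev/null || echo "cannot resolve: $dn"
        done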