MapReduce Hadoop job exception: Output directory already exists


Correct me if my understanding is wrong: in your code, the path "/Users/msadri/Documents/....." refers to the local file system, doesn't it? It looks like fs.defaultFS in core-site.xml is pointing to file:/// instead of the HDFS address of your cluster.

1) If you actually intend to write to the local file system, then try this:

FileSystem.getLocal(conf).delete(outputDir, true);
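
For example, a minimal sketch of how that could fit into the job setup (assuming the jobConf, job, and args[2] from the code in the question below):

    // Sketch only: delete the output directory on the *local* file system
    // (file:///) before the output path is registered with the job.
    Path outputDir = new Path( args[2] );
    FileSystem localFs = FileSystem.getLocal( jobConf );   // LocalFileSystem
    if ( localFs.exists( outputDir ) ) {
        localFs.delete( outputDir, true );                 // recursive delete
    }
    FileOutputFormat.setOutputPath( job, outputDir );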

2) If it is expected to point to HDFS, then check core-site.xml: fs.defaultFS has to point to hdfs://<nameNode>:<port>/, then try it again. (The error message shows that you are pointing to the local file system; if it were pointing to HDFS, it would say "Output directory hdfs://<nameNode>:<port>/Users/msadri/... already exists".)
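
If you are not sure which file system the path resolves to, a quick diagnostic sketch like this (reusing the jobConf and output path from the question's code) would print it out:

    // Sketch only: print the configured default file system and where the
    // output path actually resolves. If these show file:/// instead of
    // hdfs://<nameNode>:<port>/, the job is not picking up your cluster's core-site.xml.
    System.out.println( "fs.defaultFS = " + jobConf.get( "fs.defaultFS" ) );
    Path outputDir = new Path( args[2] );
    System.out.println( "output resolves to: " + outputDir.getFileSystem( jobConf ).getUri() );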

Rule this out if it's not necessary. Please let me know how it goes.


Comments

  • Admin, almost 2 years

    I'm running a MapReduce job with the following run code, and it keeps giving me the exception below. I made sure to remove the output folder before starting the job, but it doesn't work.

    The code:

        JobConf jobConf = new JobConf( getConf(), MPTU.class );
        jobConf.setJobName( "MPTU" );
    
        AvroJob.setMapperClass( jobConf, MPTUMapper.class );
        AvroJob.setReducerClass( jobConf, MPTUReducer.class );
    
        long milliSeconds = 1000 * 60 * 60;
        jobConf.setLong( "mapred.task.timeout", milliSeconds );
    
        Job job = new Job( jobConf );
        job.setJarByClass( MPTU.class );
    
        String paths = args[0] + "," + args[1];
        FileInputFormat.setInputPaths( job, paths );
        Path outputDir = new Path( args[2] );
        outputDir.getFileSystem( jobConf ).delete( outputDir, true );
        FileOutputFormat.setOutputPath( job, outputDir );
    
        AvroJob.setInputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.LONG ), Schema.create( Type.STRING ) ) );
        AvroJob.setMapOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ),
                                                                 Schema.create( Type.STRING ) ) );
        AvroJob.setOutputSchema( jobConf,
                                 Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );
    
        job.setNumReduceTasks( 400 );
        job.submit();
        JobClient.runJob( jobConf );
    

    The Exception:

    13:31:39,268 ERROR UserGroupInformation:1335 - PriviledgedActionException as:msadri (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
    Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
        at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.run(MatchProfileTwitterUserHandler.java:58)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.main(MatchProfileTwitterUserHandler.java:81)