How to move files within the Hadoop HDFS directory?


Solution 1

FileSystem.rename() moves a file from the source path to the destination path, so it should fit your requirement. If you rename the individual files rather than the directory itself, the source directory stays in place.
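
As a rough sketch of the rename() approach for the layout in the question: the class and method names here (HdfsRenameMove, moveMatching) are made up for illustration, and the code assumes the source and destination are on the same file system. It moves each matching file individually, so the input directory itself is left untouched, and it checks the boolean returned by rename() instead of ignoring it.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsRenameMove {

    /** Moves every regular file matching srcGlob into destDir and returns how many were moved. */
    public static int moveMatching(FileSystem fs, Path srcGlob, Path destDir) throws IOException {
        fs.mkdirs(destDir); // creates the destination directory if it is missing

        FileStatus[] matches = fs.globStatus(srcGlob);
        if (matches == null) {
            return 0; // nothing matched
        }

        int moved = 0;
        for (FileStatus status : matches) {
            if (!status.isFile()) {
                continue; // skip sub-directories, only files are moved
            }
            Path target = new Path(destDir, status.getPath().getName());
            // rename() reports many failures by returning false, so check it explicitly
            if (!fs.rename(status.getPath(), target)) {
                throw new IOException("rename failed: " + status.getPath() + " -> " + target);
            }
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the question; the Configuration is assumed to point at the cluster.
        FileSystem fs = FileSystem.get(new Configuration());
        int moved = moveMatching(fs,
                new Path("/testHDFS/input/*.txt"),
                new Path("/testHDFS/destination"));
        System.out.println("Moved " + moved + " file(s)");
    }
}
```

Inside a mapper, the same moveMatching call could be made with a FileSystem obtained from the job's Configuration; that wiring is left out here.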

Solution 2

The best way to do this is with org.apache.hadoop.fs.FileUtil.copy(), setting the deleteSource parameter to true. People commonly use FileSystem.rename(), but that method can fail without any explanation: for some problems (such as the source and destination Paths being on different volumes) it can simply return false rather than throw an exception, so the failure goes unnoticed unless you check the return value.
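
A minimal sketch of the FileUtil.copy() approach, under the assumption that each file is moved one at a time: the class and method names (HdfsCopyMove, moveFile) are hypothetical, while FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf) is the Hadoop call named above. The source and destination file systems are resolved separately, so the two paths do not have to live on the same one.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsCopyMove {

    /** Copies src into destDir and deletes src afterwards; returns the new path. */
    public static Path moveFile(Configuration conf, Path src, Path destDir) throws IOException {
        // Resolve each side independently so they may be on different file systems
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = destDir.getFileSystem(conf);

        dstFs.mkdirs(destDir); // creates the destination directory if it is missing
        Path target = new Path(destDir, src.getName());

        // deleteSource = true turns the copy into a move
        boolean ok = FileUtil.copy(srcFs, src, dstFs, target, true, conf);
        if (!ok) {
            throw new IOException("copy failed: " + src + " -> " + target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Example file name is an assumption; the directories come from the question.
        Path moved = moveFile(new Configuration(),
                new Path("/testHDFS/input/example.txt"),
                new Path("/testHDFS/destination"));
        System.out.println("Moved to " + moved);
    }
}
```

Unlike rename(), this copies the file's bytes before deleting the source, so it is likely to be slower for large files when both paths are on the same file system.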

Author: Saurabh Gokhale

Updated on June 04, 2022

Comments

  • Saurabh Gokhale
    Saurabh Gokhale almost 2 years

    I need to move the files from one HDFS directory to another HDFS directory.

    Is there an easier way (some HDFS API) to do this, other than copying through InputStream/OutputStream?

    I've heard of FileSystem.rename(srcDir, destDir), but I am unsure whether it will delete the original source directory.

    I don't want to remove the original directory structure, only move the files from one folder to another directory.

    e.g.

    input Dir - /testHDFS/input/*.txt
    dest Dir - /testHDFS/destination
    

    After moving the files, the directories should look like this:

    input Dir - /testHDFS/input
    dest Dir - /testHDFS/destination/*.txt
    

    PS: I want to do this inside the mapper function, for each file.

    Any help would be appreciated.

  • Saurabh Gokhale
    Saurabh Gokhale over 10 years
    Will renaming the directory remove my original directory? I don't want to delete that structure.
  • sara
    sara over 8 years
    How can I achieve this with a Python script? I need to rename a file system directory. Is there any way to do it in Python other than -mv?
  • Sean
    Sean over 6 years
    rename() just re-links, and will fail (without any explanation) if your source and destination dirs are on different volumes. Instead, use FileUtil.copy with deleteSource=true.