How to move files within the Hadoop HDFS directory?
Solution 1
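FileSystem.rename() will move a file from the source path to the destination directory, so I believe you can use it for your requirement. If you rename the individual files rather than the directory itself, the source directory stays in place.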
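A minimal sketch of that approach, assuming the paths from the question (/testHDFS/input/*.txt and /testHDFS/destination) and a default Configuration that points at your cluster. The class name MoveWithRename and the helper moveMatching are hypothetical names chosen for illustration. It renames each matching file individually, so the input directory is left in place, and it checks the boolean that rename() returns instead of ignoring it.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MoveWithRename {

    /**
     * Moves every file matching srcGlob into destDir and returns how many
     * files were moved. Hypothetical helper, not part of the Hadoop API.
     */
    public static int moveMatching(FileSystem fs, Path srcGlob, Path destDir)
            throws IOException {
        // Make sure the destination directory exists before renaming into it.
        fs.mkdirs(destDir);

        FileStatus[] matches = fs.globStatus(srcGlob);
        if (matches == null) {
            return 0; // nothing matched
        }

        int moved = 0;
        for (FileStatus status : matches) {
            if (!status.isFile()) {
                continue; // skip sub-directories, move plain files only
            }
            Path src = status.getPath();
            Path dst = new Path(destDir, src.getName());
            // rename() reports failure through its return value, so check it.
            if (!fs.rename(src, dst)) {
                throw new IOException("rename failed: " + src + " -> " + dst);
            }
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Assumes fs.defaultFS in the loaded configuration points at HDFS.
        FileSystem fs = FileSystem.get(new Configuration());
        int n = moveMatching(fs,
                new Path("/testHDFS/input/*.txt"),
                new Path("/testHDFS/destination"));
        System.out.println("Moved " + n + " file(s)");
    }
}
```

Only the files are renamed here, so /testHDFS/input still exists afterwards, just without the .txt files.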
Solution 2
The best way to do this is with org.apache.hadoop.fs.FileUtil.copy(), setting the deleteSource parameter to true. People commonly use FileSystem.rename(), but that function can fail silently on problems that are not obvious (such as the source and destination Paths being on different volumes).
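As a rough sketch of this suggestion, the call could look like the following. The class name MoveWithCopy and the helper moveFile are hypothetical; the FileUtil.copy overload used is the one taking a source FileSystem and Path, a destination FileSystem and Path, the deleteSource flag, and a Configuration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MoveWithCopy {

    /**
     * Copies src into destDir (keeping the file name) and deletes the source
     * afterwards. Hypothetical helper, not part of the Hadoop API.
     */
    public static boolean moveFile(FileSystem srcFs, Path src,
                                   FileSystem dstFs, Path destDir,
                                   Configuration conf) throws IOException {
        // Make sure the destination directory exists.
        dstFs.mkdirs(destDir);
        Path dst = new Path(destDir, src.getName());
        // deleteSource = true turns the copy into a move.
        return FileUtil.copy(srcFs, src, dstFs, dst, true, conf);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS in the loaded configuration points at HDFS;
        // the file name below is a placeholder for illustration.
        FileSystem fs = FileSystem.get(conf);
        boolean ok = moveFile(fs, new Path("/testHDFS/input/example.txt"),
                fs, new Path("/testHDFS/destination"), conf);
        System.out.println(ok ? "moved" : "move failed");
    }
}
```

Unlike rename(), this reads and rewrites the file contents, so it is likely to be slower for large files; in exchange it also works when the source and destination are on different file systems.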
Saurabh Gokhale
Updated on June 04, 2022
Comments
-
Saurabh Gokhale almost 2 years
I need to move the files from one HDFS directory to another HDFS directory.
I wanted to check whether there is an easier way (some HDFS API) to achieve this, other than using InputStream/OutputStream.
I've heard of FileSystem.rename(srcDir, destDir); but I am unsure whether this will delete the original src directory. I don't want to remove the original directory structure, only move the files from one folder to another.
e.g.
input Dir - /testHDFS/input/*.txt
dest Dir - /testHDFS/destination
After moving the files, the directories should look like this:
input Dir - /testHDFS/input
dest Dir - /testHDFS/destination/*.txt
PS: I want to do this from inside the mapper function, for each file.
Any help would be appreciated.
-
Saurabh Gokhale over 10 years
Will renaming the directory remove my original directory? I don't want to delete that structure.
-
sara over 8 years
How can I achieve this using a Python script? I need to rename a directory in the file system. Is there any way to do it in Python other than -mv?
-
Sean over 6 years
rename() just re-links, and will fail (without any explanation) if your source and destination dirs are on different volumes. Instead, use FileUtil.copy with deleteSource=true.