How to get hadoop put to create directories if they don't exist
Solution 1
Now you should use
hadoop fs -mkdir -p <path>
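A minimal sketch of how mkdir -p and put can be combined into one step. The function name put_with_parents is hypothetical (it is not part of Hadoop), and the sketch assumes the destination is a full file path whose parent directory may not exist yet.

```shell
# Hypothetical helper (not a Hadoop command): create the destination's
# parent directory in HDFS, then upload the file.
# Usage: put_with_parents <local-file> <hdfs-destination-file>
put_with_parents() {
    local src=$1 dest=$2
    local parent
    parent=$(dirname "$dest")   # parent directory of the destination file
    # -p creates all missing path elements and does not fail if they exist;
    # the put runs only if mkdir succeeded.
    hadoop fs -mkdir -p "$parent" &&
        hadoop fs -put "$src" "$dest"
}
```

For example, put_with_parents myfile.txt /some/non/existing/path/myfile.txt would first run mkdir -p on /some/non/existing/path and then put the file there.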
Solution 2
Editorial note: this answer is indicated to be incorrect. hadoop fs is not deprecated; the deprecated form is hadoop dfs, which is replaced by hdfs dfs.
hadoop fs ...
is deprecated; instead use: hdfs dfs -mkdir ....
Solution 3
Placing a file into a nonexistent directory in hdfs requires a two-step process. As @rt-vybor stated, use the '-p' option to mkdir to create multiple missing path elements. But since the OP asked how to place the file into hdfs, the following also performs the hdfs put. You can also (optionally) check that the put succeeded and conditionally remove the local copy.
First create the relevant directory path in hdfs, then put the file into hdfs. Check that the local file exists before placing it into hdfs; you may also want to log/show that the file has been successfully placed. The following combines all the steps.
fn=myfile.txt
if [ -f "$fn" ] ; then
    bfn=$(basename "$fn") # trim path from filename
    hdfs dfs -mkdir -p /here/is/some/non/existant/path/in/hdfs/
    hdfs dfs -put "$fn" "/here/is/some/non/existant/path/in/hdfs/$bfn"
    hdfs dfs -ls "/here/is/some/non/existant/path/in/hdfs/$bfn"
    success=$? # exit status of ls: 0 if the file landed in hdfs
    if [ "$success" -eq 0 ] ; then # a bare [ $success ] is always true
        echo "remove local copy of file $fn"
        #rm -f "$fn" # uncomment if you want to remove file
    fi
fi
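The existence check can also be done with hdfs dfs -test -e, which reports the result through its exit status only and prints no listing. The following is a sketch under that assumption. The function name put_and_verify is hypothetical, and the destination is assumed to be an HDFS directory.

```shell
# Hypothetical helper: upload one local file into an HDFS directory and
# confirm that it arrived. Returns 0 only if every step succeeded.
# Usage: put_and_verify <local-file> <hdfs-directory>
put_and_verify() {
    local src=$1 dest_dir=$2
    local dest
    # Strip one trailing slash from the directory, then append the file name.
    dest="${dest_dir%/}/$(basename "$src")"
    [ -f "$src" ] || return 1                    # local file must exist
    hdfs dfs -mkdir -p "$dest_dir" || return 1   # create missing parents
    hdfs dfs -put "$src" "$dest" || return 1     # upload
    hdfs dfs -test -e "$dest"                    # exit status 0 if the path exists
}
```

A caller could then write: put_and_verify myfile.txt /here/is/some/non/existant/path/in/hdfs/ && echo "remove local copy of file myfile.txt"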
The same steps can be turned into a shell script that takes a hadoop path and a list of files, and creates the path only once:
#!/bin/bash
hdfsp=${1} # hdfs destination directory
shift
hdfs dfs -mkdir -p "$hdfsp" # create the path only once
for fn in "$@"; do
    if [ -f "$fn" ] ; then
        bfn=$(basename "$fn") # trim path from filename
        hdfs dfs -put "$fn" "$hdfsp/$bfn"
        hdfs dfs -ls "$hdfsp/$bfn" >/dev/null
        success=$? # exit status of ls: 0 if the file landed in hdfs
        if [ "$success" -eq 0 ] ; then # a bare [ $success ] is always true
            echo "remove local copy of file $fn"
            #rm -f "$fn" # uncomment if you want to remove file
        fi
    fi
done
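Since each hdfs dfs invocation starts its own client process, a loop with one put per file may be slow for many files. put accepts several local sources when the destination is a directory, so a batch can be sent with one mkdir and one put. This is only a sketch: the function name put_batch is hypothetical, and it does not verify each file individually.

```shell
# Hypothetical helper: upload many local files into one HDFS directory
# using a single mkdir and a single put.
# Usage: put_batch <hdfs-directory> <local-file>...
put_batch() {
    local dest_dir=$1 fn
    shift
    # Rebuild the argument list so it holds only existing regular files.
    for fn in "$@"; do
        shift
        if [ -f "$fn" ]; then
            set -- "$@" "$fn"
        else
            echo "skipping $fn: not a regular file" >&2
        fi
    done
    [ "$#" -gt 0 ] || return 1                  # nothing left to upload
    hdfs dfs -mkdir -p "$dest_dir" &&           # create the path once
        hdfs dfs -put "$@" "$dest_dir"          # upload all files in one call
}
```

For example: put_batch /here/is/some/non/existant/path/in/hdfs a.txt b.txt c.txt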
Admin
Updated on July 22, 2022
I have been using Cloudera's hadoop (0.20.2). With this version, if I put a file into the file system and the directory structure did not exist, the parent directories were created automatically.
So for example, if I had no directories in hdfs and typed:
hadoop fs -put myfile.txt /some/non/existing/path/myfile.txt
It would create all of the directories: some, non, existing and path and put the file in there.
Now, with a newer release of hadoop (2.2.0), this automatic creation of directories does not happen. The same command above yields:
put: ` /some/non/existing/path/': No such file or directory
I have a workaround to just do hadoop fs -mkdir first, for every put, but this is not going to perform well.
Is this configurable? Any advice?