Is there an equivalent of the `find` command in `hadoop`?
Solution 1
`hadoop fs -find` was introduced in Apache Hadoop 2.7.0. Most likely you are using an older version and therefore don't have it yet.
See HADOOP-8989 for more information.
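On 2.7.0 or later, a typical invocation could look like the sketch below. The helper name `hdfs_find_by_name`, the path `/demo` and the pattern `order*` are placeholders of mine; `-name` and `-print` are expressions listed in the Hadoop FileSystem shell documentation for `-find`, and two expressions written side by side are joined by an implied AND (the same as `-a` / `-and`).

```shell
# Hypothetical helper: list entries under a start path whose base name matches a glob.
# Needs Hadoop 2.7.0 or later; older releases reject `hadoop fs -find` with an error.
hdfs_find_by_name() {
    # $1 = start path, $2 = glob pattern (quote it so the local shell does not expand it)
    hadoop fs -find "$1" -name "$2" -print
}

# Usage (placeholder path and pattern):
#   hdfs_find_by_name /demo 'order*'
```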
In the meantime you can use
hdfs dfs -ls -R <pattern>
e.g.: hdfs dfs -ls -R /demo/order*.*
but that is not as powerful as `find` and lacks basics such as filtering by type or depth. From what I understand, people have been writing scripts around it to work around this problem.
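One such wrapper script could look like the sketch below. It is only an illustration, not a standard tool: the function name `ls_filter` and its arguments are my own, and it assumes the usual eight-column `hdfs dfs -ls -R` listing (permissions, replication, owner, group, size, date, time, path). It reads the listing from stdin, so the filter can be tried on saved output without a cluster.

```shell
# Filter `hdfs dfs -ls -R` output (read from stdin) by entry type and base-name glob.
# $1 = type: "d" (directory), "f" (file) or "any"
# $2 = glob pattern matched against the last path component
ls_filter() {
    want_type=$1
    pattern=$2
    # The first seven columns are metadata; everything after them is the path.
    while read -r perms _ _ _ _ _ _ path; do
        [ -n "$path" ] || continue      # skip header lines and blank lines
        case $perms in
            d*) entry_type=d ;;
            *)  entry_type=f ;;
        esac
        [ "$want_type" = any ] || [ "$want_type" = "$entry_type" ] || continue
        case ${path##*/} in
            $pattern) printf '%s\n' "$path" ;;
        esac
    done
}

# Usage (placeholder path and pattern):
#   hdfs dfs -ls -R /demo | ls_filter f 'order*.*'
```

Matching on the base name only, rather than on the whole line, avoids false hits on parent directories or owner names.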
Solution 2
If you are using the Cloudera stack, try the find tool:
org.apache.solr.hadoop.HdfsFindTool
Set the command to a bash variable:
COMMAND='hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool'
Usage as follows:
${COMMAND} -find . -name "something" -type d ...
Solution 3
If you don't have the Cloudera parcels available, you can use awk.
hdfs dfs -ls -R /some_path | awk -F / '/^d/ && (NF <= 5) && /something/'
That is almost equivalent to the command `find . -type d -name "*something*" -maxdepth 4`.
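The "almost" comes from how the awk filter counts. With `-F /`, the first field is everything before the leading slash of the path, so `NF` is the number of path components plus one, counted from the root rather than from the start directory. Also, `/something/` matches anywhere in the line, not only in the directory's own name. A sketch that makes the depth relative to the start path and matches only the last component is shown below; the function name `dirs_within_depth` and its three arguments are hypothetical.

```shell
# Print listing lines for directories at most $2 levels below start path $1
# whose own name contains the substring $3.
# Reads `hdfs dfs -ls -R "$1"` output from stdin.
dirs_within_depth() {
    awk -F / -v base="$1" -v maxdepth="$2" -v needle="$3" '
        BEGIN {
            sub(/\/+$/, "", base)               # drop trailing slashes from the start path
            base_parts = split(base, tmp, "/")  # "/some_path" splits into ("", "some_path") -> 2
            if (base == "") base_parts = 1      # start path was the root directory
        }
        # NF - base_parts is the depth of the entry below the start path
        /^d/ && (NF - base_parts) <= maxdepth && index($NF, needle) > 0
    '
}

# Usage (placeholder path and name):
#   hdfs dfs -ls -R /some_path | dirs_within_depth /some_path 4 something
```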
Comments
makansij almost 2 years
I know that from the terminal, one can do a `find` command to find files, such as: `find . -type d -name "*something*" -maxdepth 4`
But when I am in the Hadoop file system, I have not found a way to do this. `hadoop fs -find ....` throws an error.
How do people traverse files in Hadoop? I'm using `hadoop 2.6.0-cdh5.4.1`.
user9074332 over 4 years
Thanks. Any idea how to use the `hadoop fs -find` "expression" option? The docs say:
The following operators are recognised: expression -a expression, expression -and expression, expression expression
but I have no idea what this means.