Is there an equivalent of the `find` command in `hadoop`?


Solution 1

`hadoop fs -find` was introduced in Apache Hadoop 2.7.0. You are most likely using an older version (you mention 2.6.0-cdh5.4.1), which is why you don't have it yet. See HADOOP-8989 for more information.
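A minimal sketch of checking for this before relying on it, assuming the version string has the form `2.6.0-cdh5.4.1` (the second word on the first line of `hadoop version`). The function name `has_fs_find` and the path /some_path are made up for illustration, and vendor builds can backport features, so treat the result as a hint. The usage comment also shows what the `expression -a expression`, `expression -and expression` and `expression expression` forms in the 2.7 documentation mean: they are three spellings of logical AND, with AND implied when two expressions are simply written next to each other.

```shell
#!/usr/bin/env bash
# Sketch: guess from a version string whether `hadoop fs -find` (HADOOP-8989,
# Apache Hadoop 2.7.0) should exist. Assumes "MAJOR.MINOR.PATCH[-suffix]",
# e.g. 2.6.0-cdh5.4.1; a hint only, since vendors may backport features.
has_fs_find() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  minor=${minor%%[!0-9]*}           # drop any non-numeric tail
  (( major > 2 || (major == 2 && ${minor:-0} >= 7) ))
}

# Usage (needs a Hadoop client on PATH; /some_path is a placeholder):
#   if has_fs_find "$(hadoop version | awk 'NR == 1 { print $2 }')"; then
#     # Juxtaposition implies AND, so these three commands are equivalent:
#     hadoop fs -find /some_path -name 'part-*' -a   -name '*.csv' -print
#     hadoop fs -find /some_path -name 'part-*' -and -name '*.csv' -print
#     hadoop fs -find /some_path -name 'part-*'      -name '*.csv' -print
#   fi
```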

In the meantime you can use

hdfs dfs -ls -R <pattern>

For example (quoted so your local shell doesn't expand the glob): hdfs dfs -ls -R '/demo/order*.*'

But that is not as powerful as `find`, of course, and lacks some basics such as filtering by type or limiting the depth. From what I understand, people have been writing scripts around it to work around this.
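As an example of such a script, here is a small sketch (the name `hdfs_paths` is hypothetical) that turns `hdfs dfs -ls -R` output into a find-style list of paths, optionally keeping only directories (`d`) or only files (`f`). It assumes the usual 8-column listing: permissions, replication, owner, group, size, date, time, path.

```shell
#!/usr/bin/env bash
# Sketch: reduce `hdfs dfs -ls -R` output to one path per line.
# Optional argument: d = directories only, f = non-directories only.
# Assumes the 8-column layout: perms repl owner group size date time path.
hdfs_paths() {
  awk -v want="${1:-}" '
    NF < 8 { next }                      # skip headers such as "Found 3 items"
    {
      type = (substr($1, 1, 1) == "d") ? "d" : "f"
      if (want != "" && type != want) next
      path = $8                          # re-join paths containing spaces
      for (i = 9; i <= NF; i++) path = path " " $i   # (runs of spaces become one)
      print path
    }'
}

# Usage:
#   hdfs dfs -ls -R /demo | hdfs_paths d    # roughly: find /demo -type d
```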

Solution 2

If you are using the Cloudera stack, try the find tool:

org.apache.solr.hadoop.HdfsFindTool

Set the command to a bash variable:

COMMAND='hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool'

Usage as follows:

${COMMAND} -find . -name "something" -type d ...
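Instead of a string variable, which only works because the unquoted `${COMMAND}` is word-split, a shell function forwards its arguments intact. A minimal sketch, using the jar path from above (it may differ on your cluster) and a hypothetical function name `hdfs_find_tool`:

```shell
#!/usr/bin/env bash
# Sketch: wrap the Cloudera HdfsFindTool call in a function so that
# arguments such as '*something*' reach the tool unchanged via "$@".
# The jar path is the CDH parcel location from the answer; adjust as needed.
HDFS_FIND_JAR=/opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar

hdfs_find_tool() {
  hadoop jar "$HDFS_FIND_JAR" org.apache.solr.hadoop.HdfsFindTool "$@"
}

# Usage (on a CDH node):
#   hdfs_find_tool -find /some_path -name '*something*' -type d
```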

Solution 3

If you don't have the Cloudera parcels available, you can use awk:

hdfs dfs -ls -R /some_path | awk -F / '/^d/ && (NF <= 5) && /something/' 

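That's almost equivalent to the `find . -type d -name "*something*" -maxdepth 4` command. The differences: /something/ matches anywhere in the line (including owner, group and parent directory names), and because each line is split on /, NF is one more than the number of components of an absolute path, so NF <= 5 counts depth from the filesystem root rather than from /some_path.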
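A slightly longer sketch (hypothetical name `hdfs_find_dirs`) that counts depth relative to the starting path and matches the pattern, an awk regular expression rather than a glob, against the last path component only:

```shell
#!/usr/bin/env bash
# Sketch: approximate  find ROOT -maxdepth N -type d -name "*PATTERN*"
# on top of `hdfs dfs -ls -R ROOT` output (8-column layout assumed).
# Usage: hdfs dfs -ls -R ROOT | hdfs_find_dirs ROOT MAXDEPTH REGEX
hdfs_find_dirs() {
  local root=${1%/} maxdepth=$2 regex=$3
  awk -v root="$root" -v maxdepth="$maxdepth" -v re="$regex" '
    BEGIN { rootn = (root == "") ? 1 : split(root, r, "/") }   # "/" -> "" -> 1
    NF < 8 || !/^d/ { next }             # keep directory lines only
    {
      path = $8                          # re-join paths containing spaces
      for (i = 9; i <= NF; i++) path = path " " $i
      n = split(path, parts, "/")        # depth below ROOT is n - rootn
      if (n - rootn <= maxdepth && parts[n] ~ re) print path
    }'
}
```

For example, `hdfs dfs -ls -R /some_path | hdfs_find_dirs /some_path 4 something` should list roughly what `find /some_path -maxdepth 4 -type d -name "*something*"` would, except that /some_path itself is never included, because `-ls -R` does not print the starting directory.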

Author: makansij

I'm a PhD student at the University of Southern California.

Updated on June 13, 2022

Comments

  • makansij (almost 2 years ago)

    I know that from the terminal, one can run a find command to find files, such as:

    find . -type d -name "*something*" -maxdepth 4 
    

    But in the Hadoop file system, I have not found a way to do this.

    hadoop fs -find ....
    

    throws an error.

    How do people traverse files in Hadoop? I'm using Hadoop 2.6.0-cdh5.4.1.

  • user9074332 (over 4 years ago)
    Thanks. Any idea how to use the "expression" argument of hadoop fs -find? The docs say: "The following operators are recognised: expression -a expression, expression -and expression, expression expression", but I have no idea what this means.