Is there any way to download a HDFS file using WebHDFS REST API?


You could probably use the DataNode HTTP API for this (port 50075 by default); it supports a streamFile command which you can take advantage of. Using wget, this would look something like:

wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt

Note that this command needs to be executed on the datanode itself, not on the namenode!

Alternatively, if you don't know which datanode to hit, you can ask the namenode and it will redirect you to the right datanode with this URL:

http://$namenode:50070/data/demofile.txt
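For what it's worth, the WebHDFS OPEN operation mentioned in the question can also do the whole job on its own: when pointed at the namenode, it replies with a 307 redirect to a datanode, and following that redirect yields the raw file bytes with no headers mixed into the output. A minimal sketch using only the Python standard library (the host, port, HDFS path, and user below are placeholder assumptions for your cluster):

```python
# Sketch: download an HDFS file via the WebHDFS OPEN operation.
# Host, port, HDFS path, and user name are placeholders -- adjust them
# for your cluster (50070 is the Hadoop 2.x NameNode HTTP default).
from urllib.parse import urlencode
from urllib.request import urlopen

def webhdfs_open_url(host, port, hdfs_path, user=None):
    """Build the WebHDFS URL for op=OPEN on the given HDFS path."""
    params = {"op": "OPEN"}
    if user:
        params["user.name"] = user
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, hdfs_path, urlencode(params))

def download(host, port, hdfs_path, local_path, user=None):
    # urlopen follows the NameNode's 307 redirect to a DataNode, so the
    # response body is the file content itself, with no HTTP headers in it.
    with urlopen(webhdfs_open_url(host, port, hdfs_path, user)) as resp:
        with open(local_path, "wb") as out:
            out.write(resp.read())

print(webhdfs_open_url("localhost", 50070, "/demofile.txt", user="tariq"))
# http://localhost:50070/webhdfs/v1/demofile.txt?op=OPEN&user.name=tariq
```

The curl equivalent is `curl -L "http://$namenode:50070/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt` — note `-L` to follow the redirect, and no `-i`, so headers are not written into the output file.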


Author: Tariq

Updated on June 04, 2022

Comments

  • Tariq
    Tariq almost 2 years

    Is there any way by which I can download a file from HDFS using the WebHDFS REST API? The closest I have reached is to use the open operation to read the file and save the content:

    curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt
    

    Is there any API that will allow me to download the file directly without having to open it? I went through the official documentation and tried Google as well, but could not find anything. Could somebody point me in the right direction or give me some pointers?

    Thank you so much for your valuable time.

    • Tariq
      Tariq almost 11 years
      Thank you for the reply, sir. I just want to download the file as-is and save it to a directory on my local FS for now; reading the file is not my intention at this moment. Also, if I follow the above approach I end up with a file that includes the response headers as well: "HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 218 Server: Jetty(6.1.26)"
    • Tariq
      Tariq over 6 years
      Not really sure how exactly this question is off-topic. Discussing APIs is what SO is meant for.
    • Petro
      Petro
      @Tariq I'm flagging this to be an open question. As a Hadoop administrator, these topics are not always cut and dry approaches, and most of the default documentation leaves out key elements or details. This post should be open for future answers and discussion around the webhdfs API (10k views says it all)
  • Tariq
    Tariq almost 11 years
    Thank you for the reply, sir. I had tried this once but it was giving me "ERROR 500: File does not exist: /.".
  • Charles Menguy
    Charles Menguy almost 11 years
    Can you show me what command you ran?
  • Tariq
    Tariq almost 11 years
  • Charles Menguy
    Charles Menguy almost 11 years
    What happens if you do filename=demofile.txt instead of filename=/demofile.txt ?
  • Tariq
    Tariq almost 11 years
    I'm getting the same error.
  • Charles Menguy
    Charles Menguy almost 11 years
    Weird, I'll try this on Monday and let you know what I find; if the file exists, this should download it for you.
  • Tariq
    Tariq almost 11 years
    Exactly, I was expecting the same. I'll also try and let you know if something clicks. Thanks again.
  • Tariq
    Tariq almost 11 years
    And the file does exist, with proper permissions. I have checked that twice.
  • Charles Menguy
    Charles Menguy almost 11 years
    @Tariq Edited my answer with more details; it looks like you actually don't use "filename=", but put the file path directly after streamFile.
  • Tariq
    Tariq almost 11 years
    Thank you so very much, sir. We actually don't need "-O ~/demofile.txt"; simply running "wget http://$datanode:50075/streamFile/demofile.txt" would do the trick. Thanks again.
  • user1570210
    user1570210 over 9 years
    Is there any way we can download multiple files without knowing the file names, only the folder name?
  • 2Big2BeSmall
    2Big2BeSmall over 8 years
    Do I need to give a user/password when reading files with WebHDFS in Java?
  • anegru
    anegru over 4 years
    As of Hadoop 3.0.0 the default HTTP ports have changed: the NameNode web port 50070 is now 9870, and the DataNode port 50075 is now 9864.
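For readers translating the commands in this thread to Hadoop 3.x, a small lookup of the legacy HDFS HTTP ports and their current defaults may help (a sketch based on the Hadoop 3 default configuration; verify against your distribution's hdfs-default.xml):

```python
# Legacy (Hadoop 2.x) HDFS HTTP ports mapped to their Hadoop 3.x defaults.
HADOOP3_HTTP_PORTS = {
    50070: 9870,  # NameNode web UI / WebHDFS
    50075: 9864,  # DataNode web UI
    50090: 9868,  # Secondary NameNode web UI
}

def modern_port(legacy_port):
    """Return the Hadoop 3 default for a legacy HTTP port, or the port unchanged."""
    return HADOOP3_HTTP_PORTS.get(legacy_port, legacy_port)

print(modern_port(50075))  # 9864
```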