Is there any way to download an HDFS file using the WebHDFS REST API?
You could probably use the DataNode API for this (default on port 50075); it supports a streamFile
command which you could take advantage of. Using wget,
this would look something like:
wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt
Note that this command needs to be executed on the datanode itself, not on the namenode!
Alternatively, if you don't know which datanode to hit, you can ask the namenode and it will redirect you to the right datanode with this URL:
http://$namenode:50070/data/demofile.txt
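For reference, the two legacy URLs above can be assembled programmatically. A minimal Python sketch, assuming pre-Hadoop-3 default ports; the hostnames and file path are placeholders:

```python
def stream_file_url(datanode: str, hdfs_path: str, port: int = 50075) -> str:
    """Legacy datanode endpoint: streams the file bytes directly.
    Must target a datanode that holds the file, not the namenode."""
    return f"http://{datanode}:{port}/streamFile{hdfs_path}"

def data_redirect_url(namenode: str, hdfs_path: str, port: int = 50070) -> str:
    """Legacy namenode endpoint: replies with a redirect to a datanode
    holding the file, so any HTTP client that follows redirects works."""
    return f"http://{namenode}:{port}/data{hdfs_path}"
```

wget follows the redirect from the /data URL automatically, which is what makes the namenode variant convenient when you don't know which datanode to hit.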
Comments
-
Tariq almost 2 years
Is there any way by which I can download a file from HDFS using the WebHDFS REST API? The closest I have reached is to use the open operation to read the file and save the content:
curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt
Is there any API that will allow me to download the file directly without having to open it? I went through the official documentation and tried Google as well, but could not find anything. Could somebody point me in the right direction or provide me with some pointers?
Thank you so much for your valuable time.
-
Tariq almost 11 years Thank you for the reply, sir. I just want to download the file as it is and save it in a directory on my local FS for now. Reading the file is not my intention at this moment. Also, if I follow the above approach I end up with a file which includes the header as well: "HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 218 Server: Jetty(6.1.26)"
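The stray header block appears because curl's -i flag writes the response headers into the output file along with the body; dropping -i (and keeping -L to follow the redirect) yields just the file contents. The same OPEN call can be sketched in Python, where headers and body are separate by construction — the hostname, port, and paths below are placeholders:

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen

def webhdfs_open_url(host: str, port: int, hdfs_path: str, user: str = None) -> str:
    """Build the WebHDFS OPEN URL; the server answers with a redirect
    to a datanode, which urlopen follows automatically."""
    params = {"op": "OPEN"}
    if user:
        params["user.name"] = user  # pseudo-auth user, if the cluster requires one
    return urlunsplit(("http", f"{host}:{port}",
                       "/webhdfs/v1" + hdfs_path, urlencode(params), ""))

def download(host: str, port: int, hdfs_path: str, local_path: str) -> None:
    # resp.read() returns only the response body, so no
    # "HTTP/1.1 200 OK ..." header block ends up in the saved file.
    with urlopen(webhdfs_open_url(host, port, hdfs_path)) as resp:
        with open(local_path, "wb") as out:
            out.write(resp.read())
```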
-
Tariq over 6 years Not really sure how exactly this question is off-topic. Discussing APIs is what SO is meant for.
-
Petro @Tariq I'm flagging this to be an open question. As a Hadoop administrator, these topics are not always cut-and-dried, and most of the default documentation leaves out key elements or details. This post should be open for future answers and discussion around the WebHDFS API (10k views says it all).
-
Tariq almost 11 years Thank you for the reply, sir. I had tried this once, but it was giving me "ERROR 500: File does not exist: /.".
-
Charles Menguy almost 11 years Can you show me what command you ran?
-
Tariq almost 11 years wget localhost:50075/streamFile?filename=/demofile.txt -O ~/demofile.txt
-
Charles Menguy almost 11 years What happens if you do filename=demofile.txt instead of filename=/demofile.txt?
-
Tariq almost 11 years I'm getting the same error.
-
Charles Menguy almost 11 years Weird, I'll try this on Monday and let you know what I find; if the file exists, this should download it for you.
-
Tariq almost 11 years Exactly, I was expecting the same. I'll also try and let you know if something clicks. Thanks again.
-
Tariq almost 11 years And the file does exist with proper permissions. I have checked that twice.
-
Charles Menguy almost 11 years @Tariq Edited my answer with more details; it looks like you actually don't use "filename=", but put the file path directly after streamFile.
-
Tariq almost 11 years Thank you so very much, sir. We actually don't need "-O ~/demofile.txt"; simply running "wget http://$datanode:50075/streamFile/demofile.txt" does the trick. Thanks again.
-
user1570210 over 9 years Is there any way we can download multiple files without knowing the file names, only the folder name?
-
2Big2BeSmall over 8 years Do I need to give a user password when reading files with WebHDFS in Java?
-
anegru over 4 years As of Hadoop 3.0.0, the default HTTP ports have changed: the namenode's 50070 became 9870 and the datanode's 50075 became 9864.