How to `wget` a list of URLs in a text file?

Solution 1

A quick look at man wget gives the following:

[..]

-i file

--input-file=file

Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.)

If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command lines will be the first ones to be retrieved. If --force-html is not specified, then file should consist of a series of URLs, one per line.

[..]

So: wget -i text_file.txt
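Because the man page says that `-` as the file name makes wget read URLs from standard input, you can clean the list up before wget sees it. The sketch below assumes the list may contain blank lines and `#` comment lines (that format is an assumption, not something wget requires), and `download_from_list` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: strip blank and "#" comment lines, then hand the
# remaining URLs to wget on standard input via "-i -".
download_from_list() {
  # -v keeps only lines that match none of the patterns:
  #   whitespace-only lines, and lines whose first non-space character is "#"
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wget -i -
}
```

Usage: `download_from_list text_file.txt`.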

Solution 2

try:

wget -i text_file.txt

(check man wget)

Solution 3

If you also want to preserve the original file name, try with:

wget --content-disposition --trust-server-names -i list_of_urls.txt

Solution 4

Run it in parallel with

cat text_file.txt | parallel --gnu "wget {}"
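To limit how many downloads run at the same time (a question raised in the comments), GNU parallel's `-j` option caps the number of concurrent jobs. This is a sketch assuming GNU parallel is installed (the moreutils program also called `parallel` has a different syntax); `download_parallel` is a hypothetical helper name and the default of 4 jobs is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: download every URL in a list with at most N wget
# processes running at once. Assumes GNU parallel, not moreutils parallel.
download_parallel() {
  list=$1        # file containing one URL per line
  jobs=${2:-4}   # maximum simultaneous downloads (default 4, arbitrary)
  # With no ":::" arguments, parallel reads its inputs from stdin;
  # {} is replaced by one (shell-quoted) input line per job.
  parallel -j "$jobs" wget {} < "$list"
}
```

Usage: `download_parallel text_file.txt 8` runs up to eight wget processes at a time.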

Solution 5

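If you're on OpenWrt or using an old version of wget that doesn't give you the -i option:

#!/bin/sh
# POSIX sh is enough here; OpenWrt's /bin/sh is BusyBox ash, not bash
input="text_file.txt"
# "|| [ -n "$line" ]" also processes a last line that has no trailing newline
while IFS= read -r line || [ -n "$line" ]
do
  [ -n "$line" ] || continue   # skip blank lines
  wget "$line"                 # quote so the URL is passed as a single argument
done < "$input"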

Furthermore, if you don't have wget, you can use curl or whatever you use for downloading individual files.
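As an illustration, here is the loop above rewritten for curl. It is a sketch, not a drop-in replacement for every wget behavior; `curl_from_list` is a hypothetical helper name, and the flag choice (`-f`, `-L`, `-O`) is one reasonable default rather than the only option.

```shell
#!/bin/sh
# Hypothetical helper: download each URL in a list with curl.
#   -f : fail on HTTP errors instead of saving the server's error page
#   -L : follow redirects
#   -O : save under the file name taken from the URL
curl_from_list() {
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue                  # skip blank lines
    curl -fL -O "$url" || echo "failed: $url" >&2
  done < "$1"
}
```

Usage: `curl_from_list text_file.txt`. Failed URLs are reported on stderr and the loop moves on to the next one.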

Author: ShanZhengYang

Updated on July 08, 2022

Comments

  • ShanZhengYang, 6 months

    Let's say I have a text file of hundreds of URLs in one location, e.g.

    http://url/file_to_download1.gz
    http://url/file_to_download2.gz
    http://url/file_to_download3.gz
    http://url/file_to_download4.gz
    http://url/file_to_download5.gz
    ....
    

    What is the correct way to download each of these files with wget? I suspect there's a command like wget -flag -flag text_file.txt

    • Dave, about 5 years
      Anybody end up here after trying to get US topos at nationalmap.gov?
    • barlop, over 2 years
      Besides wget -i, you'll want to add some switches so you don't get banned from the servers for hammering them, and so that wget doesn't keep retrying a URL it can't download for too long: -w, -t and -T may be of interest.
  • becko, about 5 years
    Is there a way to control the number of concurrent jobs?
  • Ricardo, over 2 years
    Check the answer below by @Yusef: cat text_file.txt | parallel --gnu "wget {}"
  • Ahmed Fasih, 9 months
    If Parallel's demand for citation is annoying, use xargs: cat text_file.txt | xargs -n10 -P4 wget. This tells xargs to call wget with up to 10 URLs per invocation and run 4 wget processes at a time. For a slightly nicer experience, here's what I do: cat text_file.txt | shuf | xargs -n10 -P4 wget --continue. This (1) shuffles the URLs so that when you stop and restart, it's more likely to start downloading new files right away, and (2) asks wget to continue partial downloads (you might get some if you press Control-C while wget is downloading).