Multithreaded downloading with shell script


Solution 1

Have a look at man xargs:

-P max-procs, --max-procs=max-procs

         Run up to max-procs processes at a time; the default is 1. If
         max-procs is 0, xargs will run as many processes as possible at
         a time.

Solution:

xargs -P 20 -n 1 wget -nv <urls.txt
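
GNU xargs can also read the URL list straight from a file with -a (--arg-file), which does the same thing as the redirect above. A minimal sketch, assuming the list lives in urls.txt and you want 8 parallel jobs (both are just example values):

# read URLs from urls.txt, one per wget invocation, 8 at a time
xargs -a urls.txt -n 1 -P 8 wget -nv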

Solution 2

If you just want to grab each URL (regardless of number), then the answer is easy:

#!/bin/bash
URL_LIST="http://url1/ http://url2/"

for url in $URL_LIST ; do
    # the redirection must come before the & so it applies to wget itself
    wget "${url}" >/dev/null 2>&1 &
done
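
If you go this route, you will usually also want the script to block until every background wget has finished. Here is a small sketch using the shell's built-in wait, reading the URLs from a file instead of a hard-coded list (the filename urls.txt is just an assumption):

#!/bin/bash
# fire off one background wget per line of urls.txt
while read -r url ; do
    wget -q "$url" &
done < urls.txt

# block until all background downloads have completed
wait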

If you only want to create a limited number of pulls, say 10, then you would do something like this:

#!/bin/bash
URL_LIST="http://url1/ http://url2/"

function download() {
    # ${1} is the slot number, ${2} is the URL to fetch
    touch /tmp/dl-${1}.lck
    wget "${2}" >/dev/null 2>&1
    rm -f /tmp/dl-${1}.lck
}

for url in $URL_LIST ; do
    while true ; do
        iter=0
        while [ $iter -lt 10 ] ; do
            # a missing lock file means slot $iter is free
            if [ ! -f /tmp/dl-${iter}.lck ] ; then
                download $iter "$url" &
                break 2
            fi
            let iter++
        done
        # all 10 slots are busy; wait before checking again
        sleep 10s
    done
done
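
For comparison, here is a sketch of the same "at most 10 at a time" idea using bash job control instead of lock files. It relies on wait -n, which needs bash 4.3 or newer, and again assumes the URLs are in a file called urls.txt:

#!/bin/bash
MAX_JOBS=10

while read -r url ; do
    # if MAX_JOBS downloads are already running, wait for any one to finish
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ] ; do
        wait -n
    done
    wget -q "$url" &
done < urls.txt

# wait for whatever is still running
wait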

Do note I haven't actually tested this; I just banged it out in 15 minutes, but it should give you the general idea.

Solution 3

You could use something like puf, which is designed for that sort of thing, or you could use wget/curl/lynx in combination with GNU parallel.
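
With GNU parallel the whole thing becomes a one-liner. A minimal sketch, assuming a urls.txt file with one URL per line and 10 simultaneous downloads (both are just example values):

# :::: reads the argument list (the URLs) from urls.txt
parallel -j 10 wget -nv {} :::: urls.txt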


Comments

  • synapse
    synapse over 1 year

Let's say I have a file with lots of URLs and I want to download them in parallel using an arbitrary number of processes. How can I do it with bash?

  • Richard June
    Richard June about 13 years
Oh, that's very slick. Did not know about -P.
  • Gordon Davisson
    Gordon Davisson about 13 years
In case the original link vanishes, the recommended command (with useless use of cat removed) is: xargs -P 20 -n 1 wget -nv <urls.txt
  • Latheeshwar Raj
    Latheeshwar Raj about 13 years
Oh, also, unless you have separate ISPs, per-connection bandwidth limitations, or something similar, you usually are not going to get any faster total download speed by doing it in parallel.
  • Ole Tange
    Ole Tange about 13 years
Which would look like this: cat urlfile | parallel -j50 wget