Make curl parallel requests faster

Use --head (or -I) to fetch just the headers, rather than the headers plus the contents of the web page. Depending on the size of the web pages, this may reduce network traffic.

You have already specified that the output is to be discarded, so no time is lost writing it to a file. Therefore the only reduction this will give is on the server side and on the network. The server will not actually send the page over the net, but it may still generate it, or retrieve it from cache.
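For illustration, a minimal sketch of how the mycurl wrapper from the question (shown in the comments below) might look with --head; the hostname and path are the question's placeholders, and since some servers answer HEAD differently from GET (e.g. with 405), it would be worth spot-checking a few URLs first:

    # Sketch only: same wrapper as in the question, but with --head so curl
    # sends a HEAD request and no response body crosses the network.
    mycurl() {
        response=$(curl --head --write-out '%{http_code}' --silent --output /dev/null "http://www.example.com/test/$1")
        echo "$1"
        if [ "$response" = "200" ]; then
            echo "valid" > "enum/$1"
        fi
    }
    export -f mycurl

    seq 1000000000 | parallel -j0 mycurl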


Author: Slava Knyazev

Updated on September 18, 2022

Comments

  • Slava Knyazev over 1 year

    I need to test the response code of 1 Billion (yes with a "B") pages. I am perfectly aware that no matter how I twist it, it will take many days to do. I have time.

    However, my current script only seems to do ~200 requests per second. This isn't fast enough: at this rate it will take about two months.

    # Check one URL and record a hit in enum/<id> when the response is 200.
    mycurl() {
        response=$(curl --write-out '%{http_code}' --silent --output /dev/null "http://www.example.com/test/$1")
        echo "$1"
        if [ "$response" = "200" ]; then
            echo "valid" > "enum/$1"
        fi
    }
    export -f mycurl

    # -j0 tells GNU parallel to run as many jobs at once as possible.
    seq 1000000000 | parallel -j0 mycurl
    

    I have a feeling parallel isn't going as fast as it could (i.e. waiting for something).

    I have found this but am unsure about how to make it work: https://www.gnu.org/software/parallel/sem.html

    How can I optimise my script?

    • Jos almost 7 years
      Would it help to specify --head to fetch just the headers?