Can I calculate the total size of a publicly available website?

Based on a similar Q&A - get file size of a file to wget before wget-ing it? - I made a bash wrapper script that does exactly what you need. :)
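
The trick behind that Q&A: a spider request fetches only the HTTP headers, so the Content-Length header reveals a file's size without actually downloading it. For a single file, the check boils down to something like this (example.com is just a placeholder URL):

wget --spider --server-response http://example.com/file.iso 2>&1 | grep -i "Content-Length"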

The latest version of the code can be found on GitHub:

#!/bin/bash
# Info: https://github.com/mariomaric/website-size#readme

# Prepare wget logfile
log=/tmp/wget-website-size-log

# Do the spider magic:
# --spider requests every page without saving it, and --server-response
# writes the HTTP response headers (including Content-Length) to the logfile
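# Note: ${!#} expands to the last positional argument, i.e. the target URL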
echo "### Crawling ${!#} website... ###"
sleep 2s
echo "### This will take some time to finish, please wait. ###"

wget \
  --recursive --level=inf \
  --spider --server-response \
  --no-directories \
  --output-file="$log" "$@"

echo "Finished with crawling!"
sleep 1s

# Check that wget actually produced the logfile
if [ -f "$log" ]; then
    # Calculate and print estimated website size
    echo "Estimated size: $(\
        grep -e "Content-Length" "$log" | \
        awk '{sum+=$2} END {printf("%.0f", sum / 1024 / 1024)}'\
    ) Mb"

    # Delete wget log file
    rm "$log"
else
    echo "Unable to calculate estimated size."
fi

exit
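
Assuming the script is saved as website-size.sh (the name is arbitrary), usage looks like:

chmod +x website-size.sh
./website-size.sh http://www.psychocats.net/

Because the script passes "$@" straight through to wget, any extra wget options (e.g. --wait=1) can be given before the URL.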

Also, this answer helped tremendously: Shell command to sum integers, one per line?
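
For reference, the summing idiom from that answer is the same awk pattern used in the script above; stripped to its essence:

printf '1\n2\n3\n' | awk '{sum+=$1} END {print sum}'    # prints 6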

Comments

  • 22lk94k943 only, almost 2 years ago:

    Say I want to download all the public pages of www.psychocats.net, or make an offline copy of the site. How can I first calculate the total size of the website before starting the download?

    • earthmeLon, about 11 years ago:
      You could write a script using wget or curl that downloads each available file, tallies up the sizes, and deletes each file once its size has been recorded (a rough sketch follows below). Other than that, or having access to the server itself, you're not going to find an easy solution.
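
      A minimal sketch of that approach, assuming GNU wget and coreutils (the URL is a placeholder): mirror the site into a temporary directory, measure it, then delete it.

      tmpdir=$(mktemp -d)
      wget --recursive --level=inf --no-parent --quiet -P "$tmpdir" http://example.com/
      du -sh "$tmpdir"
      rm -rf "$tmpdir"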