Can I calculate the total size of a publicly available website?
Based on the similar Q&A "get file size of a file to wget before wget-ing it?", I made a bash wrapper script that does exactly what you need. :)
The latest code can be found in this GitHub repository:
#!/bin/bash
# Info: https://github.com/mariomaric/website-size#readme

# Prepare wget logfile
log=/tmp/wget-website-size-log

# Do the spider magic
echo "### Crawling ${!#} website... ###"
sleep 2s
echo "### This will take some time to finish, please wait. ###"

wget \
  --recursive --level=inf \
  --spider --server-response \
  --no-directories \
  --output-file="$log" "$@"

echo "Finished with crawling!"
sleep 1s

# Check if the prepared logfile was written
if [ -f "$log" ]; then
  # Sum the logged Content-Length values and print the estimated size
  echo "Estimated size: $(\
    grep -e "Content-Length" "$log" | \
    awk '{sum+=$2} END {printf("%.0f", sum / 1024 / 1024)}'\
  ) MiB"
  # Delete wget log file
  rm "$log"
else
  echo "Unable to calculate estimated size."
fi

exit
Also, this answer helped tremendously: Shell command to sum integers, one per line?
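The size calculation itself is just a matter of summing the Content-Length values that wget's --server-response output leaves in the log. A minimal offline sketch of that step, using made-up sample values in a temporary file:

```shell
# Simulate two "Content-Length" header lines as wget --server-response
# would record them in the log (the values here are made up for illustration)
printf 'Content-Length: 1048576\nContent-Length: 524288\n' > /tmp/sample-wget-log

# Same summing step the script uses: add up field 2, convert bytes to MiB
awk '/Content-Length/ {sum+=$2} END {printf("%.1f MiB\n", sum/1024/1024)}' /tmp/sample-wget-log
# → 1.5 MiB
```

Note that pages served with chunked transfer encoding carry no Content-Length header, so they silently drop out of the sum; the result is an estimate.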
Author: 22lk94k943 only
Updated on September 18, 2022

Comments
- 22lk94k943 only, almost 2 years: Say I want to download all public pages or make an offline database of the website www.psychocats.net. How can I first calculate the total size of the website before I start downloading?
- earthmeLon, about 11 years: You could write a script using wget or curl that downloads each file available, tallies up the sizes, and deletes each file once its size has been recorded. Other than that, or having access to the server itself, you're not going to find an easy solution.
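A lighter-weight variant of the download-and-delete idea from the comments is to issue HEAD requests (curl -sI), so only the response headers are transferred, and read Content-Length from them without downloading anything. A sketch, with canned headers standing in for a real `curl -sI "$url"` call:

```shell
# Canned HTTP response headers; in practice they would come from:
#   curl -sI "$url"
headers='HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 734003200'

# Pull out the Content-Length value (bytes); header names are case-insensitive
printf '%s\n' "$headers" | awk 'tolower($1) == "content-length:" {print $2}'
# → 734003200
```

Looping this over a list of URLs and summing the results gives the same estimate as the wget script, without fetching any page bodies.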