Using wget to mirror a website and everything from the first level of external sites
Solution 1
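This is unfortunately impossible with wget. The attempt at solving this with -H -l 1 does not do what you expect, because wget applies the -l depth limit to the whole crawl: it cannot recurse fully on one host while stopping after one level on the others. What you want is HTTrack.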
httrack --ext-depth=1 http://example.com
This can also be abbreviated as httrack %e1 http://example.com. Note that HTTrack counts levels starting at 1, not 0, so it won't follow links found on external pages unless you increase the depth.
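If you need to run this repeatedly, the HTTrack command could be wrapped in a small shell function. This is only a sketch: the function name mirror_with_httrack, the default output directory ./mirror and the DRY_RUN switch (which prints the command instead of running it) are my own illustrative choices, not part of HTTrack. The only HTTrack options used are --ext-depth from the answer and -O, which sets the output path.

```shell
# Sketch: mirror a site plus one level of external pages with HTTrack.
# Source this file, then call mirror_with_httrack.
# Assumptions: mirror_with_httrack, ./mirror and DRY_RUN are illustrative
# names chosen here; httrack must be installed and on PATH.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: mirror_with_httrack URL [OUTPUT_DIR]
mirror_with_httrack() {
  url=$1
  out=${2:-./mirror}
  # --ext-depth=1: also fetch pages on other hosts that the site links to,
  #                but do not follow links found on those external pages.
  # -O:            directory for the mirror, logs and cache.
  run httrack --ext-depth=1 -O "$out" "$url"
}
```

For example, DRY_RUN=1 mirror_with_httrack http://example.com /tmp/site prints the command it would run; leave out DRY_RUN=1 to actually start the mirror.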
Solution 2
I would combine two commands:
wget -m -k -K -p http://example.com && wget -r -k -K -H -N -l 1 http://example.com
About the two commands: the first, wget -m -k -K -p http://example.com, mirrors the site (-m is shorthand for -r -N -l inf --no-remove-listing), converts the links to point at your local mirror (-k), backs up each original file with a .orig suffix before it is converted (-K), and downloads all the prerequisites (images, stylesheets and so on) needed to view the mirrored pages properly (-p).
After that, the second command, wget -r -k -K -H -N -l 1 http://example.com, does essentially the same, but only one level deep and spanning all hosts (-H). It checks timestamps with -N (comparing against the .orig backups kept by -K), so files the first command already downloaded are not fetched again. I didn't include the -p option here, because together with -H it could download a great deal more.
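Solution 2 can be sketched as a single shell function that runs both steps, with each flag annotated. The function name mirror_plus_one_level and the DRY_RUN switch are hypothetical helpers, not wget features; the wget flags themselves are exactly the ones from the answer.

```shell
# Sketch of Solution 2: full mirror first, then one level across all hosts.
# Source this file, then call mirror_plus_one_level.
# Assumptions: mirror_plus_one_level and DRY_RUN are illustrative names;
# wget must be installed and on PATH.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: mirror_plus_one_level URL
mirror_plus_one_level() {
  url=$1
  # Step 1: mirror the main site.
  #   -m  --mirror (= -r -N -l inf --no-remove-listing)
  #   -k  --convert-links     rewrite links to point at the local copy
  #   -K  --backup-converted  keep the unconverted file as *.orig
  #   -p  --page-requisites   images, CSS, etc. needed to render pages
  # Step 2 (only if step 1 succeeded): one level across all hosts.
  #   -r -l 1  recurse, but only one link away from $url
  #   -H       --span-hosts, allow other domains
  #   -N       --timestamping, skip files that have not changed
  run wget -m -k -K -p "$url" &&
    run wget -r -k -K -H -N -l 1 "$url"
}
```

Keep in mind that -l 1 is counted from the start URL, so the second step only reaches external pages linked from that one page, not from every page of the mirror; that is the limitation Solution 1 refers to.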
Admin
Updated on September 17, 2022
Admin over 1 year
I need to mirror a particular website (all the pages under that particular domain) plus any pages (but not whole sites) that the website links to.
I'm confused about how to do this.
wget -r --level=inf
(or some other variant) will mirror the site.
wget -r -H --level=1
will get all the links (from all domains) to the first level.
Does anyone have any ideas on how I could combine these to get the entire main site and one level deep into external sites? I've been banging my head against the manual all afternoon.
Thanks