How to save a web page snapshot with all its elements (css, js, images, ...) into one file


Solution 1

Use HTTrack with the -%M option, which writes the mirror out as a single MIME-encapsulated (.mht) archive.
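
A minimal invocation might look like this; the URL and output directory are placeholders, and it's worth checking httrack --help on your version to confirm the -%M flag is available:

# mirror the page and emit a single .mht archive (output directory is a placeholder)
httrack "http://www.example.com/" -O ./example-archive -%M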

Solution 2

Use wget in a terminal:

wget -p -k http://www.example.com/

It'll make a clone of the site's front end (HTML, CSS, JS, SVG, etc.). It won't be one file as asked, though; instead it recreates the whole folder structure.

E.g. if the folder structure of www.example.com is

 /css/*
 /js/*
 /index.html

then it'll create the same structure locally.

Docs: https://www.gnu.org/software/wget/manual/wget.html
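
Since the question asks for one file, a common follow-up (not part of the answer above, just a sketch) is to mirror into a dedicated directory and then pack that directory into a single archive. Here -P sets the download directory and -E appends .html where needed; both are additions to the original command:

# mirror the page plus its requisites into ./snapshot, rewriting links for offline use
wget -p -k -E -P ./snapshot http://www.example.com/
# bundle the mirror into a single file
tar czf snapshot.tar.gz ./snapshot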

Solution 3

I think @reisio (+1) has you covered...

...But if only to plug a great free tool, I would point out the Firefox extension Save Complete, which does an admirable job of grabbing "complete" pages on an ad hoc basis. The output will be a single HTML file with an accompanying directory stuffed with all the resources - you can easily zip them up for archiving.

It's not without fault - I've had issues with corrupted .png files lately on OSX, but I use it frequently for building mockups off of live pages and it's a huge time-saver. (Also of note, it hasn't been updated for FF 4 yet, and is the sole reason I rolled back to 3.6)

Author: Vacilando
Updated on May 03, 2021

Comments

  • Vacilando (about 3 years ago)

    How is it possible to programmatically save a web page snapshot with all its elements (css, js, images, ...) into one file?

    I need to archive some web pages regularly. However, just saving their HTML code is useless, not only because the images are missing but especially because without CSS today's pages can turn into an unrecognizable mess.

    I remember the .mht format, which worked like this, but it required manual saving and was only a feature of IE. I believe there is an open-source solution that can achieve this programmatically, but despite hours of searching I cannot find it on the web.

  • Christian (about 13 years ago)
    How is this method automated, or even programmable?
  • peteorpeter (about 13 years ago)
    It's much more automated than manually collecting all the resources and migrating the references, etc. Note the caveat: "on an ad hoc basis". I'm not claiming it's the perfect solution, but it might be useful to people trying to achieve a similar, semi-automated result. Also, for the sake of argument, you could script FF to automate this further: macscripter.net/viewtopic.php?id=21304. (Do you think all potentially helpful, but imperfect, solutions should be -1'ed? I'm resisting the urge to down-vote your own imperfect, yet potentially helpful answer. Spirit foul.)
  • Christian (about 13 years ago)
    Semi-perfect? It works, it's not browser-dependent, and it's more automated than trying to script Firefox! Are we back to the "viewable by Firefox only" era again, or something? My solution can be done with any language on any platform. Your solution seems to work only in Firefox on a Mac. Plus, firing up a browser just to do some text manipulation sounds ridiculously over-engineered.
  • peteorpeter (about 13 years ago)
    I'm not knocking your answer - for the record it sounds like the cleanest solution to the question asked. My hackles were raised by your attitude, not your knowledge.
  • Christian (about 13 years ago)
    You could call it "overly defensive" if you want to.
  • Vacilando (over 12 years ago)
    We are looking for a way to do this programmatically.
  • nest (over 9 years ago)
    It doesn't download the JavaScript.
  • reisio (over 6 years ago)
    There isn't any JavaScript worth downloading that you wouldn't have loaded directly (and therefore saved directly). That said: you could do an ordinary httrack run, without -%M, and then put the result into an archive. With things like archivemount you can open such archives seamlessly, even though you don't need to. All easily scripted. Stack Overflow sucks.
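
To illustrate the workflow that last comment describes, here is a rough sketch; the URL, paths, and archive format are placeholders, not anything taken from the thread:

# plain mirror, no -%M
httrack "http://www.example.com/" -O ./mirror
# pack the mirror into one archive file
tar czf snapshot.tar.gz ./mirror
# later, mount the archive to browse it in place without unpacking
mkdir -p ./snapshot-view
archivemount snapshot.tar.gz ./snapshot-view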