How to save a web page snapshot with all its elements (css, js, images, ...) into one file


Solution 1

Use HTTrack with the -%M option, which writes the mirror out as a single MIME-encapsulated (.mht) archive.
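
A minimal invocation might look like this; the URL and output directory are placeholders, and it's worth checking httrack --help on your version to confirm the -%M flag is available:

# mirror the page and emit a single .mht archive (output directory is a placeholder)
httrack "http://www.example.com/" -O ./example-archive -%M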

Solution 2

Use wget in a terminal:

wget -p -k http://www.example.com/

It'll make a clone of the site's front end (HTML, CSS, JS, SVG, etc.). It won't be one file as asked, though; instead it recreates the whole folder structure.

E.g. if the folder structure of www.example.com is

 /css/*
 /js/*
 /index.html

then it'll create the same structure locally.

Docs: https://www.gnu.org/software/wget/manual/wget.html
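
Since the question asks for one file, a common follow-up (not part of the answer above, just a sketch) is to mirror into a dedicated directory and then pack that directory into a single archive. Here -P sets the download directory and -E appends .html where needed; both are additions to the original command:

# mirror the page plus its requisites into ./snapshot, rewriting links for offline use
wget -p -k -E -P ./snapshot http://www.example.com/
# bundle the mirror into a single file
tar czf snapshot.tar.gz ./snapshot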

Solution 3

I think @reisio (+1) has you covered...

...But if only to plug a great free tool, I would point out the Firefox extension Save Complete, which does an admirable job of grabbing "complete" pages on an ad hoc basis. The output will be a single HTML file with an accompanying directory stuffed with all the resources - you can easily zip them up for archiving.

It's not without fault - I've had issues with corrupted .png files lately on OSX, but I use it frequently for building mockups off of live pages and it's a huge time-saver. (Also of note, it hasn't been updated for FF 4 yet, and is the sole reason I rolled back to 3.6)

Author: Vacilando
Updated on May 03, 2021

Comments

  • Vacilando (about 3 years ago)

    How is it possible to programmatically save a web page snapshot with all its elements (css, js, images, ...) into one file?

    I need to archive some web pages regularly. However, just saving their HTML code is useless, not only because the images are missing but especially because without CSS today's pages can turn into an unrecognizable mess.

    I remember the .mht format, which worked like this, but it required manual saving and was only a feature of IE. I believe there is an open-source solution that can achieve this programmatically, but despite hours of searching I cannot find it on the web.

  • Christian (about 13 years ago)
    How is this method automated, or even programmable?
  • peteorpeter (about 13 years ago)
    It's much more automated than manually collecting all the resources and migrating the references, etc. Note the caveat: "on an ad hoc basis". I'm not claiming it's the perfect solution, but it might be useful to people trying to achieve a similar, semi-automated result. Also, for the sake of argument, you could script FF to automate this further: macscripter.net/viewtopic.php?id=21304. (Do you think all potentially helpful, but imperfect, solutions should be -1'ed? I'm resisting the urge to down-vote your own imperfect, yet potentially helpful answer. Spirit foul.)
  • Christian (about 13 years ago)
    Semi-perfect? It works, it's not browser-dependent, and it's more automated than trying to script Firefox! Are we back to the "viewable by Firefox only" era again, or something? My solution can be done with any language on any platform. Your solution seems to work only in Firefox on a Mac. Plus, firing up a browser just to do some text manipulation sounds ridiculously over-engineered.
  • peteorpeter (about 13 years ago)
    I'm not knocking your answer - for the record it sounds like the cleanest solution to the question asked. My hackles were raised by your attitude, not your knowledge.
  • Christian (about 13 years ago)
    You could call it "overly defensive" if you want to.
  • Vacilando (over 12 years ago)
    We are looking for a way to do this programmatically.
  • nest (over 9 years ago)
    It doesn't download the JavaScript.
  • reisio (over 6 years ago)
    There isn't any JavaScript worth downloading that you wouldn't have loaded directly (and therefore saved directly). That said: you could do an ordinary httrack run, without -%M, and then put the result into an archive. With things like archivemount you can open such archives seamlessly, even though you don't need to. All easily scripted. Stack Overflow sucks.
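
To illustrate the workflow that last comment describes, here is a rough sketch; the URL, paths, and archive format are placeholders, not anything taken from the thread:

# plain mirror, no -%M
httrack "http://www.example.com/" -O ./mirror
# pack the mirror into one archive file
tar czf snapshot.tar.gz ./mirror
# later, mount the archive to browse it in place without unpacking
mkdir -p ./snapshot-view
archivemount snapshot.tar.gz ./snapshot-view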