How to save HTML pages as one file?

Solution 1

The SingleFile Chrome extension is a good solution.

I have also written my own Python tool to solve this problem, which I recommend trying: https://github.com/zTrix/webpage2html

Solution 2

Viewing and creating MHTML files in current versions of Google Chrome is supported by toggling the "Save Page as MHTML" option on the chrome://flags page.

To get there, type chrome://flags into the address bar.

However, enabling this experimental option disables saving pages as HTML-only or HTML Complete files. From the chrome://flags page:

Solution 3

Expanding on zTrix's answer, I would suggest avoiding the Chrome extension (which did not work for me at all) and instead going with one of these options:

  • Node.js: remy's inliner
    • Easy to install using npm
    • Many options, including flags for disabling minification/compression, maintaining external images, skipping videos, and more.
    • Caveat (22 September 2017): fails to preserve styling and JavaScript functionality when compiling Slate builds. This won't affect most people directly, but it suggests that inliner will probably have issues with other pages too. See this issue.
    • Caveat: no options to "leave things alone": will either minify/uglify CSS/JS or beautify, but will not simply embed original source into HTML.
  • Python 2: zTrix's webpage2html
    • More conservative than inliner; works well for most cases.
    • zTrix fixed a bug (that inliner also seems to have) which ensures JavaScript/CSS functionality when compiling Slate builds. See this issue. (updated 29 September 2017)
    • Can be converted to Python 3 relatively painlessly
    • Caveat: cannot handle CSS @import

Solution 4

It is usually possible to create a single HTML file that contains all of the files it depends on (CSS, JPG, JS, SVG, ...).
To do so, rewrite the HTML file: replace the values of "src" attributes and "url()" functions with the embedded content, and insert "<script></script>" tags for JavaScript files, "<style></style>" tags for CSS files, and "<svg></svg>" tags for SVG images.
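As a rough illustration of the rewriting step for stylesheets and scripts, here is a minimal Python sketch. The function name `inline_assets` and the `assets` mapping (path as written in the HTML, mapped to the file's text) are hypothetical names chosen for this example. It uses simple regular expressions rather than a real HTML parser, so it assumes quoted attribute values and will miss unusual markup.

```python
import re


def inline_assets(html: str, assets: dict) -> str:
    """Embed stylesheets and scripts listed in `assets` (path -> text).

    `assets` is a hypothetical mapping prepared by the caller, e.g. by
    downloading or reading each referenced file beforehand.
    """

    def inline_link(match):
        tag = match.group(0)
        href = re.search(r'href=(["\'])(.*?)\1', tag)
        # Leave non-stylesheet links (icons, etc.) and unknown files alone.
        if "stylesheet" not in tag or not href or href.group(2) not in assets:
            return tag
        return "<style>\n" + assets[href.group(2)] + "\n</style>"

    def inline_script(match):
        src = match.group(2)
        if src not in assets:
            return match.group(0)
        # Escape "</script" so embedded code cannot close the tag early.
        body = assets[src].replace("</script", "<\\/script")
        return "<script>\n" + body + "\n</script>"

    html = re.sub(r"<link\b[^>]*>", inline_link, html, flags=re.IGNORECASE)
    html = re.sub(
        r'<script\b[^>]*\bsrc=(["\'])(.*?)\1[^>]*>\s*</script>',
        inline_script,
        html,
        flags=re.IGNORECASE,
    )
    return html
```

Tags whose target is not in `assets` are left untouched, so the function can be applied incrementally as files are collected.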

For example, take a GIF image that a CSS file references through the "url()" function:

  1. Download the image from its URL.
  2. Encode the image in Base64.
  3. Replace "url('https://en.wikipedia.org/wiki/File:TPB_Magnet_Icon.gif')" with "url('data:image/gif;base64,R0lGODlhDAAMALMPAOXl5ewvErW1tebm5oocDkVFRePj47a2ts0WAOTk5MwVAIkcDesuEs0VAEZGRv///yH5BAEAAA8ALAAAAAAMAAwAAARB8MnnqpuzroZYzQvSNMroUeFIjornbK1mVkRzUgQSyPfbFi/dBRdzCAyJoTFhcBQOiYHyAABUDsiCxAFNWj6UbwQAOw')", that is, the Base64-encoded GIF image prefixed by "data:image/gif;base64,".

You can do the same thing for the value of a "src" attribute. This approach also works for other binary files, as long as you adapt the "data:" prefix to the MIME type of the encoded object.
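The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `to_data_uri` and `inline_css_urls` and the `resources` mapping (URL as written in the CSS, mapped to the already-downloaded bytes) are hypothetical, and the regular expression only covers simple `url(...)` forms.

```python
import base64
import mimetypes
import re


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Step 2 and the prefix from step 3: Base64-encode and add "data:...;base64,"."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_css_urls(css: str, resources: dict) -> str:
    """Step 3: replace url(...) references that appear in `resources`.

    `resources` is a hypothetical mapping filled in by the caller
    (step 1, downloading, is left out of this sketch).
    """
    pattern = re.compile(r"url\(\s*(['\"]?)(.*?)\1\s*\)")

    def replace(match):
        ref = match.group(2)
        if ref not in resources:
            return match.group(0)  # unknown reference: leave it unchanged
        # Guess the MIME type from the file extension; fall back to a generic type.
        mime = mimetypes.guess_type(ref)[0] or "application/octet-stream"
        return f"url('{to_data_uri(resources[ref], mime)}')"

    return pattern.sub(replace, css)
```

For instance, `to_data_uri(b"GIF89a", "image/gif")` yields a string beginning with "data:image/gif;base64,R0lGODlh", the same opening characters as the encoded GIF shown above.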

Author: Dimitri Vorontzov

Updated on January 04, 2021

Comments

  • Dimitri Vorontzov, over 3 years ago

    I want to be able to save / archive HTML pages as one file (without those pesky external folders).

    I want the resulting file to contain all styles, images, and links (videos and Flash would be nice, too, but not as crucial).

    I want the resulting file to be searchable, and editable.

    Microsoft's MHT is one such format, but unfortunately it's not searchable under Linux. MHT is good, but I don't want to be locked into one operating system or one company. What would be a good alternative? Or perhaps there's some entirely different solution I haven't thought of?