Compress (minimize) HTML from Python


Solution 1

htmlmin and html_slimmer are two simple HTML minifying tools for Python. I have millions of HTML pages stored in my database, and by running htmlmin I am able to reduce the page size by between 5% and 50%. Neither of them does a complete job of HTML minification (e.g. the font color #000000 could be reduced to #000), but it's a good start. I use a try/except block that runs htmlmin first and falls back to html_slimmer if that fails, because htmlmin seems to provide better compression but does not support non-ASCII characters.

Example Code:

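import htmlmin
from slimmer import html_slimmer  # or xhtml_slimmer, css_slimmer

# html holds the page source as a string
try:
    # htmlmin usually compresses better, so try it first
    html = htmlmin.minify(html, remove_comments=True, remove_empty_space=True)
except Exception:
    # fall back to slimmer if htmlmin fails (e.g. on non-ASCII input)
    html = html_slimmer(html.strip().replace('\n', ' ').replace('\t', ' ').replace('\r', ' '))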
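The color shortening mentioned above (#000000 to #000) is not done by either tool, but it could be sketched as a post-processing step. This is only a naive illustration: the function name shorten_hex_colors is made up for this example, and the regex runs over the whole string, so it would also rewrite a matching "#aabbcc" that appears in ordinary text or in a link fragment.

```python
import re

# A 6-digit hex color whose three channels are each a repeated digit,
# e.g. #aabbcc. The lookahead skips longer values such as 8-digit colors.
# The backreferences are case-sensitive, so "#Aabbcc" is left untouched.
_HEX6 = re.compile(
    r'#([0-9a-fA-F])\1([0-9a-fA-F])\2([0-9a-fA-F])\3(?![0-9a-fA-F])'
)


def shorten_hex_colors(html):
    """Rewrite #rrggbb as #rgb wherever each channel is a doubled digit."""
    return _HEX6.sub(r'#\1\2\3', html)
```

It would be applied to the string returned by the minifier, for example html = shorten_hex_colors(html).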

Good Luck!

Solution 2

I suppose that on GAE there is no real need to minify your HTML, as GAE already gzips it: see "Caching & GZip on GAE (Community Wiki)".

I did not test this, but once both versions are gzipped, the minified HTML will probably be only about 1% smaller, since minification mostly removes whitespace.

If you want to save storage, for example when putting the HTML in memcached, you are better off gzipping it (even at a low compression level) than stripping whitespace in Python: the result will probably be smaller, and it will be faster because gzip is implemented in C rather than in pure Python.
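The claim above is easy to check on your own pages. Here is a minimal sketch using only the standard library; crude_minify is a hypothetical stand-in for a real minifier (it just collapses whitespace and would also alter the content of <pre> blocks), and compression level 6 is an arbitrary choice.

```python
import gzip
import re


def crude_minify(html):
    """Collapse every run of whitespace into one space (naive stand-in)."""
    return re.sub(r'\s+', ' ', html).strip()


def sizes(html, level=6):
    """Return byte sizes of the page raw, minified, gzipped, and both."""
    raw = html.encode('utf-8')
    small = crude_minify(html).encode('utf-8')
    return {
        'raw': len(raw),
        'minified': len(small),
        'gzipped': len(gzip.compress(raw, compresslevel=level)),
        'minified+gzipped': len(gzip.compress(small, compresslevel=level)),
    }


if __name__ == '__main__':
    # Artificial, very repetitive sample page; real pages will differ.
    page = ('<html>\n  <body>\n'
            + '    <p>Hello World</p>\n' * 200
            + '  </body>\n</html>\n')
    for name, size in sizes(page).items():
        print(name, size)
```

Comparing the 'gzipped' and 'minified+gzipped' figures for a few representative pages shows how much minification still saves after compression.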

Solution 3

import htmlmin

code='''<body>
    Hello World
    <div style='color:red;'>Hi</div>
    </body>
'''

htmlmin.minify(code)

Output of the last line:

<body> Hello World <div style=color:red;>Hi</div> </body> 

To also remove the empty space between tags, pass remove_empty_space=True:

htmlmin.minify(code,remove_empty_space=True)

Author: Johnny Everson

Developer interested in functional programming, Scala, front-end development.

Updated on May 28, 2022

Comments

  • Johnny Everson almost 2 years

    How can I compress (minimize) HTML from Python? I know I can use some regex to strip spaces and other things, but I want a real compiler written in pure Python (so it can be used on Google App Engine).

    I did a test with an online HTML compressor and it saved 65% of the HTML size. I want that, but from Python.

  • Johnny Everson about 13 years
    Thanks for pointing this out. I see in the logs that some browsers do not yet support gzip, but looking at the logs again, there are not many requests like that.
  • Wooble about 13 years
    Removing 65% of the original HTML probably won't save 65% once it is zipped, but it will still save something.
  • Shay Erlichmen almost 13 years
    Also, don't forget that sometimes the HTML is stored in memcache, and you definitely want to compress it beforehand.