Minify HTML/PHP


Solution 1

Looking at the examples, minifying the HTML output does almost nothing. Think about it: minifying is great for JavaScript, which uses many repeated variable and function names, because one of the main things minifying does is shorten those names and every reference to them.

HTML markup, on the other hand, has no concept of variables or functions. Most of the weight of the page comes from the actual markup and content, and that cannot be minified. Even form field names need to be left alone, because they must keep their original values to be processed correctly by the server.

Gzipping will already compress the whitespace very efficiently. In HTML, this is really all that can be done.
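A rough way to check this claim on your own markup, rather than take it on trust, is to compare gzip sizes with and without whitespace collapsing. This sketch assumes the zlib extension (for gzencode()) is enabled; the sample markup and the naive /\s+/ collapse are illustrative assumptions, not a production minifier.

```php
<?php
// Compare raw size, gzip size, and gzip size after a naive whitespace collapse.
// Assumes the zlib extension is available (gzencode). Sample markup is made up.
function size_report(string $html): array {
    // Naive collapse: every whitespace run becomes one space (also hits <pre>/<textarea>).
    $collapsed = preg_replace('/\s+/', ' ', $html);
    return [
        'raw'            => strlen($html),
        'raw_gzip'       => strlen(gzencode($html, 9)),
        'collapsed'      => strlen($collapsed),
        'collapsed_gzip' => strlen(gzencode($collapsed, 9)),
    ];
}

// Indented, repetitive markup similar to a typical templated list.
$row  = "    <li>\n        <a href=\"/item\">Item</a>\n    </li>\n";
$html = "<ul>\n" . str_repeat($row, 200) . "</ul>\n";
print_r(size_report($html));
```

If the two gzip numbers end up close on real pages, that supports the point above; if not, the extra minification step is worth keeping.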

Also, minifying PHP doesn't apply: even though PHP has variables and functions, the code is never sent to the client. Shortening names has no performance benefit for code that is compiled on the server.

If you are determined to minify your HTML, check out the source code of the WordPress plugin that does it (W3 Total Cache, mentioned in the question). That's the beauty of open source. However, I suspect the gains will be negligible compared to gzipping.

Solution 2

Peter Anselmo has confused minification with obfuscation. In code obfuscation, the code is minified and variables are renamed to arbitrary names of the shortest possible length. Minification is merely the practice of reducing code size, for example by removing white space, without altering the values, names, or syntax of the code.
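To illustrate the distinction, here is a minimal sketch of minification in that narrow sense: it only removes whitespace sitting between two tags and never renames or rewrites anything, so form field names and attribute values survive untouched. The function name strip_intertag_whitespace is hypothetical, and the rule assumes such whitespace is insignificant, which does not hold inside <pre> or <textarea> or between inline elements.

```php
<?php
// Whitespace-only minification: strip whitespace that sits entirely between two tags.
// Nothing is renamed: attribute names, values, and form field names stay as they were.
// Assumption: inter-tag whitespace is insignificant. That is false inside <pre>/<textarea>
// and between inline elements (e.g. "<b>a</b> <i>b</i>" would lose its visible space).
function strip_intertag_whitespace(string $html): string {
    return preg_replace('/>\s+</', '><', $html);
}

$form = "<form action=\"/save\">\n  <input name=\"user_email\" value=\"a b\">\n</form>";
echo strip_intertag_whitespace($form);
// <form action="/save"><input name="user_email" value="a b"></form>
```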

Peter Anselmo is also wrong that minifying markup yields insignificant savings. This page, for instance, shows a savings of 18.31%, and it was fairly tidy to begin with. Clearly, he never tested his opinion before posting it. You can measure the savings yourself with the Pretty Diff tool at http://prettydiff.com/
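The savings figure is simple arithmetic on byte counts. A small helper, with hypothetical sizes chosen to reproduce an 18.31% reduction (this is plain arithmetic, not Pretty Diff's code):

```php
<?php
// Percentage of bytes saved by a transformation: (before - after) / before * 100.
function savings_percent(int $before, int $after): float {
    if ($before <= 0) {
        throw new InvalidArgumentException('before must be a positive byte count');
    }
    return ($before - $after) / $before * 100;
}

// Hypothetical page sizes in bytes, before and after minification.
printf("%.2f%%\n", savings_percent(10000, 8169)); // 18.31%
```

Given the argument in Solution 1, running the same calculation on the gzipped sizes gives the fairer comparison.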

You can attempt to reverse-engineer the minification engine used by Pretty Diff so that it can be executed from PHP. That code and its accompanying documentation can be found at: prettydiff.com/markupmin.js

Solution 3

I have created 3 simple functions. They will probably need to be optimized, but they do their job. They are part of a bigger class that I use to format code, dates, values, etc.

To use it, simply call:

echo format::minify_html($html_output);

Here is the code (still in beta, but so far I haven't had many issues with it):

<?php
class format {
    static function minify_css($text){
        $from   = array(
        //                  '%(#|;|(//)).*%',               // comments:  # or //
            '%/\*(?:(?!\*/).)*\*/%s',       // comments:  /*...*/
            '/\s{2,}/',                     // extra spaces
            "/\s*([;{}])[\r\n\t\s]/",       // new lines
            '/\\s*;\\s*/',                  // white space (ws) between ;
            '/\\s*{\\s*/',                  // remove ws around {
            '/;?\\s*}\\s*/',                // remove ws around } and last semicolon in declaration block
            //                  '/:first-l(etter|ine)\\{/',     // prevent triggering IE6 bug: http://www.crankygeek.com/ie6pebug/
        //                  '/((?:padding|margin|border|outline):\\d+(?:px|em)?) # 1 = prop : 1st numeric value\\s+/x',     // Use newline after 1st numeric value (to limit line lengths).
        //                  '/([^=])#([a-f\\d])\\2([a-f\\d])\\3([a-f\\d])\\4([\\s;\\}])/i',
        );
        $to     = array(
        //                  '',
            '',
            ' ',
            '$1',
            ';',
            '{',
            '}',
            //                  ':first-l$1 {',
        //                  "$1\n",
        //                  '$1#$2$3$4$5',
        );
        $text   = preg_replace($from,$to,$text);
        return $text;
    }
    static function minify_js($text){
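        // TMPPATH is assumed to be an application-defined constant pointing at a writable cache directory.
        $file_cache     = strtolower(md5($text));
        $folder         = TMPPATH.'tmp_files'.DIRECTORY_SEPARATOR.substr($file_cache,0,2).DIRECTORY_SEPARATOR;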
        if(!is_dir($folder))            @mkdir($folder, 0766, true);
        if(!is_dir($folder)){
            // log instead of echoing into the page, and fall back to the unminified input
            error_log('Impossible to create the cache folder: '.$folder);
            return $text;
        }
        $file_cache     = $folder.$file_cache.'_content.js';
        if(!file_exists($file_cache)){
            if(strlen($text)<=100){
                $contents = $text;
            } else {
                $contents = '';
                $post_text = http_build_query(array(
                                'js_code' => $text,
                                'output_info' => 'compiled_code',//($returnErrors ? 'errors' : 'compiled_code'),
                                'output_format' => 'text',
                                'compilation_level' => 'SIMPLE_OPTIMIZATIONS',//'ADVANCED_OPTIMIZATIONS',//'SIMPLE_OPTIMIZATIONS'
                            ), null, '&');
                $URL            = 'http://closure-compiler.appspot.com/compile';
                $allowUrlFopen  = preg_match('/1|yes|on|true/i', ini_get('allow_url_fopen'));
                if($allowUrlFopen){
                    $contents = file_get_contents($URL, false, stream_context_create(array(
                            'http'          => array(
                                'method'        => 'POST',
                                'header'        => 'Content-type: application/x-www-form-urlencoded',
                                'content'       => $post_text,
                                'max_redirects' => 0,
                                'timeout'       => 15,
                            )
                    )));
                }elseif(defined('CURLOPT_POST')) {
                    $ch = curl_init($URL);
                    curl_setopt($ch, CURLOPT_POST, true);
                    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));
                    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_text);
                    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
                    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
                    $contents = curl_exec($ch);
                    curl_close($ch);
                } else {
                    //"Could not make HTTP request: allow_url_open is false and cURL not available"
                    $contents = $text;
                }
                if($contents==false || (trim($contents)=='' && $text!='') || strtolower(substr(trim($contents),0,5))=='error' || strlen($contents)<=50){
                    //No HTTP response from server or empty response or error
                    $contents = $text;
                }
            }
            if(trim($contents)!=''){
                $contents = trim($contents);
                $f = fopen($file_cache, 'w');
                fwrite($f, $contents);
                fclose($f);
            }
        } else {
            touch($file_cache);     //in the future I will add a timeout to the cache
            $contents = file_get_contents($file_cache);
        }
        return $contents;
    }
    static function minify_html($text){
        if(isset($_GET['no_mini'])){
            return $text;
        }
        $file_cache     = strtolower(md5($text));
        $folder         = TMPPATH.'tmp_files'.DIRECTORY_SEPARATOR.substr($file_cache,0,2).DIRECTORY_SEPARATOR;
        if(!is_dir($folder))            @mkdir($folder, 0766, true);
        if(!is_dir($folder)){
            // log instead of echoing into the page, and fall back to the unminified input
            error_log('Impossible to create the cache folder: '.$folder);
            return $text;
        }
        $file_cache     = $folder.$file_cache.'_content.html';
        if(!file_exists($file_cache)){
            //get CSS and save it
            $search_css = '/<\s*style\b[^>]*>(.*?)<\s*\/style>/is';
            $ret = preg_match_all($search_css, $text, $tmps);
            $t_css = array();
            if($ret!==false && $ret>0){
                foreach($tmps as $k=>$v){
                    if($k>0){
                        foreach($v as $kk=>$vv){
                            $t_css[] = $vv;
                        }
                    }
                }
            }
            // "\n" must be double-quoted; '\n' would insert a literal backslash and "n"
            $css = format::minify_css(implode("\n", $t_css));

/*
            //get external JS and save it
            $search_js_ext = '/<\s*script\b.*?src=\s*[\'|"]([^\'|"]*)[^>]*>\s*<\s*\/script>/i';
            $ret = preg_match_all($search_js_ext, $text, $tmps);
            $t_js = array();
            if($ret!==false && $ret>0){
                foreach($tmps as $k=>$v){
                    if($k>0){
                        foreach($v as $kk=>$vv){
                            $t_js[] = $vv;
                        }
                    }
                }
            }
            $js_ext = $t_js;
*/
            //get inline JS and save it
            $search_js_ext  = '/<\s*script\b.*?src=\s*[\'|"]([^\'|"]*)[^>]*>\s*<\s*\/script>/i';
            $search_js      = '/<\s*script\b[^>]*>(.*?)<\s*\/script>/is';
            $ret            = preg_match_all($search_js, $text, $tmps);
            $t_js           = array();
            $js_ext         = array();
            if($ret!==false && $ret>0){
                foreach($tmps as $k=>$v){
                    if($k==0){
                        //let's check if we have a source (src="")
                        foreach($v as $kk=>$vv){
                            if($vv!=''){
                                $ret = preg_match_all($search_js_ext, $vv, $ttmps);
                                if($ret!==false && $ret>0){
                                    foreach($ttmps[1] as $kkk=>$vvv){
                                        $js_ext[] = $vvv;
                                    }
                                }
                            }
                        }
                    } else {
                        foreach($v as $kk=>$vv){
                            if($vv!=''){
                                $t_js[] = $vv;
                            }
                        }
                    }
                }
            }
            $js = format::minify_js(implode("\n", $t_js));

            //get inline noscript and save it
            $search_no_js = '/<\s*noscript\b[^>]*>(.*?)<\s*\/noscript>/is';
            $ret = preg_match_all($search_no_js, $text, $tmps);
            $t_js = array();
            if($ret!==false && $ret>0){
                foreach($tmps as $k=>$v){
                    if($k>0){
                        foreach($v as $kk=>$vv){
                            $t_js[] = $vv;
                        }
                    }
                }
            }
            $no_js = implode("\n", $t_js);

            //remove CSS and JS
            $search = array(
                $search_js_ext,
                $search_css,
                $search_js,
                $search_no_js,
                '/\>[^\S ]+/s', //strip whitespaces after tags, except space
                '/[^\S ]+\</s', //strip whitespaces before tags, except space
                '/(\s)+/s',  // shorten multiple whitespace sequences (note: this also affects <pre> and <textarea> content)
            );
            $replace = array(
                '',
                '',
                '',
                '',
                '>',
                '<',
                '\\1',
            );
            $buffer = preg_replace($search, $replace, $text);

            $append = '';
            //add CSS and JS at the bottom
            if(is_array($js_ext) && count($js_ext)>0){
                foreach($js_ext as $k=>$v){
                    $append .= '<script type="text/javascript" language="javascript" src="'.$v.'" ></script>';
                }
            }
            if($css!='')        $append .= '<style>'.$css.'</style>';
            if($js!=''){
                //remove weird '\n' strings
                $js = preg_replace('/[\s]*\\\n/', "\n", $js);
                $append .= '<script>'.$js.'</script>';
            }
            if($no_js!='')      $append .= '<noscript>'.$no_js.'</noscript>';
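            // insert $append before </body>; a callback keeps "$" and "\" in the CSS/JS from being read as back-references
            $buffer = preg_replace_callback('/<\s*\/\s*body\s*>/i', function($m) use ($append){
                return $append.$m[0];
            }, $buffer, 1);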
            if(trim($buffer)!=''){
                $f = fopen($file_cache, 'w');
                fwrite($f, trim($buffer));
                fclose($f);
            }
        } else {
            touch($file_cache);     //in the future I will add a timeout to the cache
            $buffer = file_get_contents($file_cache);
        }

        return $buffer;
    }

}
?>
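One caveat with the whitespace patterns in minify_html: script and style blocks are extracted before those patterns run, but <pre> and <textarea> are not, so their whitespace gets collapsed too. Below is a minimal sketch of one way to protect such elements with a single preg_replace_callback() pass. The function name is hypothetical; it assumes the protected elements are well-formed and not nested, and it only collapses whitespace runs to one space rather than also trimming around tags as minify_html does.

```php
<?php
// Collapse whitespace runs to a single space, except inside elements where
// whitespace is significant (<pre>, <textarea>) or code lives (<script>, <style>).
// Sketch only: regex-based, assumes well-formed, non-nested protected elements.
function collapse_whitespace_safely(string $html): string {
    // Alternative 1: a whole protected element (group 1 = tag name).
    // Alternative 2: any run of whitespace.
    $pattern = '#<(pre|textarea|script|style)\b[^>]*>.*?</\1\s*>|\s+#is';
    return preg_replace_callback($pattern, function (array $m): string {
        // Group 1 is only present when a protected element matched: keep it verbatim.
        if (isset($m[1]) && $m[1] !== '') {
            return $m[0];
        }
        return ' ';
    }, $html);
}

$in = "<p>\n  hello   world\n</p>\n<pre>  keep\n  this</pre>";
echo collapse_whitespace_safely($in);
```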
Author: ianyoung

Updated on June 04, 2022

Comments

  • ianyoung

    I'm using gzip to compress my HTML/PHP files along with JS/CSS/etc. This reduces the payload quite nicely, but I also want to 'minify' the markup of both .html and .php pages. Ideally I'd like to control this from a .htaccess file (where I also do the gzipping) rather than having to include PHP in each file.

    I'd like the output to be like that of http://google.com, or of http://www.w3-edge.com/wordpress-plugins/w3-total-cache/ and http://css-tricks.com (both produced by the W3 Total Cache plugin for WordPress).

    Can anyone recommend a good way to do this?
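For the .htaccess requirement, one possible approach (a sketch, not a tested recipe) is PHP's output buffering combined with auto_prepend_file, so that no individual page has to include anything. The file name prepend.php and the minify_output() function are hypothetical stand-ins; format::minify_html() or any other minifier could be swapped in. The php_value line in .htaccess only works when PHP runs as an Apache module (under PHP-FPM or CGI, auto_prepend_file can be set in php.ini or a .user.ini file instead). This also only affects output generated by PHP; static .html files served directly by Apache never pass through it.

```php
<?php
// prepend.php (hypothetical name). With mod_php, .htaccess could load it for every
// PHP request, for example:
//   php_value auto_prepend_file "/path/to/prepend.php"

// Stand-in minifier: strips whitespace sitting entirely between two tags.
// Caveat: this also affects <pre>/<textarea> and spaces between inline elements.
function minify_output(string $buffer): string {
    return preg_replace('/>\s+</', '><', $buffer);
}

// Everything the page echoes after this point is passed through minify_output()
// when the buffer is flushed at the end of the request.
ob_start('minify_output');
```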