Remove whitespace from HTML

50,324

Solution 1

I can't delete this answer, but it's no longer relevant. The web landscape has changed so much in 8 years that it has become useless.

Solution 2

$html = preg_replace('~>\s+<~', '><', $html);

But I don't see the point of this. If you're trying to make the data size smaller, there are better options, such as sending or storing the data gzipped.
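
To judge this trade-off on your own markup, you can compare the byte counts directly. A minimal sketch, assuming PHP's zlib extension is available for gzencode(); the sample markup and the variable names are made up for illustration:

```php
<?php
// Hypothetical sample markup; substitute your own page.
$html = <<<HTML
<div class="wrap">
    <ul>
        <li><a href="/questions">Questions</a></li>
        <li><a href="/tags">Tags</a></li>
    </ul>
</div>
HTML;

// Strip whitespace between tags, as in the one-liner above.
$stripped = preg_replace('~>\s+<~', '><', $html);

// Byte counts for each variant, with and without gzip (requires ext-zlib).
$sizes = [
    'original'          => strlen($html),
    'stripped'          => strlen($stripped),
    'original, gzipped' => strlen(gzencode($html)),
    'stripped, gzipped' => strlen(gzencode($stripped)),
];

foreach ($sizes as $label => $bytes) {
    printf("%-18s %d bytes\n", $label, $bytes);
}
```

The numbers depend heavily on the page, so treat any single run as a rough indication rather than a benchmark.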

Solution 3

It's been a while since this question was first asked, but I still see the need to post this answer to help people with the same problem.

None of these solutions were adoptable for me, so I came up with this one: using output buffering.

The function ob_start accepts a callback as an argument, which is applied to the whole buffered string before it is output. So if the callback removes whitespace from the string, the output is cleaned up right before it is flushed.

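<?php
/**
 * Collapse each run of whitespace in the buffer into a single space.
 *
 * @param string $buffer
 * @return string
 */
function removeWhitespace($buffer)
{
    return preg_replace('/\s+/', ' ', $buffer);
}

// Everything printed from here on is buffered and passed to the callback.
ob_start('removeWhitespace');
?>
<!DOCTYPE html>
<html>
    <head></head>
    <body></body>
</html>
<?php
// Run the callback on the buffer, send the result and stop buffering.
ob_end_flush();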

The above would print something like:

<!DOCTYPE html> <html> <head></head> <body></body> </html>

Hope that helps.

HOW TO USE IT IN OOP

If you're using object-oriented code in PHP, you may want to use a callback that is a method of a class.

If you have a class called, for instance, HTML, with a static method removeWhitespace, pass the callback as an array:

ob_start(["HTML", "removeWhitespace"]);
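
A minimal sketch of the class side of this, assuming a hypothetical HTML class whose removeWhitespace is a static method (the class body and the sample list are made up for illustration):

```php
<?php
// Hypothetical class; only the method name matches the callable above.
class HTML
{
    /**
     * Collapse each run of whitespace into a single space.
     */
    public static function removeWhitespace(string $buffer): string
    {
        // Fall back to the untouched buffer if the regex engine fails.
        return preg_replace('/\s+/', ' ', $buffer) ?? $buffer;
    }
}

// Static method passed as an array callable: [class name, method name].
ob_start(['HTML', 'removeWhitespace']);
?>
<ul>
    <li>One</li>
    <li>Two</li>
</ul>
<?php
ob_end_flush();
```

For a non-static method, pass an instance instead of the class name, for example ob_start([$html, 'removeWhitespace']) where $html is an object of that class.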

Solution 4

Just in case someone still needs this, I put together a function from @Martin Angelova's and @Savas Vedova's answers. The result, which also solved my problem, looks like this:

<?php
// Remove whitespace between tags only where it contains a line break.
function rmspace($buffer)
{
    return preg_replace('~>\s*\n\s*<~', '><', $buffer);
}
?>
<?php ob_start("rmspace"); ?>
   <!-- Content goes in here -->
<?php ob_end_flush(); ?>

Note: I did not test the performance penalty in a production environment.
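
One way to get a rough idea of that cost before deploying is to time the callback on a representative page. A minimal sketch, assuming PHP 7.3 or later for hrtime(); the generated page and the iteration count are arbitrary placeholders:

```php
<?php
function rmspace($buffer)
{
    return preg_replace('~>\s*\n\s*<~', '><', $buffer);
}

// Placeholder page: 2,000 list items, each on its own indented line.
$page = "<ul>\n" . str_repeat("    <li><a href=\"#\">item</a></li>\n", 2000) . "</ul>";

$runs  = 100;
$start = hrtime(true); // monotonic clock, in nanoseconds

for ($i = 0; $i < $runs; $i++) {
    $out = rmspace($page);
}

// Average time per call, converted from nanoseconds to milliseconds.
$msPerRun = (hrtime(true) - $start) / $runs / 1e6;
printf("%d bytes -> %d bytes, %.3f ms per run\n", strlen($page), strlen($out), $msPerRun);
```

This only measures the regex itself on one machine; it says nothing about the memory cost of buffering the whole page.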

Solution 5

$html = preg_replace('~>\s*\n\s*<~', '><', $html);

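I think this is the solution to the <b>Hello</b> <i>world</i> problem, where stripping all whitespace between tags also removes the meaningful space between inline elements. The idea is to remove whitespace between tags only when it contains a newline. It works for commonly formatted HTML such as: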

<div class="wrap">
    <div>
    </div>
</div>
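
To make the difference concrete, the sketch below runs both patterns from this page over the inline example and over the indented block. The helper names stripAll and stripAtNewlines are made up for illustration:

```php
<?php
// Removes any whitespace between two tags (the stricter pattern).
function stripAll(string $html): string
{
    return preg_replace('~>\s+<~', '><', $html);
}

// Removes whitespace between two tags only if it contains a newline.
function stripAtNewlines(string $html): string
{
    return preg_replace('~>\s*\n\s*<~', '><', $html);
}

$inline = '<b>Hello</b> <i>world</i>';
$block  = "<div class=\"wrap\">\n    <div>\n    </div>\n</div>";

echo stripAll($inline), "\n";        // <b>Hello</b><i>world</i>
echo stripAtNewlines($inline), "\n"; // <b>Hello</b> <i>world</i>
echo stripAtNewlines($block), "\n";  // <div class="wrap"><div></div></div>
```

Note that this is a heuristic: if two inline elements are separated by a line break rather than a space, the newline-only pattern still joins them.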
Author by

James

Updated on November 19, 2020

Comments

  • James
    James over 3 years

    I have HTML code like:

    <div class="wrap">
        <div>
            <div id="hmenus">
                <div class="nav mainnavs">
                    <ul>
                        <li><a id="nav-questions" href="/questions">Questions</a></li>
                        <li><a id="nav-tags" href="/tags">Tags</a></li>
                        <li><a id="nav-users" href="/users">Users</a></li>
                        <li><a id="nav-badges" href="/badges">Badges</a></li>
                        <li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li>
                    </ul>
                </div>
            </div>
        </div>
    </div>
    

    How do I remove whitespace between tags by PHP?

    We should get:

    <div class="wrap"><div><div id="hmenus"><div class="nav mainnavs"><ul><li><a id="nav-questions" href="/questions">Questions</a></li><li><a id="nav-tags" href="/tags">Tags</a></li><li><a id="nav-users" href="/users">Users</a></li><li><a id="nav-badges" href="/badges">Badges</a></li><li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li></ul></div></div></div></div>
    
  • Gumbo
    Gumbo over 13 years
    None of the three pattern modifiers you’ve used are necessary.
  • laander
    laander over 13 years
    True, my bad, please see the other answers for a solution
  • Max Kielland
    Max Kielland about 13 years
    Well, where no one else sees a point, someone else is seeing a lot of them, outside the box... :D This regex works perfect for me.
  • Phil Ricketts
    Phil Ricketts about 13 years
    Google (more than qualified when it comes to performance) suggests, via its Page Speed tool, that it IS worth doing. When you use GZIP, it will compress the extra unnecessary spaces. Obviously, if you remove these spaces before the page is GZIP'd, the output will be smaller and more efficient. The answer is both!
  • Incognito
    Incognito about 13 years
    This is true. The real question comes down to scale and effort required. Remember, your time is finite, and so is your product. If you're serving 1000 hits a month on 200kb of html content, don't worry. If you're serving 1M hits a month on 5mb of HTML content, optimize like never before. If you have time as a luxury and want to learn how to do this, go ahead, but stripping whitespace to save 50% instead of 40% isn't going to reward you in many places except ySlow.
  • Incognito
    Incognito about 13 years
    That being said, if you're actually having problems with slow loading, there's a tool I use that's very usable for pinpointing issues and tracking history: gtmetrix.com
  • Phil Ricketts
    Phil Ricketts about 13 years
    I propose that this answer is downvoted, because it is incorrect. stackoverflow.com/questions/807119/gzip-versus-minify
  • Incognito
    Incognito about 13 years
    @replete Your linked question is about JavaScript. This question is about removing whitespace from HTML for the sake of increased speed, and that gain has been explained to be negligible if we use gzip. The sample code in this question is 556 bytes, the gzipped size is 202 bytes, and the size with whitespace stripped is 362 bytes. That's right, it's larger if we don't gzip.
  • Phil Ricketts
    Phil Ricketts almost 13 years
    @Incognito I understand that from a practical point of view, you are proposing that removing whitespace before gzipping gives little gain over just gzipping. Obviously, minifying first DOES make a difference to the gzipped output. But, you are making an absolute statement saying that there is no point. It's not correct, you should refine your answer. A notable point: In the context of websites, Google actually favours fast sites, to a degree. Look at Google Page Speed - it does care if your site is minified or not!
  • Phil Ricketts
    Phil Ricketts almost 13 years
    @Incognito but if you gzip the minified 362, it would be even smaller than 202 bytes. That's my point.
  • Incognito
    Incognito almost 13 years
    @replete Right, it comes down to a whopping 183b, a whole 19b smaller. This is what I'm saying: after 1 000 000 page views, your savings in this situation would be 18 megabytes, and you've ended up breaking all your PRE tag content. Again, you shouldn't need to strip formatting from your HTML files; the servers deal with this. Why would you ever want to edit the file itself? All of this optimization should be done by the webserver; it's what it was built for.
  • Jared
    Jared over 12 years
    Perfect and simple. Totally works. Thanks for the solution. And yes, just because the point isn't obvious doesn't mean there isn't one. I needed a way to find-and-replace broken tags from a 3rd-party program. Trimming out the white space in the tags helped me get there and solve the problem.
  • Salman A
    Salman A over 12 years
    Sadly this changes <b>Hello</b> <i>world</i> to <b>Hello</b><i>world</i>. Detecting whether a white space is meaningful or not is almost impossible (a list of inline and block level elements will be handy).
  • Simon East
    Simon East almost 12 years
    @SalmanA is right - you need to be very careful about this regex because there are some instances where you don't want to remove whitespace in between tags. This could be inside <pre> <code> <textarea> <script>. This pattern also won't catch the numerous spaces/tabs inserted in text content, unless the tabs are between two tags.
  • Czechnology
    Czechnology almost 12 years
    @Simon, this regex does exactly what the OP wrote (s)he wants: "remove whitespace between tags". Obviously that might not be the best behaviour for all uses but that's up to the OP.
  • Simon East
    Simon East almost 12 years
    Yeah, it may be perfect for the OP's situation, and that's fine. I just think it's an important disclaimer for those Googling 'remove whitespace from HTML' (like I was).
  • Oliver A.
    Oliver A. over 11 years
    In case someone cares: I use this answer to test my templates. I do not care if it adds whitespace as long as I get the expected HTML structure for my dummy data ;) The problem Salman mentioned is bad, but better than no tests at all.
  • hakre
    hakre about 11 years
    @Salman A: The DTD normally covers whether whitespace is significant or not. preg_replace knows nothing about it :) . Using a HTML parser can help (if Tidy is not an option): Stripping line breaks pre-XML leaves spaces- what is the proper method?
  • sobi3ch
    sobi3ch over 10 years
    I thought gzipping and removing whitespace was the best answer, assuming your website is big enough and visited often enough every day. Now I'm confused. If you look at the source code of some big websites, you can see they are all using the whitespace-stripping technique. For instance: * view-source:facebook.com * view-source:google.com * view-source:soundcloud.com/stream So is there any resource describing this problem in more detail?
  • Incognito
    Incognito over 10 years
    If your template engine doesn't do it for you, you're not caching and using ESI, you're probably missing the point of doing gzip and strip space.
  • rob_was_taken
    rob_was_taken almost 10 years
    Useful for returning a block of code - for example form data. I used this on a Wordpress shortcode return and it's perfect as I have control of the text though.
  • NoobishPro
    NoobishPro over 9 years
    Everything I google about this results in horrible answers like these. It's like people don't even think about the needs of others. Maybe they are asking the question for a different reason than minifying the website? I, for example, have to save some templating in the database. I simply want to compress my HTML for the database and not the eventual rendering. Sjeez.
  • Gershom Maes
    Gershom Maes over 9 years
    If you're worried about that space between inline elements, you could always use this: $html = preg_replace('~>\s+<~', '> <', $html); (It's exactly the same thing, but with a space between the replacement angle-brackets)
  • Gershom Maes
    Gershom Maes over 9 years
    Also: $html = preg_replace('~>\s+<~', '> <', $html);
  • msrd0
    msrd0 over 9 years
    While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
  • Jomar Sevillejo
    Jomar Sevillejo over 8 years
    Savas, doesn't this remove the spaces you need as well? Say: <div>I need spaces here.</div> <div>There's a space to remove before this div.</div>
  • Zilk
    Zilk over 8 years
    @Jomar: no, it collapses sequences of multiple white-space characters into a single space. The example output in this answer is incorrect; it should be <!DOCTYPE html> <html> <head></head> <body></body> </html>.
  • Savas Vedova
    Savas Vedova over 8 years
    @JomarSevillejo my bad sorry, I updated the output as stated by Zilk.
  • tfont
    tfont over 8 years
    Tested, and this is the solution! See my below version for a minor update on the return forgetting a full string trim.
  • gskema
    gskema over 7 years
    The question does not state that it's for web page compression. For example, I need to trim whitespace because an HTML-to-PDF generator renders extra whitespace.
  • electroid
    electroid over 7 years
    quite fast regex, I use that.
  • Ben Visness
    Ben Visness about 7 years
    You'd be surprised why this is useful...in my case, Wordpress's wpautop function was putting <br> tags inside my SVG elements.
  • Dylan Kinnett
    Dylan Kinnett about 5 years
    The question was about how to do it, not whether it's recommended.
  • Muhammad Rohail
    Muhammad Rohail over 4 years
    @Czechnology: You said that if you're trying to make the data size smaller, there are better options. What are these options?
  • Czechnology
    Czechnology over 4 years
    @muhammadrohail Data compression. Send/store data e.g. gzipped.