Sanitizing HTML input


Solution 1

You will have to decide between good and lightweight. The recommended choice is HTML Purifier, because it provides no-fuss secure defaults. As a faster alternative, htmLawed is often suggested.

See also this quite objective overview from the HTMLPurifier author: http://htmlpurifier.org/comparison

Solution 2

I really like HTML Purifier, which lets you specify which tags and attributes are allowed in your HTML and generates valid HTML.
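As a rough illustration of that whitelist idea, here is a minimal sketch. It assumes HTML Purifier was installed through Composer (package ezyang/htmlpurifier) and that the autoloader sits in vendor/ next to the script. The function name purify_rich_text and the particular list of allowed tags are made up for the example, so adjust them to whatever your editor produces.

```php
<?php
// Assumption: HTML Purifier is installed via Composer (ezyang/htmlpurifier),
// so its classes are reachable through the Composer autoloader.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical helper: keeps a small whitelist of tags and attributes.
function purify_rich_text(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Only these tags survive; for <a>, only the href attribute is kept.
    $config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

    // Disable the on-disk definition cache so the sketch runs without
    // a writable cache directory (slower; enable caching in production).
    $config->set('Cache.DefinitionImpl', null);

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($dirtyHtml);
}

// The onclick attribute and the <script> element are not on the whitelist,
// and the unclosed <strong> gets balanced by the purifier.
echo purify_rich_text('<p onclick="steal()">Hello <strong>world<script>alert(1)</script></p>');
```

If your pages are not UTF-8, the Core.Encoding directive can be set on the same config object before the purifier is created.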

Solution 3

Use BBCode (or a lightweight markup like the one used here on SO); otherwise the chances of ending up with clean markup are very slim. Example function...

function parse($string){
    // Note: no escaping is done here, so this is not safe for untrusted input.

    // BBCode patterns; each entry pairs with the replacement at the same index.
    $pattern = array(
    "/\[url\](.*?)\[\/url\]/",
    "/\[img\](.*?)\[\/img\]/",
    "/\[img\=(.*?)\](.*?)\[\/img\]/",
    "/\[url\=(.*?)\](.*?)\[\/url\]/",
    "/\[red\](.*?)\[\/red\]/",
    "/\[b\](.*?)\[\/b\]/",
    "/\[h(.*?)\](.*?)\[\/h(.*?)\]/",
    "/\[p\](.*?)\[\/p\]/",
    "/\[php\](.*?)\[\/php\]/is"
    );

    $replacement = array(
    '<a href="\\1">\\1</a>',
    '<img alt="" src="\\1"/>',
    '<img alt="" class="\\1" src="\\2"/>',
    '<a rel="nofollow" target="_blank" href="\\1">\\2</a>',
    '<span style="color:#ff0000;">\\1</span>',
    '<span style="font-weight:bold;">\\1</span>',
    '<h\\1>\\2</h\\3>',
    '<p>\\1</p>',
    '<pre><code class="php">\\1</code></pre>'
    );

    $string = preg_replace($pattern, $replacement, $string);

    // Convert the remaining newlines to <br /> tags.
    $string = nl2br($string);

    return $string;
}

...

echo parse("[h2]Lorem Ipsum[/h2][p]Dolor sit amet[/p]");

Result...

<h2>Lorem Ipsum</h2><p>Dolor sit amet</p>
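The parse() function above copies whatever sits between the tags straight into attributes, so a quote in the input can add extra attributes. A possible way to tighten it is sketched below for three tags only. The name parse_safe and the URL rule (http/https only, no whitespace or square brackets) are assumptions made for this example, not part of the original answer, and this is not a substitute for a real purifier.

```php
<?php
// Hypothetical safer variant of parse(), limited to [b], [url] and [img].
function parse_safe(string $input): string
{
    // 1. Escape everything first, so quotes and angle brackets typed by the
    //    user become entities and cannot open tags or close attributes.
    $text = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Only http(s) URLs without whitespace or square brackets are turned
    //    into markup; anything else stays as literal, escaped text.
    $url = '(https?://[^\s\[\]]+)';

    $pattern = array(
        '#\[url\]' . $url . '\[/url\]#i',
        '#\[img\]' . $url . '\[/img\]#i',
        '#\[b\](.*?)\[/b\]#s',
    );

    $replacement = array(
        '<a rel="nofollow" href="$1">$1</a>',
        '<img alt="" src="$1"/>',
        '<strong>$1</strong>',
    );

    $text = preg_replace($pattern, $replacement, $text);

    // 3. Convert the remaining newlines to <br /> tags.
    return nl2br($text);
}

// A URL followed by an injected attribute does not match the [img] rule,
// so no <img> tag is generated for it.
echo parse_safe('[img]http://picture.of.a/pony.png" onload="execute();[/img]');
```

Escaping before matching is the main design choice here: the patterns then only ever see entity-encoded text, so the replacement strings are the only source of real markup.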


Or just use HTML Purifier :)

Solution 4

Both HTML Purifier and htmLawed are good. htmLawed has the advantage of a much smaller footprint and high configurability. Besides doing the standard work of balancing tags, filtering specific HTML tags or their attributes or attribute content (through white or black lists), etc., it also allows the use of custom functions.


Author: James P.

Updated on June 04, 2022

Comments

  • James P.
    James P. almost 2 years

    I'm thinking of adding a rich text editor to allow a non-programmer to change the aspect of text. However, one issue is that it's possible to distort the layout of a rendered page if the markup is incorrect. What's a good lightweight way to sanitize html?

  • James P.
    James P. about 13 years
    Good suggestion. I'm wondering why an animated dragon appeared when upvoting you though :p .
  • Lauren
    Lauren about 13 years
    For BBCode to be secure, you would have to run it through a purifier such as HTMLPurifier anyway. There's really no point. Naive BBCode is wide open to exploits: consider what the input string [img]http://picture.of.a/pony.png" onload="execute(); arbitrary(); javascript();[/img] would be rendered as by the above parser.
  • Dejan Marjanović
    Dejan Marjanović about 13 years
    Yup, definitely not for public usage. I ignored the security aspect completely; I thought it was for private usage. @James P., use HTMLPurifier ;)
  • James P.
    James P. about 13 years
    Thanks. I got HTMLPurifier working. The documentation isn't easy to read but I managed to get it to filter some rich text to a minimum and adapted the charset to iso to avoid accents getting removed.
  • ymakux
    ymakux over 7 years
    To anyone considering htmLawed: look at the code first, and you'll cry. There's no alternative to HTMLPurifier at the moment. Just to save you time.
  • ymakux
    ymakux over 7 years
    + nice things like $GLOBALS['C'] = $C;
  • user594694
    user594694 about 7 years
    What's wrong with the code? Just because you cannot understand it does not make it bad. htmLawed is so much faster, smaller and more efficient than HTMLPurifier that it should not be dismissed just because it is not written the way you like.
  • DennisK
    DennisK over 5 years
    The htmLawed author seems to have no sense of security. The website and forum are not using HTTPS, and the website urges you to disable Composer's secure-http, as he cannot be arsed to move to HTTPS or a Git repository. I wouldn't trust that person with anything security-related.