Remove <p> tags - Regular Expression (Regex)

Solution 1

You have probably seen the warnings about using regex to match HTML. With that disclaimer in mind, you can do this:

Option 1: Leaving the closing </p> tags

This first option leaves the closing </p> tags in place, since that is what your desired output shows. :) Option 2 removes them as well.

PHP

$replaced = preg_replace('~<p[^>]*>~', '', $yourstring);

JavaScript

replaced = yourstring.replace(/<p[^>]*>/g, "");

Python

import re
replaced = re.sub(r"<p[^>]*>", "", yourstring)

  • <p matches the beginning of the tag
  • The negated character class [^>]* matches zero or more characters that are not a closing >
  • > closes the match
  • We replace each match with an empty string
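
As a quick check of the steps above, here is a minimal Python sketch that applies the Option 1 pattern to the sample from the question (shortened to two paragraphs). The variable names `html` and `OPENING_P` are illustrative only.

```python
import re

# Sample input taken from the question, shortened to two paragraphs.
html = (
    '<p style="display:inline; margin: 40pt;">'
    '<span style="font:XXXX;"> Text1 Here</span></p>'
    '<p style="margin: 50pt"><span style="font:XXXX">Text2 Here</span></p>'
)

# "<p" starts the tag, "[^>]*" consumes the attributes, ">" ends the match.
OPENING_P = re.compile(r"<p[^>]*>")

# Every opening <p ...> tag is replaced with an empty string;
# closing </p> tags do not match because "<" is followed by "/".
replaced = OPENING_P.sub("", html)

print(replaced)
```

The result keeps every `<span>` and every closing `</p>`, which is the shape of the desired output in the question.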

Option 2: Also removing the closing </p> tags

PHP

$replaced = preg_replace('~</?p[^>]*>~', '', $yourstring);

JavaScript

replaced = yourstring.replace(/<\/?p[^>]*>/g, "");

Python

replaced = re.sub(r"</?p[^>]*>", "", yourstring)
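
One caveat with both options: `<p[^>]*>` also matches any other tag whose name starts with "p", such as `<pre>`. The sketch below is a possible hardening, not part of the original answer: it adds a word boundary (`\b`) after the tag name and, as a further assumption, ignores case. The input string is made up for illustration.

```python
import re

# Hypothetical input that mixes <p> with another tag starting with "p".
html = '<p class="a">Hello</p><pre>code</pre><P>World</P>'

# \b requires a word boundary right after "p", so <pre> is not matched.
# re.IGNORECASE additionally catches upper-case <P> tags.
ANY_P_TAG = re.compile(r"</?p\b[^>]*>", re.IGNORECASE)

stripped = ANY_P_TAG.sub("", html)
print(stripped)  # Hello<pre>code</pre>World

# For comparison, the pattern without \b also removes the <pre> tags.
without_boundary = re.sub(r"</?p[^>]*>", "", html)
print(without_boundary)  # Hellocode<P>World</P>
```

If the data only ever contains `<p>` and `<span>` tags, as in the question, the simpler pattern behaves the same.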

Solution 2

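This is a PCRE expression (the U flag makes the quantifiers ungreedy):

/<p( *\w+=("[^"]*"|'[^']*'|[^ >]+))*>(.*<\/p>)/Ug

Replace each match with $3 (the third capture group: the element's content plus its closing </p>), or simply remove every match of:

/<p( *\w+=("[^"]*"|'[^']*'|[^ >]+))*>/g

If you want to remove the closing tag as well, use this pattern and again replace each match with $3, which now holds only the content:

/<p( *\w+=("[^"]*"|'[^']*'|[^ >]+))*>(.*)<\/p>/Ug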
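
For readers working in Python, the capture-group approach can be sketched roughly as follows. This is an adaptation rather than the original expression: Python's `re` module has no U (ungreedy) flag, so the lazy quantifier `.*?` stands in for it, and the attribute-value alternatives are written with `*` and `+` so that values longer than one character match. The sample string and variable names are made up.

```python
import re

# Group 1: one attribute (name=value), group 2: its value,
# group 3: the content between the opening tag and the nearest </p>.
P_ELEMENT = re.compile(
    r"""<p( *\w+=("[^"]*"|'[^']*'|[^ >]+))*>(.*?)</p>"""
)

html = (
    '<p style="margin: 50pt"><span>Text2 Here</span></p>'
    "<p class='x'>Text3</p>"
)

# Keep the closing tag: re-emit </p> after the captured content.
only_opening_removed = P_ELEMENT.sub(r"\3</p>", html)

# Remove both tags: keep only the captured content.
both_removed = P_ELEMENT.sub(r"\3", html)

print(only_opening_removed)  # <span>Text2 Here</span></p>Text3</p>
print(both_removed)          # <span>Text2 Here</span>Text3
```

Because the pattern requires `>` or an attribute directly after `<p`, it does not match tags such as `<pre>`, at the cost of being harder to read than the Solution 1 pattern.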

Author by n.nasa

Updated on June 19, 2022

Comments

  • n.nasa
    n.nasa almost 2 years

    I have some HTML and the requirement is to remove only starting <p> tags from the string.

    Example:

    input: <p style="display:inline; margin: 40pt;"><span style="font:XXXX;"> Text1 Here</span></p><p style="margin: 50pt"><span style="font:XXXX">Text2 Here</span></p> <p style="display:inline; margin: 40pt;"><span style="font:XXXX;"> Text3 Here</span></p>the string goes on like that
    
    desired output: <span style="font:XXXX;"> Text1 Here</span></p><span style="font:XXXX">Text2 Here</span></p><span style="font:XXXX;"> Text3 Here</span></p>
    

    Is it possible using Regex? I have tried some combinations, but they are not working. This is all a single string. Any advice appreciated.

  • Braj
    Braj almost 10 years
    What about the closing </p> tag? I think the OP doesn't want it. Remove </p> from the output as well. The OP says: remove only starting tags from the string.
  • zx81
    zx81 almost 10 years
    @Braj As mentioned, look at his desired output. He is keeping the </p> tags. :)
  • zx81
    zx81 almost 10 years
    FYI added a second option in case you also want to remove the closing </p> tags.
  • zx81
    zx81 almost 10 years
    @Braj Okay, added an option for that... Cheers!
  • Braj
    Braj almost 10 years
    +1 Nice answer for considering all the languages and test cases.
  • zx81
    zx81 almost 10 years
    @Braj Thanks! Yes, it's quite funny that we don't know the specs. :)
  • zx81
    zx81 almost 10 years
    Hey there, following up on this, did this answer solve it, or is the problem still troubling you?
  • n.nasa
    n.nasa almost 10 years
    Sorry for the delay, I think I am working in a different time zone here! Okie, I am sorry not to have mentioned that this requirement has nothing to do with HTML or the web. This is just rich text that we are exporting from one application, and I just need to manipulate that data to import it into another application. That's why I can simply find and replace the </p> tags with <br/> tags.
  • zx81
    zx81 almost 10 years
    Alright, that's cool. :) Does some of that code work for you?
  • n.nasa
    n.nasa almost 10 years
    @zx81 Thanks for your reply and answer. This is the correct answer. The mistake was on my part: I did not state the specs clearly and tagged the question in the wrong place. Apologies.
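
The workflow n.nasa describes in the comments (drop the opening tags, then find and replace each </p> with <br/>) could be sketched in Python like this. The function name `p_to_br` is hypothetical, and the `\b` word boundary is an added safeguard that is not in the accepted answer.

```python
import re


def p_to_br(rich_text: str) -> str:
    """Drop opening <p ...> tags and turn each closing </p> into <br/>."""
    # Step 1: remove opening tags with the Option 1 pattern
    # (plus \b so that tags like <pre> are left alone).
    without_opening = re.sub(r"<p\b[^>]*>", "", rich_text)
    # Step 2: a plain string replacement is enough for the closing tag.
    return without_opening.replace("</p>", "<br/>")


sample = '<p style="margin: 50pt"><span style="font:XXXX">Text2 Here</span></p>'
print(p_to_br(sample))  # <span style="font:XXXX">Text2 Here</span><br/>
```

The closing tag has no attributes, so no regex is needed for the second step.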