Regular expression to match all characters between <h1> tags

Solution 1

By default, . matches every character except the newline character.

In this case, you need the DOTALL option, which makes . match any character, including the newline character. The DOTALL option can be specified inline as (?s). For example:

(?s)<h1>.+</h1>

However, this will not work as intended when the text contains more than one heading, because quantifiers are greedy by default (here, +): the pattern consumes as many characters as possible, so the match runs from the first <h1> to the last </h1>. Make the quantifier lazy (consume as few characters as possible) by adding a ? after it, giving +?:

(?s)<h1>.+?</h1>
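
The difference between the three patterns can be checked outside the editor. The following is a minimal sketch using Python's re module, which also accepts the inline (?s) flag; the sample text and variable names are made up for illustration, and Sublime Text's own regex engine may differ in other details.

```python
import re

# Hypothetical sample: one multi-line heading and one single-line heading.
html = "<h1>\n   First\n</h1>\n<p>Body</p>\n<h1>Second</h1>"

# Without DOTALL, . cannot cross the newline inside the first heading,
# so only the single-line heading is found.
no_dotall = re.findall(r"<h1>.+</h1>", html)

# DOTALL with a greedy +: a single match running from the first <h1>
# to the last </h1>.
greedy = re.findall(r"(?s)<h1>.+</h1>", html)

# DOTALL with a lazy +?: one match per heading.
lazy = re.findall(r"(?s)<h1>.+?</h1>", html)

print(no_dotall)
print(greedy)
print(lazy)
```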

Alternatively, the regex can be <h1>[^<>]*</h1>. In this case, you don't need to specify any option, because the negated character class [^<>] matches newline characters as well.
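
As a rough illustration of the negated-class variant, here is a small Python sketch (the sample strings are assumptions, not taken from the question's file). It matches a heading that spans several lines without any flag, but finds nothing once the heading contains a nested tag, since [^<>] refuses to match < or >.

```python
import re

pattern = re.compile(r"<h1>[^<>]*</h1>")

# Hypothetical samples: a multi-line heading and a heading with a nested tag.
multiline = "<h1>\n   Hello this is a header\n</h1>"
nested = "<h1>Hello <span>world</span></h1>"

# The negated class matches newlines, so no DOTALL flag is needed.
multiline_match = pattern.search(multiline)

# The nested <span> contains < and >, so the pattern cannot match here.
nested_match = pattern.search(nested)

print(multiline_match.group(0) if multiline_match else None)
print(nested_match)
```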

Solution 2

Since this question is the top Google result for a regex that finds all the characters between h1 tags, I thought I would give that answer as well, since that was what I was looking for.

(?s)(?<=<h1>)(.+?)(?=</h1>)

That regex, if used on a sample text like <h1>A title</h1> <p>Some content</p> <h1>Another title</h1>, matches only the text inside the tags, so the first match is just A title, without the surrounding <h1> and </h1>.
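
A quick way to see what the lookbehind and lookahead exclude is to run the pattern over the same sample text. This is a sketch in Python's re module, which supports the fixed-width lookbehind used here; the variable names are illustrative only.

```python
import re

sample = "<h1>A title</h1> <p>Some content</p> <h1>Another title</h1>"

# (?<=<h1>) and (?=</h1>) assert that the tags are present
# without making them part of the match.
inner = re.compile(r"(?s)(?<=<h1>)(.+?)(?=</h1>)")

# First match only, as a single find in an editor would give.
first = inner.search(sample).group(0)

# Every match, as "find all" would give.
all_titles = inner.findall(sample)

print(first)
print(all_titles)
```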

Author: PrivateUser

Updated on December 14, 2020

Comments

  • PrivateUser
    PrivateUser over 3 years

    I'm using the Sublime Text 2 editor. I would like to use a regex to match all characters between all h1 tags.

    As of now I'm using this:

    <h1>.+</h1>
    

    It works fine if the h1 tag doesn't contain line breaks.

    I mean, for

    <h1>Hello this is a header</h1>
    

    it works fine.

    But it does not work if the tag looks like this:

    <h1>
       Hello this is a header
    </h1>
    

    Can someone help me with the syntax?

  • nhahtdh
    nhahtdh over 11 years
    With OP's regex, specifying those options is not sufficient.
  • PrivateUser
    PrivateUser over 11 years
    @Some1.Kill.The.DJ I have tried your code, but it's still not matching when the tag contains a line break.
  • enrey
    enrey over 11 years
    Wouldn't that third regex break if you have any nested tags in h1? Like span or link or whatever... I just tried the "(?s)" and it works in sublime, that's cool.
  • Jay
    Jay over 11 years
    I never knew you could specify flags in regex searches in sublime - thanks for the information @Some1.Kill.The.DJ
  • Anirudha
    Anirudha over 11 years
    @enrey Yes, it would break, but even the 1st and 2nd regexes can break if there is another h1 tag inside the h1 itself.
  • giuseppe
    giuseppe over 10 years
    This also works to clear the characters between consecutive tags, e.g. (?s)(?<=</h1>)(.+?)(?=<h1>)
  • Arete
    Arete about 7 years
    Not sure how you guys figure this stuff out. Where is the documentation on regex search like this?
  • David D.
    David D. over 6 years
    Works great. A slight improvement to get all h tags: /<h[1-6]>[^<>]*<\/h[1-6]>/g (JS regex). BTW, you can use jQuery $(el).text() to get just the text content.