Regular expression to match all characters between <h1> tags
Solution 1
By default, . matches every character except a newline. In this case you need the DOTALL option, which makes . match any character, including newlines. DOTALL can be specified inline as (?s). For example:
(?s)<h1>.+</h1>
However, you will see that this does not work as intended, since quantifiers are greedy by default (in this case, the +), which means it will try to consume as many characters as possible, so a single match can run from the first <h1> to the last </h1>. You need to make it lazy (consume as few characters as possible) by adding an extra ? after the quantifier, giving +?:
(?s)<h1>.+?</h1>
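As a rough illustration of the three behaviours described above, here is a minimal sketch using Python's re module (an assumption for demonstration only; the question is about Sublime Text, whose regex engine may differ in details). The sample text is hypothetical.

```python
import re

# Hypothetical sample: a multi-line heading, a paragraph, then a second heading.
text = "<h1>\nHello this is a header\n</h1>\n<p>body</p>\n<h1>Second</h1>"

# Without DOTALL, "." stops at newlines, so the multi-line heading is skipped.
default_matches = re.findall(r"<h1>.+</h1>", text)

# With DOTALL, the greedy ".+" runs from the first <h1> to the last </h1>.
greedy_matches = re.findall(r"(?s)<h1>.+</h1>", text)

# The lazy ".+?" stops at the nearest </h1>, giving one match per heading.
lazy_matches = re.findall(r"(?s)<h1>.+?</h1>", text)

print(default_matches)
print(greedy_matches)
print(lazy_matches)
```

In this sketch the greedy version returns a single match that covers the whole sample, while the lazy version returns the two headings separately.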
Alternatively, the regex can be <h1>[^<>]*</h1>. In this case you don't need to specify any option, because a negated character class such as [^<>] also matches newlines.
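The character-class variant can be sketched the same way, again assuming Python's re module and hypothetical sample strings. It also shows the limitation raised in the comments: the pattern stops matching as soon as the heading contains a nested tag.

```python
import re

pattern = re.compile(r"<h1>[^<>]*</h1>")

# Hypothetical samples: a heading with line breaks, and one with a nested tag.
multiline = "<h1>\nHello this is a header\n</h1>"
nested = "<h1>Hello <span>world</span></h1>"

# [^<>] matches newlines too, so no DOTALL flag is needed here.
multiline_match = pattern.search(multiline)

# The "<" of the nested <span> is excluded by [^<>], so nothing matches.
nested_match = pattern.search(nested)

print(multiline_match.group(0) if multiline_match else None)
print(nested_match)
```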
Solution 2
Since this question is the top Google search result for a regex that finds all the characters between h1 tags, I thought I would give that answer as well, since that was what I was looking for.
(?s)(?<=<h1>)(.+?)(?=</h1>)
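That regex, if used on a sample text like <h1>A title</h1> <p>Some content</p> <h1>Another title</h1>, will return only A title for the first match, without the surrounding tags. A find-all search matches Another title in the same way.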
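A minimal sketch of the lookaround pattern on that sample text, assuming Python's re module (which supports the fixed-width lookbehind used here; other engines may differ):

```python
import re

sample = "<h1>A title</h1> <p>Some content</p> <h1>Another title</h1>"

# Lookbehind and lookahead assert the tags without including them in the match.
pattern = re.compile(r"(?s)(?<=<h1>)(.+?)(?=</h1>)")

# The first match is the text of the first heading only.
first = pattern.search(sample).group(0)

# findall returns the captured group for every heading in the sample.
all_titles = pattern.findall(sample)

print(first)
print(all_titles)
```

Because the tags are only asserted and never consumed, the match can be replaced or copied without touching the markup around it.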
PrivateUser
Updated on December 14, 2020
Comments
-
PrivateUser over 3 years
I'm using the Sublime Text 2 editor. I would like to use a regex to match all characters between all h1 tags. As of now I'm using this:
<h1>.+</h1>
It's working fine if the h1 tag doesn't have line breaks.
I mean, for
<h1>Hello this is a header</h1>
it's working fine.
But it's not working if the tag looks like this:
<h1>
Hello this is a header
</h1>
Can someone help me with the syntax?
-
nhahtdh over 11 years
With the OP's regex, specifying those options is not sufficient.
-
PrivateUser over 11 years
@Some1.Kill.The.DJ I have tried your code, but it's still not matching when the tag contains a line break.
-
enrey over 11 years
Wouldn't that third regex break if you have any nested tags in the h1? Like a span or link or whatever... I just tried the "(?s)" and it works in Sublime, that's cool.
-
Jay over 11 years
I never knew you could specify flags in regex searches in Sublime - thanks for the information @Some1.Kill.The.DJ
-
Anirudha over 11 years
@enrey Yes, it would break... but even the 1st and 2nd regex can break if there is another h1 tag inside the h1 itself.
-
giuseppe over 10 years
This also works to clear the characters between consecutive tags, like (?s)(?<=</h1>)(.+?)(?=<h1>)
-
Arete about 7 years
Not sure how you guys figure this stuff out. Where is the documentation on regex search like this?
-
David D. over 6 years
Works great. A slight improvement to get all h tags: /<h[1-6]>[^<>]*<\/h[1-6]>/g (JS regex). BTW, you can use jQuery $(el).text() to get just the text content.