Regex that extracts text between tags, but not the tags

php regex preg-match preg-match-all

28,241

Solution 1

You can use the following regex:

>([^<]*)<

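or >[^<]*<

Here [^<]* matches any run of characters other than '<'. With the first pattern, capture group 1 already holds only the text between the tags. With the second, strip the unwanted '<' and '>' characters from each match afterwards.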
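As a rough sketch of how the first pattern could be used from PHP (the question's tags mention preg_match_all), the snippet below collects capture group 1 for every match. The sample input is the one from the question; the variable names and the filtering of whitespace-only captures are my own assumptions, not part of the original answer.

```php
<?php
// Sample input taken from the question.
$string = "<title>My work</title>\n<p>This is my work.</p> <p>Learning regex.</p>";

// Group 1 captures each run of non-'<' characters found between a '>' and the next '<'.
preg_match_all('/>([^<]*)</', $string, $matches);

// Assumption: whitespace-only captures (the gaps between adjacent tags) are not wanted.
$texts = array_values(array_filter($matches[1], function ($s) {
    return trim($s) !== '';
}));

print_r($texts);
```

Note that this pattern is not specific to <title>: it returns the text between any closing '>' and the following '<', so it picks up the paragraph text as well.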

Solution 2

A cleaner way is to use lookaround assertions. For your case, the regex would be:

(?<=\<title\>).*?(?=\<\/title\>)

For more details, see the documentation on lookahead and lookbehind assertions.
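A minimal sketch of the lookaround approach in PHP, assuming the sample input from the question. Because the assertions do not consume the tags, the whole match is already the wanted text. I use '#' as the delimiter so the slash needs no escaping, and I add the 's' modifier so that '.' also matches newlines; both are my choices rather than part of the original answer.

```php
<?php
// Sample input taken from the question.
$string = "<title>My work</title>\n<p>This is my work.</p> <p>Learning regex.</p>";

// The lookbehind requires "<title>" just before the match and the lookahead
// requires "</title>" just after it; neither becomes part of the match.
$pattern = '#(?<=<title>).*?(?=</title>)#s';

// preg_match returns 1 on a match, so $matches[0] is the full matched text.
$title = preg_match($pattern, $string, $matches) ? $matches[0] : null;

echo $title, PHP_EOL; // My work
```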

Solution 3

In your case, you could just use the second backreference (capture group) from your regex, which holds the text you are interested in.

Since you mention preg_match in your tags, I am assuming you want this for PHP.

$matches = array();
$pattern = '#<title>(.*?)</title>#'; // note I changed the pattern a bit
preg_match($pattern, $string, $matches);
$title = $matches[1];

Note that this is actually the first backreference in my pattern, since I've omitted the parentheses around the tags themselves, which were not needed.

Typically, you should not use regex to parse HTML documents, but I think this might be one of those exceptional cases where it is not so bad, since the title tag should only exist once on the page.
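For comparison, here is a hedged sketch of the non-regex route hinted at above, using PHP's DOM extension. It assumes the extension is available (it usually is, but that is an assumption about your setup) and uses the sample input from the question. The variable names are illustrative only.

```php
<?php
// Sample input taken from the question.
$html = "<title>My work</title>\n<p>This is my work.</p> <p>Learning regex.</p>";

// Collect parser warnings internally instead of printing them for the bare fragment.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Take the first <title> element, if there is one, and read its text.
$node = $doc->getElementsByTagName('title')->item(0);
$title = $node !== null ? $node->textContent : null;

echo $title, PHP_EOL; // My work
```

Unlike the regex, this does not care about newlines or attributes inside the tag.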

Solution 4

I used this pattern with a regex replace function to strip the tags: (<.+?>)
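My reading of this answer, which is an assumption, is that every tag is replaced with an empty string so that only the text remains. In PHP that could look like the sketch below, with preg_replace standing in for the unspecified replace function.

```php
<?php
// Only the title line from the question's sample input.
$string = "<title>My work</title>";

// The lazy .+? stops at the first '>', so each tag is matched and removed on its own.
$text = preg_replace('/(<.+?>)/', '', $string);

echo $text, PHP_EOL; // My work
```

Since '.' does not match newlines by default, a tag that spans several lines would be left in place by this pattern.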


Nicolaesse

Updated on April 11, 2020

Comments

Nicolaesse about 4 years
I want to write a regex that extracts the content between the two <title> tags in a string, but not the tags themselves. For example, I have the following:
```
<title>My work</title>
<p>This is my work.</p> <p>Learning regex.</p>
```
The regex
```
(<title>)(.*?)(<\/title>)
```
extracts <title>My work</title>, but I want to extract only My work. How can I do that? Here is a link to the example: http://regex101.com/r/mD8fB0

ZOXEXIVO over 9 years
This does not work if a newline character is present in the content!

Amit Choukroun almost 9 years
Can you explain the meaning of [^<]?

PeterX about 8 years
This doesn't work with <charlie><bob>Alice</bob></charlie>, i.e. text inside nested tags. Any thoughts?

Eric Novins over 7 years
SMART answer! I like the way you're looking at things.