Regular expression for anchor tag with all attributes


Solution 1

/<a[^>]*>([^<]+)<\/a>/g

It's far from perfect. To refine it, you would need to provide more examples of what should and should not match (e.g. how should whitespace be handled?).
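As a rough illustration of how this pattern could be used for the replacement the question asks about, here is a minimal sketch in Python. The translation to Python's `re` module and the names `ANCHOR_RE` and `strip_links` are assumptions made for this example, not part of the original answer.

```python
import re

# Python translation of /<a[^>]*>([^<]+)<\/a>/g.
# No "g" flag is needed: re.sub replaces every match by default.
ANCHOR_RE = re.compile(r"<a[^>]*>([^<]+)</a>")


def strip_links(text: str) -> str:
    """Replace each simple anchor element with its inner text (group 1)."""
    return ANCHOR_RE.sub(r"\1", text)


sample = 'see <a href="http://whatever" id="an_id" rel="a_rel">the link</a> here'
print(strip_links(sample))  # see the link here
```

Note that an empty anchor such as `<a href="x"></a>` is left untouched, because `[^<]+` requires at least one character of link text.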

Solution 2

/<a[\s]+([^>]+)>((?:.(?!\<\/a\>))*.)<\/a>/g

This one matches any <a ...>...</a> element, including ones whose content contains a literal < or complete nested tags, such as:

blah blah <a href="test.html">This line contains an HTML opening < bracket.</a> blah blah
blah blah <a href="test.html">This line contains <strong>bold</strong> text.</a> blah blah

Would capture:

<a href="test.html">This line contains an HTML opening < bracket.</a>
  • with capture groups:
    • href="test.html"
    • This line contains an HTML opening < bracket.

and

<a href="test.html">This line contains <strong>bold</strong> text.</a>
  • with capture groups:
    • href="test.html"
    • This line contains <strong>bold</strong> text.

It also includes capturing groups for the tag attributes (such as class="" or href="") and for the content (what is between the opening and closing tags). These groups can be removed if you do not need them.
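The two capture groups described above can be checked with a short Python sketch. The pattern is the one from this answer with the unnecessary backslashes before < and > dropped; the names `ANCHOR_RE` and `find_anchors` are hypothetical and exist only for this example.

```python
import re

# Group 1: everything between "<a " and ">" (the attributes).
# Group 2: the content, consumed one character at a time until "</a>" follows.
ANCHOR_RE = re.compile(r"<a[\s]+([^>]+)>((?:.(?!</a>))*.)</a>")


def find_anchors(text):
    """Return a list of (attributes, content) tuples, one per anchor element."""
    return ANCHOR_RE.findall(text)


sample = (
    'blah blah <a href="test.html">This line contains an HTML opening < bracket.</a> blah blah\n'
    'blah blah <a href="test.html">This line contains <strong>bold</strong> text.</a> blah blah'
)

for attributes, content in find_anchors(sample):
    print(attributes, "|", content)
```

Because the lookahead stops at the first `</a>`, two anchors on the same line are matched separately rather than merged into one match.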

If you want to capture across multiple lines, add an "s" before or after the "g" flag at the end. Note that the "s" flag is not supported in every regular expression flavor.
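In Python, the counterpart of the "s" flag is `re.DOTALL`, which lets `.` match newlines. The sketch below compares the pattern with and without it; the variable names and the sample string are assumptions made for this example.

```python
import re

PATTERN = r"<a[\s]+([^>]+)>((?:.(?!</a>))*.)</a>"

# Without DOTALL, "." stops at a newline, so multi-line content cannot match.
SINGLE_LINE_RE = re.compile(PATTERN)
# With DOTALL (the "s" flag in other flavors), "." also matches "\n".
DOTALL_RE = re.compile(PATTERN, re.DOTALL)

html = '<a href="test.html">first line\nsecond line</a>'

no_flag = SINGLE_LINE_RE.search(html)
with_flag = DOTALL_RE.search(html)

print(no_flag)             # None
print(with_flag.group(2))  # prints the two lines of link text
```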

Capture example (not using the "s" flag - not supported by regexr yet): http://regexr.com/39rsv

Solution 3

A small correction to the accepted answer. The correct regex is /<a[^>]*>([^<]+)<\/a>/g. In the original, the forward slash (/) in the closing anchor tag </a> was not escaped, so inside a slash-delimited pattern it ended the expression early and no match was made.

Author: Lobo (Software Engineer)

Updated on July 09, 2022

Comments

  • Lobo
    Lobo almost 2 years

    I'm trying to write a regular expression that replaces every link in a text string with the text of that link.

    A link may look like these:

    <a href="http://whatever" id="an_id" rel="a_rel">the link</a>
    <a href="/absolute_url/whatever" id="an_id" rel="a_rel">the link</a>
    

    I want a regular expression that gives me: the link

  • Lobo
    Lobo about 12 years
    Hi Florian, other examples: <a href="/absolute_url/whatever" id="an_id" rel="a_rel"></a> <a href="/absolute_url/whatever">a link</a> <a href="domain.com">a link</a>
  • Brian Leishman
    Brian Leishman over 8 years
    You have an unescaped forward slash near the end
  • Jerry
    Jerry about 7 years
    how would you modify this to cover bla bla <a href="test.html" data-annoying=">" >yikes</a>? That's the one killing me right now.
  • Kshitij
    Kshitij over 6 years
    Note: This would not work for nested elements. The regex should also be case-insensitive, as both <a> and <A> are valid.
  • idungotnosn
    idungotnosn over 5 years
    Good question, @Jerry. I don't really know how to answer your question (and this post is over a year too late), but I would think that any HTML attributes that contain XML special characters like that should have those characters encoded somehow.
  • robrecord
    robrecord about 2 years
    Escaped < and > where it shouldn't be... correct version is <a[\s]+([^>]+)>((?:.(?!<\/a>))*.)<\/a>
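
The comments above raise two gaps: a > inside a quoted attribute value (Jerry's example) and upper-case tags. One possible way to cover both, sketched here in Python, is to let the attribute group skip over quoted strings and to add a case-insensitive flag. This is an assumption-laden variant written for this example, not something taken from the answers, and `QUOTE_AWARE_RE` is a hypothetical name. It still does not handle nested anchors.

```python
import re

QUOTE_AWARE_RE = re.compile(
    r"""<a\s+
        ((?:[^>"']|"[^"]*"|'[^']*')*)   # attributes: plain chars or whole quoted values
        >
        ((?:.(?!</a>))*.)               # content up to the first closing tag
        </a>""",
    re.IGNORECASE | re.VERBOSE,
)

sample = 'bla bla <a href="test.html" data-annoying=">" >yikes</a>'
match = QUOTE_AWARE_RE.search(sample)

print(match.group(1).strip())  # href="test.html" data-annoying=">"
print(match.group(2))          # yikes
```

The three alternatives in the attribute group start with different characters, so the engine never has two ways to consume the same text, which keeps backtracking cheap.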