JavaScript RegExp match text between <a> tags

22,181

Solution 1

Try this:

/<a[^>]*>([\s\S]*?)<\/a>/
  • <a[^>]*> matches the opening a tag
  • ([\s\S]*?) matches any characters before the closing tag, as few as possible
  • <\/a> matches the closing tag

The ([\s\S]*?) group captures the text between the tags; it is available at index 1 of the array returned from an exec call or a non-global match call.
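
As a minimal sketch of reading that capture group, the snippet below runs the pattern against the anchor from the question. The variable names (html, anchorPattern, playerName) are illustrative assumptions, not part of the original answer.

```javascript
// Sample markup copied from the question; variable names are illustrative.
const html = '<th><a href="/game.php?village=828&amp;screen=info_player&amp;id=29956" >bimbo999</a></th>';
const anchorPattern = /<a[^>]*>([\s\S]*?)<\/a>/;

// exec returns null when nothing matches, so guard before reading index 1.
const result = anchorPattern.exec(html);
const playerName = result ? result[1] : null;

console.log(playerName); // bimbo999
```

Because the pattern never looks inside the opening tag, the changing village and id numbers in the URL do not matter.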

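This is really only good for finding the text within a elements. It's not particularly safe or reliable: for example, <a[^>]*> also matches other tags whose names begin with "a", such as <abbr>, and any markup nested inside the link is returned as raw HTML. But if you've got a big page of links and you just need their text, this will do it.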
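
For the "big page of links" case, one possible approach is to add the g flag and collect every capture. This is only a sketch: the helper name getAnchorTextsByRegex is hypothetical, and it assumes an environment that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: returns the text of every <a>...</a> pair in a string.
function getAnchorTextsByRegex(htmlStr) {
    // matchAll requires the g flag; each match holds the captured text at index 1.
    const pattern = /<a[^>]*>([\s\S]*?)<\/a>/g;
    return Array.from(htmlStr.matchAll(pattern), function (m) {
        return m[1];
    });
}

// Two rows copied from the question's sample markup.
const rows = [
    '<tr><td>Village:</td><td><a href="/game.php?village=828&amp;screen=info_village&amp;id=848" >bimbo999s village (515|520) K55</a></td></tr>',
    '<tr><td>Origin of the troops:</td><td><a href="/game.php?village=828&amp;screen=info_village&amp;id=828" >KaLa I (514|520) K55</a></td></tr>'
].join('\n');

const linkTexts = getAnchorTextsByRegex(rows);
console.log(linkTexts);
// [ 'bimbo999s village (515|520) K55', 'KaLa I (514|520) K55' ]
```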


A much safer way to do this without RegExp would be:

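function getAnchorTexts(htmlStr) {
    var div,
        anchors,
        i,
        texts;
    // Let the browser parse the markup inside a detached element.
    div = document.createElement('div');
    div.innerHTML = htmlStr;
    // Collect every a element the parser found.
    anchors = div.getElementsByTagName('a');
    texts = [];
    for (i = 0; i < anchors.length; i += 1) {
        // text is the anchor's text content, without any nested tags.
        texts.push(anchors[i].text);
    }
    return texts;
}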

Solution 2

I don't have experience with regex, but I think you can use jQuery's .text() method instead.

jQuery API - .text()

For example, if you use:

var hrefText = $("a").text(); 

You will get the text without using regex. Note that when the selector matches several elements, .text() returns their combined text.

To read each link separately, use .find("a") to get the list of a elements, loop over that list with .each(), and call .text() on each element.

You can also use a class selector, an id, or any other selector you want.

Author by

Admin

Updated on August 08, 2020

Comments

  • Admin
    Admin over 3 years

    I need to match with a javascript RegExp the string: bimbo999 from this a tag: <a href="/game.php?village=828&amp;screen=info_player&amp;id=29956" >bimbo999</a>

    The numbers from URL vars (village and id) are changing every time so I have to match the numbers somehow with RegExp.

    </tr>
                        <tr><td>Sent</td><td >Oct 22, 2011  17:00:31</td></tr>
                                    <tr>
                            <td colspan="2" valign="top" height="160" style="border: solid 1px black; padding: 4px;">
                                <table width="100%">
        <tr><th width="60">Supported player:</th><th>
        <a href="/game.php?village=828&amp;screen=info_player&amp;id=29956" >bimbo999</a></th></tr>
        <tr><td>Village:</td><td><a href="/game.php?village=828&amp;screen=info_village&amp;id=848" >bimbo999s village (515|520) K55</a></td></tr>
        <tr><td>Origin of the troops:</td><td><a href="/game.php?village=828&amp;screen=info_village&amp;id=828" >KaLa I (514|520) K55</a></td></tr>
        </table><br />
    
        <h4>Units:</h4>
        <table class="vis">
    

    I tried with this:

    var match = h.match(/Supported player:</th>(.*)<\/a><\/th></i);
    

    but it is not working. Can you help me?

  • zzzzBov
    zzzzBov over 12 years
    This could also be done with regular JavaScript using getElementsByTagName('a'). Not a bad idea.
  • Ryan
    Ryan over 12 years
    As a side note, it's not a good idea to use regex to parse HTML :)
  • par
    par about 11 years
    /<a[^>]*>((?:.|\r?\n)*?)<\/a>/ is also handy for matching to the next closing tag over multiple lines.
  • Dominic
    Dominic almost 8 years
    It would already match over multiple lines: \s matches any whitespace character, including \r, \n, \t, \f and the space.
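
The last two comments on multi-line matching can be checked with a short sketch. The sample string and variable names are assumptions made up for illustration; the three patterns are a plain "." version, the [\s\S] version from Solution 1, and the alternation from par's comment.

```javascript
// Hypothetical link whose text spans two lines.
const multiLine = '<a href="#">first line\nsecond line</a>';

// "." does not match line terminators unless the s (dotAll) flag is set.
const dotOnly = /<a[^>]*>(.*?)<\/a>/.exec(multiLine);

// [\s\S] matches any character, newlines included.
const anyChar = /<a[^>]*>([\s\S]*?)<\/a>/.exec(multiLine);

// The alternation from the comment also steps over \n or \r\n.
const alternation = /<a[^>]*>((?:.|\r?\n)*?)<\/a>/.exec(multiLine);

console.log(dotOnly);                           // null
console.log(anyChar[1] === alternation[1]);     // true
console.log(JSON.stringify(anyChar[1]));        // "first line\nsecond line"
```

Under these assumptions both multi-line patterns capture the same text, so the choice between them is mostly a matter of readability.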