How to get html tag attribute values using JavaScript Regular Expressions?

12,717

Solution 1

You were so close! All that needs to be done now is a simple loop:

var htmlString = '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE">\n'+
'<meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE">\n'+
'<meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">\n';

var setCookieMetaRegExp = /<meta http-equiv=[\"']?set-cookie[\"']? content=[\"'](.*)[\"'].*>/ig;

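var matches = [];
var match;
// With the g flag, each exec() call returns the next match (or null when
// there are none left) and advances setCookieMetaRegExp.lastIndex.
while ((match = setCookieMetaRegExp.exec(htmlString)) !== null) {
  matches.push(match[1]); // match[1] is the captured content value
}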

//contains all cookie values
console.log(matches);

JSBIN: http://jsbin.com/OpepUjeW/1/edit?js,console
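A more modern way to write the same loop, assuming Node.js 12 or later (where String.prototype.matchAll exists). getSetCookieValues is a hypothetical helper name, and the pattern is a variant of the question's regex: the value is captured lazily with (.*?) and closed with the backreference \1, so it ends at its own closing quote even if another attribute follows it.

```javascript
// Hypothetical helper: collects every Set-Cookie <meta> content value.
// Assumes the tags look like the question's sample (http-equiv before content).
function getSetCookieValues(html) {
  // (["'])  captures the opening quote; \1 requires the same quote to close it.
  // (.*?)   is lazy, so the capture stops at that closing quote instead of
  //         running on to a later attribute on the same line.
  const re = /<meta\s+http-equiv=["']?set-cookie["']?\s+content=(["'])(.*?)\1[^>]*>/gi;
  const values = [];
  // matchAll requires the g flag and yields one match array per tag.
  for (const match of html.matchAll(re)) {
    values.push(match[2]); // group 2 is the attribute value
  }
  return values;
}

const htmlString =
  '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE">\n' +
  '<meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE">\n' +
  '<meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">\n';

console.log(JSON.stringify(getSetCookieValues(htmlString)));
// ["COOKIE1_VALUE_HERE","COOKIE2_VALUE_HERE","COOKIE3_VALUE_HERE"]
```

Unlike the exec() loop, matchAll does not leave state behind in a shared regex's lastIndex between calls.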

Solution 2

Keep it simple:

/content=\"(.*?)\">/gi

demo: http://regex101.com/r/dF9cD8

Update (based on your comment):

/<meta http-equiv=\"Set-Cookie\" content=\"(.*?)\">/gi

This matches only tags written in exactly this form (double quotes, with a single space between the two attributes). Demo: http://regex101.com/r/pT0fC2

You really need the (.*?) with the question mark. Without it, the greedy .* runs to the end of the line (. does not match a newline) and then backtracks only as far as the last "> on that line. The ? makes the match stop at the first "> instead (you can change \" to [\"'] if you want to match either single or double quotes).
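In the comments below, Adam Youngers points out that the capture runs on when another attribute follows content. A small sketch of why, on a hypothetical one-line tag: the patterns here end at the closing quote rather than at "> (a deliberate change from the regexes above), which is what lets the lazy version stop at the end of the value.

```javascript
// Hypothetical sample: another attribute (id) follows content on the same tag.
const tag = '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE" id="c1">';

// Greedy: .* first runs to the end of the string, then backtracks only as far
// as the LAST '"', so the capture swallows the attribute that follows.
const greedy = /content="(.*)"/i.exec(tag)[1];

// Lazy: .*? grows one character at a time and stops at the FIRST '"'.
const lazy = /content="(.*?)"/i.exec(tag)[1];

// Negated character class: [^"]* can never cross a '"', so it gives the same
// result as the lazy version and states the intent explicitly.
const negated = /content="([^"]*)"/i.exec(tag)[1];

// Either quote style: capture the opening quote and require the same one (\1)
// to close the value; the value itself is then in group 2.
const tag2 = "<meta http-equiv='Set-Cookie' content='COOKIE2_VALUE_HERE' id='c2'>";
const eitherQuote = /content=(["'])(.*?)\1/i.exec(tag2)[2];

console.log(greedy);      // COOKIE1_VALUE_HERE" id="c1
console.log(lazy);        // COOKIE1_VALUE_HERE
console.log(negated);     // COOKIE1_VALUE_HERE
console.log(eitherQuote); // COOKIE2_VALUE_HERE
```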

Solution 3

Try this

(?:class|href)([\s='"./]+)([\w-./?=&\\#"]+)((['#\\&?=/".\w\d]+|[\w)('-."\s]+)['"]|)

Example:

function getTagAttribute(tag, attribute){
    // Match the attribute name followed by '=', any quotes/whitespace, and the value
    var regKey = '(?:' + attribute + ')([\\s=\'"./]+)([\\w-./?=\\#"]+)(([\'#\\&?=/".\\w\\d]+|[\\w)(\'-."\\s]+)[\'"]|)';
    var regExp = new RegExp(regKey,'g');
    var regResult = regExp.exec(tag);
    if(regResult && regResult.length>0){
        // Strip the leading "attribute =" plus opening quote, and the trailing quote
        var splitKey = '(?:(' + attribute + ')+(|\\s)+([=])+(|\\s|[\'"])+)|(?:([\\s\'"]+)$)';
        return regResult[0].replace(new RegExp(splitKey,'g'),'');
    }else{
        return '';
    }
}


getTagAttribute('<a href  =   "./test.html#bir/deneme/?k=1&v=1"    class=   "xyz_bir-ahmet abc">','href');

// returns "./test.html#bir/deneme/?k=1&v=1"

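For comparison, a shorter sketch of the same idea. getAttributeValue is a hypothetical helper, not the function above: it escapes the attribute name, requires whitespace (or the start of the string) before it so that "href" does not match inside "data-href", and reads a double-quoted, single-quoted, or unquoted value. It assumes a single tag string and is not an HTML parser.

```javascript
// Hypothetical helper (not getTagAttribute above): reads one attribute from a
// single tag string. Handles whitespace around '=' and double, single or no
// quotes. Comments, scripts and malformed markup are out of scope.
function getAttributeValue(tag, name) {
  // Escape regex metacharacters so the name is matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?:^|\s) stops "href" from matching inside e.g. "data-href".
  const re = new RegExp(
    "(?:^|\\s)" + escaped +
      "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'=<>`]+))",
    "i"
  );
  const m = re.exec(tag);
  if (m === null) return "";
  // Exactly one of the three value groups took part in the match.
  if (m[1] !== undefined) return m[1];
  if (m[2] !== undefined) return m[2];
  return m[3];
}

const a = '<a href  =   "./test.html#bir/deneme/?k=1&v=1"    class=   "xyz_bir-ahmet abc">';
console.log(getAttributeValue(a, "href"));  // ./test.html#bir/deneme/?k=1&v=1
console.log(getAttributeValue(a, "class")); // xyz_bir-ahmet abc
console.log(getAttributeValue(a, "id"));    // prints an empty line (no id attribute)
```

Using one alternation per quote style avoids the second "split" regex: the captured group is already the bare value.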

Solution 4

There's no need for regular expressions; just do some DOM work (this runs in a browser):

var head = document.createElement("head");
head.innerHTML = '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE"><meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE"><meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">';

// Collect the content attribute of every parsed <meta> element
// (getElementsByTagName skips any whitespace text nodes between the tags)
var metaNodes = head.getElementsByTagName("meta");
var contentValues = [];
for (var i = 0; i < metaNodes.length; i++) {
   contentValues.push(metaNodes[i].getAttribute("content"));
}

Since you are using Node.js, and BlackSheep mentions cheerio, you could use its syntax if you want to use that library:

//Assume htmlString contains the html
var cheerio = require('cheerio'),
$ = cheerio.load(htmlString);
var values=[];
$("meta").each(function(i, elem) {
  values[i] = $(this).attr("content");
});
Author: Francesco Mangia

Updated on June 13, 2022

Comments

  • Francesco Mangia
    Francesco Mangia almost 2 years

    Suppose I have this HTML in a string:

    <meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE">
    <meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE">
    <meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">
    

    And I have this regular expression, to get the values inside the content attributes:

    /<meta http-equiv=[\"']?set-cookie[\"']? content=[\"'](.*)[\"'].*>/ig
    

    How do I, in JavaScript, get all three content values?

    I've tried:

    var setCookieMetaRegExp = /<meta http-equiv=[\"']?set-cookie[\"']? content=[\"'](.*)[\"'].*>/ig;
    var match = setCookieMetaRegExp.exec(htmlstring);
    

    but match doesn't contain the values I need. Help?

    Note: the regular expression is already correct (see here). I just need to match it to the string. Note: I'm using NodeJS

  • Patrick Evans
    Patrick Evans over 10 years
    @Obay you might want to mention that you are using NodeJS in your question then lol
  • Francesco Mangia
    Francesco Mangia over 10 years
    I need to run the regular expression specifically on set-cookie, and the HTML string is a complete HTML document
  • Francesco Mangia
    Francesco Mangia over 10 years
    It doesn't work, even when changing the parameter of exec() into htmlstring
  • Ram
    Ram over 10 years
    @Obay Why don't you use cheerio lib?
  • Francesco Mangia
    Francesco Mangia over 10 years
    Sorry about that! Will modify :P
  • Francesco Mangia
    Francesco Mangia over 10 years
    I tried the modified code, it doesn't work. Uncaught TypeError
  • Francesco Mangia
    Francesco Mangia over 10 years
    @BlackSheep I will look into it
  • Patrick Evans
    Patrick Evans over 10 years
    @Obay edited to include snippet on how to do it with cheerio lib since BlackSheep mentions that lib.
  • Patrick Evans
    Patrick Evans over 10 years
    use single quotes around the string, otherwise you will get errors due to double quotes being in the string.
  • RobG
    RobG over 10 years
    While the OP may not relate to browsers, it's worth noting for this answer that not all browsers allow setting of the innerHTML property of head elements (e.g. IE).
  • Adam Youngers
    Adam Youngers over 9 years
    So this only seems to work when content comes at the end of the tag. If an attribute follows content, the regexp picks it up too. How do you tell the matching to stop when it reaches the closing quote? Here is my bin of the problem... jsbin.com/xebimu/1/edit?js,console