How to get html tag attribute values using JavaScript Regular Expressions?
Solution 1
You were so close! All that needs to be done now is a simple loop:
var htmlString = '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE">\n'+
'<meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE">\n'+
'<meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">\n';
var setCookieMetaRegExp = /<meta http-equiv=[\"']?set-cookie[\"']? content=[\"'](.*)[\"'].*>/ig;
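var matches = [];
var match;
// exec() returns null once there are no more matches
while ((match = setCookieMetaRegExp.exec(htmlString)) !== null) {
  matches.push(match[1]); // first capture group: the content value
}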
//contains all cookie values
console.log(matches);
JSBIN: http://jsbin.com/OpepUjeW/1/edit?js,console
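If another attribute follows content on the same line, the greedy (.*) in the pattern above captures it as well. A minimal sketch of one way around that, assuming the value never contains the quote character that delimits it: capture the opening quote and match lazily up to the same quote. The variable names and the data-x attribute are made up for illustration.

```javascript
// Hypothetical input: the second tag has an attribute after content.
var sample = '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE">\n' +
  '<meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE" data-x="1">\n';

// Group 1 is the opening quote, group 2 the value.
// \1 requires the closing quote to be the same character as the opening one.
var re = /<meta\s+http-equiv=["']?set-cookie["']?\s+content=(["'])(.*?)\1[^>]*>/ig;

var values = [];
var m;
while ((m = re.exec(sample)) !== null) {
  values.push(m[2]);
}
console.log(values);
```

This still depends on http-equiv appearing before content, as in the original pattern.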
Solution 2
Keep it simple:
/content=\"(.*?)\">/gi
demo: http://regex101.com/r/dF9cD8
Update (based on your comment):
/<meta http-equiv=\"Set-Cookie\" content=\"(.*?)\">/gi
matches only this exact tag. Demo: http://regex101.com/r/pT0fC2
You really need the (.*?) with the question mark, or the regex will keep going until the last "> it finds on the line (the dot does not match a newline). The ? makes the search stop at the first " (you can change this to [\"'] if you want to match either single or double quote).
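The difference between the greedy and lazy forms is easiest to see when two tags share a line. A small sketch, with a made-up input string:

```javascript
// Two tags on one line, so a greedy match has room to overshoot.
var line = '<meta content="A"><meta content="B">';

// Greedy: (.*) runs to the end of the line, then backtracks to the last ">
var greedy = /content="(.*)">/.exec(line)[1];

// Lazy: (.*?) stops at the first "> it can reach
var lazy = /content="(.*?)">/.exec(line)[1];

console.log(greedy); // A"><meta content="B
console.log(lazy);   // A
```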
Solution 3
Try this:
(?:class|href)([\s='"./]+)([\w-./?=&\\#"]+)((['#\\&?=/".\w\d]+|[\w)('-."\s]+)['"]|)
Example:
function getTagAttribute(tag, attribute){
var regKey = '(?:' + attribute + ')([\\s=\'"./]+)([\\w-./?=\\#"]+)(([\'#\\&?=/".\\w\\d]+|[\\w)(\'-."\\s]+)[\'"]|)'
var regExp = new RegExp(regKey,'g');
var regResult = regExp.exec(tag);
if(regResult && regResult.length>0){
var splitKey = '(?:(' + attribute + ')+(|\\s)+([=])+(|\\s|[\'"])+)|(?:([\\s\'"]+)$)'
return regResult[0].replace(new RegExp(splitKey,'g'),'');
}else{
return '';
}
}
getTagAttribute('<a href = "./test.html#bir/deneme/?k=1&v=1" class= "xyz_bir-ahmet abc">','href');
// returns "./test.html#bir/deneme/?k=1&v=1"
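For comparison, here is a shorter sketch of the same idea. It assumes the attribute value is always quoted and never contains its own delimiter, and it can be fooled by attribute-like text inside another attribute's value. getQuotedAttribute is a hypothetical helper, not part of the answer above.

```javascript
// Hypothetical helper: returns the quoted value of `attribute` in `tag`, or '' if absent.
function getQuotedAttribute(tag, attribute) {
  // Escape regex metacharacters in the attribute name.
  var name = attribute.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // (?:^|\s) keeps "href" from matching inside e.g. "data-href".
  // Group 1 is the opening quote, group 2 the value up to the same quote.
  var re = new RegExp('(?:^|\\s)' + name + '\\s*=\\s*(["\'])(.*?)\\1', 'i');
  var result = re.exec(tag);
  return result ? result[2] : '';
}

var tag = '<a href = "./test.html#bir/deneme/?k=1&v=1" class= "xyz_bir-ahmet abc">';
console.log(getQuotedAttribute(tag, 'href'));
console.log(getQuotedAttribute(tag, 'class'));
console.log(getQuotedAttribute(tag, 'id'));
```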
Solution 4
No need for regular expressions; just do some DOM work:
var head = document.createElement("head");
head.innerHTML = '<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE"><meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE"><meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">';
var metaNodes = head.childNodes;
for(var i=0; i<metaNodes.length; i++){
var contentValue = metaNodes[i].attributes.getNamedItem("content").value;
}
Since you are using NodeJS and BlackSheep mentions cheerio, you could use its syntax if you want to use that library:
//Assume htmlString contains the html
var cheerio = require('cheerio'),
$ = cheerio.load(htmlString);
var values=[];
$("meta").each(function(i, elem) {
values[i] = $(this).attr("content");
});
Francesco Mangia
Updated on June 13, 2022
Comments
-
Francesco Mangia almost 2 years
Suppose I have this HTML in a string:
<meta http-equiv="Set-Cookie" content="COOKIE1_VALUE_HERE"> <meta http-equiv="Set-Cookie" content="COOKIE2_VALUE_HERE"> <meta http-equiv="Set-Cookie" content="COOKIE3_VALUE_HERE">
And I have this regular expression, to get the values inside the content attributes:
/<meta http-equiv=[\"']?set-cookie[\"']? content=[\"'](.*)[\"'].*>/ig
How do I, in JavaScript, get all three content values?
I've tried:
var setCookieMetaRegExp = /<meta http-equiv=[\"']?set-cookie[\"']? content=[\"'](.*)[\"'].*>/ig;
var match = setCookieMetaRegExp.exec(htmlstring);
but match doesn't contain the values I need. Help?
Note: the regular expression is already correct (see here). I just need to match it to the string.
Note: I'm using NodeJS
-
Patrick Evans over 10 years
@Obay you might want to mention that you are using NodeJS in your question then lol
-
Francesco Mangia over 10 years
I need to run the regular expression specifically on set-cookie, and the HTML string is a complete HTML document
-
Francesco Mangia over 10 years
It doesn't work, even when changing the parameter of exec() into htmlstring
-
Ram over 10 years
@Obay Why don't you use cheerio lib?
-
Francesco Mangia over 10 years
Sorry about that! Will modify :P
-
Francesco Mangia over 10 years
I tried the modified code, it doesn't work. Uncaught TypeError
-
Francesco Mangia over 10 years
@BlackSheep I will look into it
-
Patrick Evans over 10 years
@Obay edited to include snippet on how to do it with cheerio lib since BlackSheep mentions that lib.
-
Patrick Evans over 10 years
Use single quotes around the string, otherwise you will get errors due to double quotes being in the string.
-
RobG over 10 years
While the OP may not relate to browsers, it's worth noting for this answer that not all browsers allow setting of the innerHTML property of head elements (e.g. IE).
-
Adam Youngers over 9 years
So this only seems to work when content comes at the end of the tag. If an attribute follows content then it picks it up in the regexp. How do you tell the matching to stop when it reaches the closing quote? Here is my bin of the problem... jsbin.com/xebimu/1/edit?js,console