JS RegExp to replace < and > inside element attributes

Solution 1

You can do this with a while loop that checks whether any attribute value still contains an entity to replace:

var htmlString = '<div id="&lt;lol&gt;"><span title="&lt;&gt;&lt; &lt;&gt;&lt; &lt;&gt;&lt; fish">hover for fishies</span></div>';
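// Repeat until no attribute value contains &gt; or &lt; any more.
while (/="[^"]*&[gl]t;[^"]*"/.test(htmlString)) {
    // [^"]* is greedy, so each pass converts only the last entity of each kind per attribute.
    htmlString = htmlString.replace(/="([^"]*)&gt;([^"]*)"/g, '="$1>$2"')
        .replace(/="([^"]*)&lt;([^"]*)"/g, '="$1<$2"');
}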

This loop keeps going until no &gt; or &lt; is left inside any double-quoted attribute value in the HTML string. Entities outside attribute values are not touched.
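To see why a loop is needed at all, here is a small sketch that counts the passes. The variable names loopInput and passes are made up for illustration. Because the first [^"]* is greedy, each replace call converts only the last matching entity in each attribute, so an attribute with several entities needs several passes.

```javascript
// Sample input taken from the answer above.
var loopInput = '<div id="&lt;lol&gt;"><span title="&lt;&gt;&lt; &lt;&gt;&lt; &lt;&gt;&lt; fish">hover for fishies</span></div>';
var passes = 0; // illustrative counter, not part of the original answer

while (loopInput.match(/="([^"]*)&[gl]t;([^"]*)"/g)) {
    // Each replace handles one entity per attribute: the last one in that value.
    loopInput = loopInput.replace(/="([^"]*)&gt;([^"]*)"/g, '="$1>$2"')
        .replace(/="([^"]*)&lt;([^"]*)"/g, '="$1<$2"');
    passes++;
}

console.log(passes);    // number of passes the loop needed
console.log(loopInput); // the string with attribute values decoded
```

The title attribute contains six &lt; entities, so the loop runs once per entity there, even though the id attribute is finished after the first pass.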

As far as I know, this can't be done in a single regex replace with a plain replacement string, because you need to match every &lt; or &gt; between the =" and the closing ". A pattern that matches all of them would look something like /="([^"]*)(&[lg]t;([^"]*))*"/g, but a capturing group inside a repetition only keeps the text of its last iteration, so the replacement string has no way to refer to the earlier ones.
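A quick sketch of that limitation follows. The pattern here is a slight variation on the one above ([^"&]* instead of [^"]*, so that the repeated group really does iterate), and the names repeatedAttr and repeatedMatch are made up for illustration.

```javascript
// An attribute value with three entities in it.
var repeatedAttr = '="&lt;a&gt;b&lt;c"';

// Group 2 is repeated once per entity; group 3 is the text following each entity.
var repeatedMatch = repeatedAttr.match(/="([^"&]*)(&[lg]t;([^"&]*))*"/);

console.log(repeatedMatch[0]); // the whole attribute
console.log(repeatedMatch[2]); // only the LAST iteration: '&lt;c'
console.log(repeatedMatch[3]); // only the LAST iteration: 'c'
```

The first two iterations ('&lt;a' and '&gt;b') are matched but not retained, which is why $2 and $3 in a replacement string cannot rebuild the whole value.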

However, you can also do this in one call by passing a callback function to replace:

var htmlString = '<div id="&lt;lol&gt;"><span title="&lt;&gt;&lt; &lt;&gt;&lt; &lt;&gt;&lt; fish">hover for fishies</span></div>';
htmlString = htmlString.replace(/="[^"]*&[gl]t;[^"]*"/g, function (match) {
    // match is one whole attribute value, including the leading =" and the closing "
    return match.replace(/&gt;/g, '>').replace(/&lt;/g, '<');
});

That will first match every attribute that has either &lt; or &gt; in it, and then perform a replace on the matched part of the string.

Solution 2

// replace() returns a new string, so assign the result back
string = string.replace(/="[^"]+"/g, function($0){ return $0.replace(/&lt;/g,"<").replace(/&gt;/g,">"); });

What this line does:

  • within the string, search for text that starts with =" and ends with "
  • within this text: replace all &lt; with <
  • within this text: replace all &gt; with >

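In the function, $0 is just the name given to the first parameter. It receives the whole substring matched by ="[^"]+", that is, one attribute value together with its surrounding =" and ".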
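Wrapped up as a reusable function, the same idea might look like the sketch below. The function name decodeAnglesInAttributes and the sample string are hypothetical, not from the original answers. It assumes attribute values are double-quoted, and it does not check that the =" actually sits inside a tag.

```javascript
// Hypothetical helper: decodes &lt; and &gt; only inside ="..." spans.
function decodeAnglesInAttributes(html) {
    // Match each double-quoted attribute value, including the =" and the closing ".
    return html.replace(/="[^"]*"/g, function (attr) {
        // Decode the two entities within the matched span only.
        return attr.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
    });
}

// The entity in the text content ("1 &lt; 2") should be left as it is.
var sampleHtml = '<a href="&lt;!-- x --&gt;">1 &lt; 2</a>';
var decodedHtml = decodeAnglesInAttributes(sampleHtml);
console.log(decodedHtml); // <a href="<!-- x -->">1 &lt; 2</a>
```

Single-quoted or unquoted attribute values are not handled here, and text content that happens to contain =" followed later by " would be treated like an attribute.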

Author: bearfriend

Updated on June 04, 2022

Comments

  • bearfriend
    bearfriend almost 2 years

    I'm looking to replace &lt; and &gt; with < and > inside html element attributes, or in other words between =" and ".

    I attempted this myself but I'm just not matching anything. A breakdown of the regexp would be nice too, so I can attempt to understand it and eventually write these on my own.

    • bearfriend
      bearfriend over 10 years
      Actually anywhere between <*=" and "*> would be even better.
    • Álvaro González
      Álvaro González over 10 years
      Are you 100% sure that's your real data? That would imply double-encoding when generating HTML, which is a problem by itself.
    • bearfriend
      bearfriend over 10 years
      Unfortunately, there is existing html in the codebase that looks like this: <a href="<!-- relpace-me-with-something -->"></a>, and these templates get run through a new pre-processor in phantomjs, which does some DOM manipulation and writes the new innerHTML (which encodes the angle brackets) to a new template file. So, yes, it's correct.
    • Joeytje50
      Joeytje50 over 10 years
      @dg988 I've added an alternative method to do this in my answer. You might want to check it out to see if that's what you were looking for.
    • bearfriend
      bearfriend over 10 years
      That looks like what I want!
    • Joeytje50
      Joeytje50 over 10 years
      @dg988 If this was the answer you were looking for, you can mark it as accepted to mark this question as answered.
    • bearfriend
      bearfriend over 10 years
      Just testing it out first. If it works, I will accept.
  • bearfriend
    bearfriend over 10 years
    I like this idea, but I was worried that it might be pretty intensive compared to a regexp. Anyone have thoughts on that? I would have to call this recursively on every element in the template, some of which are quite large.