Extract image src from a string

46,773

Solution 1

You need a capture group, written with parentheses (), to extract the URLs. If you also want to match globally (the g flag), i.e. more than once, you need to call exec in a loop, because match ignores capture groups when matching globally and returns only the full matches.

For example:

var m,
    urls = [], 
    str = '<img src="http://site.org/one.jpg" />\n <img src="http://site.org/two.jpg" />',
    rex = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;

while ( m = rex.exec( str ) ) {
    urls.push( m[1] );
}

console.log( urls ); 
// [ "http://site.org/one.jpg", "http://site.org/two.jpg" ]
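The difference between match and capture-group-aware iteration can be seen side by side in the sketch below. It is an illustration only: the sample string and variable names are assumptions, and it uses String.prototype.matchAll (ES2020) as an alternative to the exec loop, so it needs a reasonably recent browser or Node.js.

```javascript
// Sample input (hypothetical URLs, same shape as the example above)
var html = '<img src="http://site.org/one.jpg" />\n<img src="http://site.org/two.jpg" />';
var pattern = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;

// With the g flag, match() returns whole matches only; capture groups are dropped.
var wholeMatches = html.match(pattern);

// matchAll() yields one match array per hit, capture groups included.
var sources = Array.from(html.matchAll(pattern), function (m) {
    return m[1]; // group 1 is the src value
});

console.log(wholeMatches);
console.log(sources);
```

Here wholeMatches holds the two complete <img ... /> tags, while sources holds only the captured URLs.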

Solution 2

var myRegex = /<img[^>]+src="(http:\/\/[^">]+)"/g;
var test = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';
var match = myRegex.exec(test); // match[0] is the full match, match[1] the captured URL
console.log(match[1]);

Solution 3

As Mathletics mentioned in a comment, there are other more straightforward ways to retrieve the src attribute from your <img> tags such as retrieving a reference to the DOM node via id, name, class, etc. and then just using your reference to extract the information you need. If you need to do this for all of your <img> elements, you can do something like this:

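var imageTags = document.getElementsByTagName("img"); // Returns a live HTMLCollection of <img> DOM nodes
var sources = [];
// Use an indexed loop: for...in would also visit non-element properties such as length
for (var i = 0; i < imageTags.length; i++) {
   sources.push(imageTags[i].src);
}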

However, if you have some restriction forcing you to use regex, then the other answers provided will work just fine.

Solution 4

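Perhaps this is what you are looking for:

I slightly modified your regex and then used the exec function to get the match as an array: results[0] is the full match and results[1] is the first capture group. If there is more than one match, the others are not in results[2], results[3], and so on (those slots are for additional capture groups). Because the regex has the g flag, each further call to exec returns the next match, or null when none is left.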

var html = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';

var re = /<img[^>]+src="http:\/\/([^">]+)/g;
var results = re.exec(html);

var source = results[1]; // the URL without the leading "http://", which sits outside the capture group
alert(source);
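Collecting every match by calling exec repeatedly could look roughly like the sketch below. The helper name collectSources and the sample string are hypothetical, and the capture group is moved so that it includes the http:// prefix, which is a small departure from the regex in this answer.

```javascript
// Hypothetical helper: collects every http:// image source in an HTML string.
function collectSources(html) {
    var re = /<img[^>]+src="(http:\/\/[^">]+)/g; // group 1 includes the scheme here
    var sources = [];
    var result;
    // Each exec call resumes at re.lastIndex and returns null when no match is left.
    while ((result = re.exec(html)) !== null) {
        sources.push(result[1]);
    }
    return sources;
}

var sample = '<img src="http://site.org/a.jpg" /><p>text</p><img alt="b" src="http://site.org/b.png" />';
console.log(collectSources(sample));
```

The regex is created inside the function so that its lastIndex always starts at 0 for each call.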

Solution 5

You can use an HTML parser and avoid regular expressions altogether.

var parser = require('node-html-parser');

var html = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />'

parser.parse(html).querySelector('img').getAttribute('src');
// => 'http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg'
Author by

Default

Software Engineering Manager @ Variphy Inc.

Updated on September 01, 2021

Comments

  • Default
    Default over 2 years

    I'm trying to match all the image elements as strings.

    This is my regex:

    html.match(/<img[^>]+src="http([^">]+)/g);
    

    This works, but I want to extract the src of all the images. So when I execute the regular expression on this String:

    <img src="http://static2.ccn.com/ccs/2013/02/img_example.jpg />

    it should return:

    "http://static2.ccn.com/ccs/2013/02/img_example.jpg"

    • Mathletics
      Mathletics over 11 years
      Don't use regex to parse html.
    • Admin
      Admin over 11 years
      I have to do it with regex
    • Admin
      Admin over 11 years
      @Tomirammstein, why do you have to do it with a regex when Javascript has DOM built in?
    • Šime Vidas
      Šime Vidas over 11 years
      @Tomirammstein In which environment is your JavaScript code executing? If it's a web browser, just parse the HTML string into a DOM tree.
    • sdespont
      sdespont over 11 years
      Too bad... with jQuery it would be $('img[src="http://static2.ccn.com/ccs"]').each(function(){});
    • Šime Vidas
      Šime Vidas over 11 years
      @dan1111 Not exactly. JavaScript is just a scripting language. The DOM is built into web browsers, not JavaScript.
    • Admin
      Admin over 11 years
      I'm using node.js, so, I can't parse it into an HTML tree
    • VoronoiPotato
      VoronoiPotato over 11 years
      github.com/harryf/node-soupselect maybe this could help
    • Ian
      Ian over 11 years
      @Tomirammstein Don't you think it would've been helpful to tag this question as node.js in the first place?
    • Admin
      Admin over 11 years
      Don't you think that node.js is based on JavaScript?
    • Ian
      Ian over 11 years
      Yes but they aren't the same. You said it yourself, node.js is based on Javascript - it doesn't include everything and isn't perfectly identical. I'm just saying, tagging it correctly and explaining it better could've helped get a more direct and correct solution, faster.
    • appsntech
      appsntech over 4 years
      This regex does not work when the entire HTML is a string and I want to find the image URLs in it. Can anyone help? stackoverflow.com/questions/57883657/…
  • juminoz
    juminoz about 11 years
    Ended up with this instead. Otherwise, it doesn't pick up all images. /<img[^>]+src="([^">]+)/g
  • Saidulu Buchhala
    Saidulu Buchhala over 10 years
    Sometimes the img tag may have height or some other attribute after the src attribute, so the regex should be rex = /<img[^>]+src="?([^"\s]+)"?[^>]*\/>/g;
  • norman784
    norman784 over 9 years
    It seems that this regex does not work on all img tags, but this one works: /<img.*?src="([^">]*\/([^">]*?))".*?>/g;
  • akelec
    akelec about 8 years
    Thank you for your answer. It helped me. I just want to add this: var src = myRegex.exec(test); console.log('SRC: ' + src[1]);
  • appsntech
    appsntech over 4 years
    This regex does not work when the entire HTML is a string and I want to find the image URLs in it. Can you help? stackoverflow.com/questions/57883657/…
  • Admin
    Admin over 2 years
    Please provide additional details in your answer. As it's currently written, it's hard to understand your solution.