Get img src with PHP

Solution 1

Use an HTML parser like DOMDocument and then evaluate the value you're looking for with DOMXPath:

$html = '<img id="12" border="0" src="/images/image.jpg"
         alt="Image" width="100" height="100" />';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg"

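Or for those who really need to save space (the document is created inline because the static call DOMDocument::loadHTML($html) throws an Error as of PHP 8):

@($doc = new DOMDocument())->loadHTML($html);
$src = (new DOMXPath($doc))->evaluate("string(//img/@src)");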

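And for the one-liners out there (it relies on calling loadHTML() statically, which works only up to PHP 7 and throws an Error in PHP 8):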

$src = (string) reset(simplexml_import_dom(DOMDocument::loadHTML($html))->xpath("//img/@src"));
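
If you need every image rather than just the first, the same XPath idea still applies. Here is a minimal sketch, assuming a made-up two-image fragment: it swaps the string() evaluation for a query() call and loops over the attribute nodes that come back.

```php
<?php
// Sample markup (an assumption for illustration): two images in one fragment.
$html = '<p><img src="/images/first.jpg" alt="First" />'
      . '<img src="/images/second.jpg" alt="Second" /></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList of src attribute nodes, in document order.
$sources = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```

The first entry of $sources is the same value the string() evaluation returns, so this is a superset of the single-image case.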

Solution 2

You would be better off using a DOM parser for this kind of HTML parsing. Consider this code:

$html = '<img id="12" border="0" src="/images/image.jpg"
         alt="Image" width="100" height="100" />';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$node = $nodelist->item(0); // gets the 1st image
$value = $node->attributes->getNamedItem('src')->nodeValue;
echo "src=$value\n"; // prints src of image

OUTPUT:

src=/images/image.jpg
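
One thing the snippet above doesn't cover is a string with no image at all: item(0) then returns null and the attribute lookup fails. A possible way to guard against that is sketched below; firstImgSrc() is a hypothetical helper name, not something from PHP or from this answer.

```php
<?php
// firstImgSrc() is a hypothetical helper: it returns the src of the first
// <img> that has one, or null when the string contains no such image.
function firstImgSrc(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    // Only consider <img> elements that actually carry a src attribute.
    $node = (new DOMXPath($doc))->query('//img[@src]')->item(0);

    return $node instanceof DOMElement ? $node->getAttribute('src') : null;
}

$withImage    = firstImgSrc('<p>Story <img src="/images/image.jpg" alt="Image" /></p>');
$withoutImage = firstImgSrc('<p>No image here</p>');

var_dump($withImage, $withoutImage);
```

Returning null keeps the "no image" case distinguishable from an image whose src is an empty string.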

Solution 3

I did it a simpler way. It's not as clean as it should be, but it was a quick hack:

$htmlContent = file_get_contents('pageURL');

// read all image tags into an array
preg_match_all('/<img[^>]+>/i', $htmlContent, $imgTags);

$origImageSrc = [];
foreach ($imgTags[0] as $imgTag) {
  // capture group 1 already holds the bare src value,
  // so there is no 'src="' prefix to strip afterwards
  if (preg_match('/src="([^"]+)/i', $imgTag, $image)) {
    $origImageSrc[] = $image[1];
  }
}
// prints the src of every img tag found in the HTML string
print_r($origImageSrc);

Solution 4

I know people say you shouldn't use regular expressions to parse HTML, but in this case I find it perfectly fine.

$string = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />';
preg_match('/<img(.*)src(.*)=(.*)"(.*)"/U', $string, $result);
$foo = array_pop($result);
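
If you do stay with a regex, the variations raised in the comments (spaces around the equals sign, single quotes, a line break after the tag name) can be covered with a somewhat stricter pattern. This is only a sketch with sample inputs of my own choosing. It still misses unquoted values, would also match an attribute such as data-src, and would happily match an image inside an HTML comment.

```php
<?php
// Sample inputs modelled on the variations mentioned in the comments.
$samples = [
    '<img border="0" src="/images/image.jpg" alt="Image" />',
    '<img src = "/images/image.jpg" />',
    "<img src='/images/image.jpg' />",
    "<img\nsrc=\"/images/image.jpg\">",
];

// \s* allows whitespace around "=", (["\']) captures the opening quote,
// and \1 requires the matching closing quote. Group 2 is the src value.
$pattern = '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is';

$found = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        $found[] = $m[2];
    }
}

print_r($found);
```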

Solution 5

$imgTag = <<< LOB
<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />
<img border="0" src="/images/not_match_image.jpg" alt="Image" width="100" height="100" />
LOB;

preg_match('%<img.*?src=["\'](.*?)["\'].*?/>%i', $imgTag, $matches);
$imgSrc = $matches[1];

NOTE: You should use an HTML Parser like DOMDocument and NOT a regex.
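
To make that note concrete, here is a rough sketch of how DOMDocument copes with some of the cases brought up in the comments: an image inside an HTML comment, an unquoted src, and whitespace around the equals sign. The paths are placeholders I made up for the example.

```php
<?php
// Placeholder markup covering edge cases from the comments.
$html = <<<HTML
<!-- <img src="/images/commented-out.jpg"> -->
<p><img alt="Logo" src=/images/unquoted.png></p>
<p><img src = '/images/spaced.png' /></p>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// Only real <img> elements are returned; the commented-out one is a
// comment node, so it never shows up here.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

A regex sees all three tags as plain text, which is why the commented-out image is hard to exclude without a parser.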

Author: pangi

Updated on July 13, 2022

Comments

  • pangi
    pangi almost 2 years

    I would like to get the SRC attribute into a variable in this example:

    <img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />
    

    So for example - I would like to get a variable $foo = "/images/image.jpg". Important! The src attribute will be dynamic, so it mustn't be hardcoded. Is there any quick and easy way to do this?

    Thanks!

    EDIT: The image will be a part of a huge string that is basically the content of a news story. So the image is just a part of that.

    EDIT2: There will be more images in this string, and I would only want to get the src of the first one. Is this possible?

  • gen_Eric
    gen_Eric about 12 years
    The problem is that this regex is specific to this variable. What if you wanted to get the src from another image?
  • kba
    kba about 12 years
    @Rocket The regex above is not specific to that variable. This will work with all (I believe) img tags that have a src attribute.
  • Adri V.
    Adri V. about 12 years
    it will fail if there's a space before or after the equal <img src = "/images/image.jpg" />
  • gen_Eric
    gen_Eric about 12 years
    @AdrianaVillafañe: Isn't that not valid HTML anyway?
  • kba
    kba about 12 years
    For more extensive HTML parsing, I completely agree, but for this it's simply overkill. Your code is longer, slower, and harder to read.
  • anubhava
    anubhava about 12 years
    @KristianAntonsen: How can you say this code is slower than regex? Do you have any benchmarking to support this behavior?
  • kba
    kba about 12 years
    @AdrianaVillafañe Now it will match that as well.
  • Adri V.
    Adri V. about 12 years
    That's the point. Not every website "in the wild" has perfectly valid HTML. This code renders, and browsers show the image, and for many people that's all that matters (even if it's not valid): <html><body><img src = "http://blog.stackoverflow.com/wp-content/uploads/stackoverflow-logo-300.png" /></body></html>
  • kba
    kba about 12 years
    @AdrianaVillafañe As I said, I've updated the answer. It will now match.
  • Adri V.
    Adri V. about 12 years
    I deleted my previous comment. But I now add two more: Case 1: <img src=http://blog.stackoverflow.com/wp-content/uploads/stackoverflow-logo-300.png /> (src without quotes) | Case 2: <img src='http://blog.stackoverflow.com/wp-content/uploads/stackoverflow-logo-300.png' /> (src surrounded with single quotes)
  • kba
    kba about 12 years
    @anubhava I would say that it's both obvious and common sense. You're loading a heavy library and initializing objects. But since you asked, I made a small benchmark comparing our codes. 100,000 executions takes about 0,49 seconds with my code. It takes 6,2 seconds with your code.
  • Adri V.
    Adri V. about 12 years
    Case 3 : <img (line break here) src="/image.here">
  • kba
    kba about 12 years
    All OP was asking was something to match his example using "s and no spaces. I know there is a reason why the DOM class is so much slower than a simple regex - one of these being it takes all these edge-cases into consideration, but it doesn't change the fact that sometimes the biggest tool isn't the best.
  • hakre
    hakre about 12 years
    @KristianAntonsen: That benchmark is cheating, because PCRE caches compiled regexes per request. That means the pattern is really compiled once, and the other 99,999 times the precompiled result is fetched. You need to compare 100,000 requests against each other, not just function calls, to come closer to reality. Microbenchmarking with regexes can often mislead.
  • kba
    kba about 12 years
    With a single execution, it's still more than twice as fast. Either way, I don't see a reason to discuss this. If you find your code easier to read (or whatever quality parameter you use), stick to it, and I'll stick to mine.
  • pangi
    pangi about 12 years
    Will this work if there are more images? So if I have 2 images, and I only want the src of the first one.
  • anubhava
    anubhava about 12 years
    @JernejPangeršič: Yes it will work for that case also since I'm using $node = $nodelist->item(0); which is getting very first image.
  • Julien Royer
    Julien Royer over 11 years
    What if the HTML string contains an image within a comment? Using a real HTML parser is the only path to correctness here.
  • HamZa
    HamZa over 10 years
    [jpg]{3} will match jpg, jgp, gjp etc... The same goes for the rest
  • Daniel Garcia Sanchez
    Daniel Garcia Sanchez over 9 years
    Good answer. It helped me! :-)
  • Corgalore
    Corgalore almost 9 years
    This worked well for me on malformed html fragments.
  • jim smith
    jim smith almost 9 years
    This seems to get one image. Any way to get all the images in the HTML?
  • hakre
    hakre almost 9 years
    @jimsmith: Remove the string cast and the reset call and you have an array of all SRC attributes (as SimpleXMLElements).
  • dtanwar
    dtanwar about 7 years
    How can I use $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg" inside a for loop?
  • hakre
    hakre about 7 years
    @dtanwar: By not using a single string() evaluation but by obtaining all the @src attribute nodes via a query: $xpath->query('//img/@src'). This returns a query result you can loop over; see php.net/domxpath.query for an example and more detailed documentation.
  • Ajay Singh
    Ajay Singh about 4 years
    Or use a one-liner: preg_match_all('/src\s{0,}=\s{0,}("|\')(.[^("|\')]*?)("|\')/i', $htmlContent, $imgarr); and use $imgarr[2]