Get img src with PHP
Solution 1
Use an HTML parser like DOMDocument and then evaluate the value you're looking for with DOMXPath:
$html = '<img id="12" border="0" src="/images/image.jpg"
alt="Image" width="100" height="100" />';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg"
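The shorter variants of this answer originally called DOMDocument::loadHTML() statically. That is deprecated in PHP 7 and throws an Error as of PHP 8, so they are shown here with an instance instead:
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ silences warnings about malformed markup
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");
And for the one-liners out there (reset() takes its argument by reference, so the XPath result is stored in a variable first):
$nodes = simplexml_import_dom($doc)->xpath("//img/@src");
$src = (string) reset($nodes);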
Solution 2
You would be better off using a DOM parser for this kind of HTML parsing. Consider this code:
$html = '<img id="12" border="0" src="/images/image.jpg"
alt="Image" width="100" height="100" />';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$node = $nodelist->item(0); // gets the 1st image
$value = $node->attributes->getNamedItem('src')->nodeValue;
echo "src=$value\n"; // prints src of image
OUTPUT:
src=/images/image.jpg
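For the case raised in the question (a long news-story string with several images, of which only the first matters), the same approach can be wrapped in a small helper. This is only a minimal sketch: the function name firstImgSrc and the sample markup are made up for illustration, and returning null when there is no img element is an assumption about the desired behaviour, not something the answer specifies.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns the src of the
// first <img> that has a src attribute, or null if there is none.
function firstImgSrc(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings silently
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//img[@src]')->item(0); // first match in document order
    return $node instanceof DOMElement ? $node->getAttribute('src') : null;
}

// Made-up sample content with two images
$story = '<p>Intro</p><img src="/images/first.jpg" alt="A" />'
       . '<p>More</p><img src="/images/second.jpg" alt="B" />';

echo firstImgSrc($story), "\n";
var_dump(firstImgSrc('<p>No images here</p>'));
```

With the sample above, only the first image's src is returned, and the second call yields null because the fragment contains no img element.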
Solution 3
I did it a simpler way. It's not as clean as it should be, but it works as a quick hack:
$htmlContent = file_get_contents('pageURL');
// read all image tags into an array
preg_match_all('/<img[^>]+>/i', $htmlContent, $imgTags);
$origImageSrc = array();
foreach ($imgTags[0] as $imgTag) {
    // capture group 1 holds the value between the double quotes after src=
    if (preg_match('/src="([^"]+)/i', $imgTag, $image)) {
        $origImageSrc[] = $image[1];
    }
}
// will output all your img src's within the html string
print_r($origImageSrc);
Solution 4
I know people say you shouldn't use regular expressions to parse HTML, but in this case I find it perfectly fine.
$string = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />';
preg_match('/<img(.*)src(.*)=(.*)"(.*)"/U', $string, $result);
$foo = array_pop($result);
Solution 5
$imgTag = <<< LOB
<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />
<img border="0" src="/images/not_match_image.jpg" alt="Image" width="100" height="100" />
LOB;
preg_match('%<img.*?src=["\'](.*?)["\'].*?/>%i', $imgTag, $matches);
$imgSrc = $matches[1];
NOTE: You should use an HTML parser like DOMDocument and NOT a regex.
pangi
Updated on July 13, 2022
Comments
-
pangi almost 2 years
I would like to get the SRC attribute into a variable in this example:
<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />
So for example, I would like to get a variable $foo = "/images/image.jpg". Important! The src attribute will be dynamic, so it mustn't be hardcoded. Is there any quick and easy way to do this? Thanks!
EDIT: The image will be a part of a huge string that is basically the content of a news story. So the image is just a part of that.
EDIT2: There will be more images in this string, and I would only want to get the src of the first one. Is this possible?
-
gen_Eric about 12 years
The problem is that this regex is specific to this variable. What if you wanted to get the src from another image?
-
kba about 12 years
@Rocket The regex above is not specific to that variable. This will work with all (I believe) img tags that have a src attribute.
-
Adri V. about 12 years
It will fail if there's a space before or after the equals sign: <img src = "/images/image.jpg" />
-
gen_Eric about 12 years
@AdrianaVillafañe: Isn't that not valid HTML anyway?
-
kba about 12 years
For more extensive HTML parsing, I completely agree, but for this it's simply overkill. Your code is longer, slower, and harder to read.
-
anubhava about 12 years
@KristianAntonsen: How can you say this code is slower than regex? Do you have any benchmarking to support this behavior?
-
kba about 12 years
@AdrianaVillafañe Now it will match that as well.
-
Adri V. about 12 years
That's the point. Not every website "in the wild" has perfectly valid HTML. This code renders, and browsers show the image, and for many people that's all that matters (even if it's not valid):
<html><body><img src = "http://blog.stackoverflow.com/wp-content/uploads/stackoverflow-logo-300.png" /></body></html>
-
kba about 12 years
@AdrianaVillafañe As I said, I've updated the answer. It will now match.
-
Adri V. about 12 years
I deleted my previous comment. But I now add two more:
Case 1 (src without quotes): <img src=http://blog.stackoverflow.com/wp-content/uploads/stackoverflow-logo-300.png />
Case 2 (src surrounded with single quotes): <img src='http://blog.stackoverflow.com/wp-content/uploads/stackoverflow-logo-300.png' />
-
kba about 12 years
@anubhava I would say that it's both obvious and common sense. You're loading a heavy library and initializing objects. But since you asked, I made a small benchmark comparing our codes. 100,000 executions takes about 0,49 seconds with my code. It takes 6,2 seconds with your code.
-
Adri V. about 12 years
Case 3: <img (line break here) src="/image.here">
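The cases listed in these comments (spaces around the equals sign, single quotes, no quotes, a line break inside the tag) can be tried against the DOM-based approach. The following is a rough sketch, assuming PHP's libxml-backed DOMDocument; the paths are shortened placeholders, not the URLs from the comments, and the $cases and $results names are made up for the example.

```php
<?php
// Variants of an <img> tag, modelled on the cases from the comments.
$cases = [
    'spaces around =' => '<img src = "/images/a.png" />',
    'single quotes'   => "<img src='/images/b.png' />",
    'no quotes'       => '<img src=/images/c.png />',
    'line break'      => "<img\nsrc=\"/images/d.png\">",
];

$results = [];
foreach ($cases as $label => $html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings about sloppy markup
    $xpath = new DOMXPath($doc);
    // string() yields the first matching attribute value, or '' if none
    $results[$label] = $xpath->evaluate('string(//img/@src)');
    echo $label, ': ', $results[$label], "\n";
}
```

The point of the sketch is that the parser, not a hand-written pattern, decides where the attribute value starts and ends, so each variant is handled by the same two lines of lookup code.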
-
kba about 12 years
All OP was asking was something to match his example using "s and no spaces. I know there is a reason why the DOM class is so much slower than a simple regex - one of these being it takes all these edge-cases into consideration, but it doesn't change the fact that sometimes the biggest tool isn't the best.
-
hakre about 12 years
@KristianAntonsen: That benchmark is cheating, because pcre caches compiled regexes per request. That means it really executes once, and 99,999 times it fetches the precompiled result. You need to compare 100,000 requests against each other, not only function calls, to come closer to reality. Microbenchmarking can often mislead with regexes.
-
kba about 12 years
With a single execution, it's still more than twice as fast. Either way, I don't see a reason to discuss this. If you find your code easier to read (or whatever quality parameter you use), stick to it, and I'll stick to mine.
-
pangi about 12 years
Will this work if there are more images? So if I have 2 images, and I only want the src of the first one.
-
anubhava about 12 years
@JernejPangeršič: Yes, it will work for that case also, since I'm using $node = $nodelist->item(0); which gets the very first image.
-
Julien Royer over 11 years
What if the HTML string contains an image within a comment? Using a real HTML parser is the only path to correctness here.
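A small sketch of that concern: a pattern-based search also matches markup that is commented out, while a DOM parser treats the comment as a comment node and skips it. The pattern below is the one from Solution 5; the sample markup and variable names are invented for illustration.

```php
<?php
// Invented sample: the first <img> is commented out, the second is live.
$html = '<p>Story text</p>'
      . '<!-- <img src="/images/old.jpg" /> -->'
      . '<img src="/images/current.jpg" />';

// Regex approach (pattern from Solution 5): it has no notion of comments,
// so the first textual match wins.
preg_match('%<img.*?src=["\'](.*?)["\'].*?/>%i', $html, $matches);
$regexSrc = $matches[1];

// DOM approach: the commented-out tag is not an element, so //img skips it.
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$domSrc = $xpath->evaluate('string(//img/@src)');

echo "regex: $regexSrc\n";
echo "dom:   $domSrc\n";
```

Here the regex reports the commented-out image, while the DOM lookup reports the one a browser would actually render.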
-
HamZa over 10 years
[jpg]{3} will match jpg, jgp, gjp etc... The same goes for the rest
-
Daniel Garcia Sanchez over 9 years
Good answer. It helped me! :-)
-
Corgalore almost 9 years
This worked well for me on malformed html fragments.
-
jim smith almost 9 years
This seems to get one image. Any way to get all the images in the HTML?
-
hakre almost 9 years
@jimsmith: Remove the string cast and the reset call and you have an array of all SRC attributes (as SimpleXMLElements).
-
dtanwar about 7 years
How can I use $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg" inside a for loop?
-
hakre about 7 years
@dtanwar: By not using a single string() evaluation but by obtaining all the @src attribute nodes via a query: $xpath->query('//img/@src'). This returns a query result you can loop over; see php.net/domxpath.query for an example and more detailed documentation.
-
Ajay Singh about 4 years
Or use the one-liner preg_match_all('/src\s{0,}=\s{0,}("|\')(.[^("|\')]*?)("|\')/i', $htmlContent, $imgarr); and use $imgarr[2]
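To make the `$xpath->query('//img/@src')` suggestion from the comments concrete, here is a minimal sketch that collects every src in document order. The sample markup and the variable name $sources are illustrative assumptions.

```php
<?php
// Made-up fragment with two images
$html = '<p><img src="/images/one.jpg" alt="1" /></p>'
      . '<p><img src="/images/two.jpg" alt="2" /></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$sources = [];
// query() returns a DOMNodeList of attribute nodes, which foreach can iterate
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}
print_r($sources);
```

Taking `$sources[0]` (after checking that the array is not empty) gives the first image only, which matches what the question asks for.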