PHP preg_match_all regex to extract only number in string
Solution 1
I think this is the best approach:
- Use an HTML parser to extract the image tags
- Use a regular expression (or perhaps string manipulation) to extract the ID
- Query for the data
- Use the HTML parser to insert the returned data
Here is an example. There are improvements I can think of, such as using string manipulation instead of a regex.
$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />';
// Collect libxml warnings about malformed markup instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    // Capture the digits between "/images/" and the file extension
    if (!preg_match('#/images/([0-9]+)\.#i', $src, $matches)) {
        continue; // skip images that don't follow the numbered pattern
    }
    $id = $matches[1];
    echo 'Fetching info for image ID ' . $id . "\n";
    // Query stuff here
    $result = 'Got this from the DB';
    $img->setAttribute('title', $result);
    $img->setAttribute('alt', $result);
}
$newHTML = $doc->saveHTML();
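As asked in the comments, the string being edited is only a fragment, while saveHTML() with no argument serializes a whole document (doctype, html and body). A minimal sketch, assuming the titles have already been fetched into an array keyed by image ID (the $titles array and the addImageTitles() name are hypothetical stand-ins for the database step). It returns only the fragment by passing each child of the implied <body> to saveHTML(), which accepts an optional node argument:

```php
<?php

/**
 * Hypothetical helper: sets title/alt on every /images/<id>.<ext> <img>
 * in $html and returns the modified fragment (not a full document).
 *
 * @param array<int,string> $titles image ID => title (stand-in for the DB lookup)
 */
function addImageTitles(string $html, array $titles): string
{
    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        if (!preg_match('#/images/(\d+)\.#', $img->getAttribute('src'), $m)) {
            continue; // not one of the numbered images
        }
        $id = (int) $m[1];
        if (!isset($titles[$id])) {
            continue; // no entry for this ID
        }
        // setAttribute() takes the raw string; the serializer quotes it on output.
        $img->setAttribute('title', $titles[$id]);
        $img->setAttribute('alt', $titles[$id]);
    }

    // loadHTML() wraps the fragment in <html><body>; serialize only the
    // children of <body> so the caller gets a fragment back.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$html = '<p><img src="http://domain.com/images/59.jpg" class="something" />'
      . '<img src="http://domain.com/images/6.png" class="something" /></p>';
$result = addImageTitles($html, [59 => 'Sunset', 6 => 'Harbor']);
echo $result, "\n";
```

Keeping the lookup as a plain array makes the DOM part easy to test on its own; in practice the array would be filled from a single database query.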
Solution 2
With a regular expression you can get the number easily. The third argument to preg_match_all is passed by reference and is populated with the matches that were found.
preg_match_all('#<img src="http://domain\.com/images/(\d+)\.[a-zA-Z]+"#', $html, $matches);
print_r($matches);
$matches[0] contains each full match and $matches[1] contains just the captured numbers (59, 549, 1249 and 6 for the sample HTML).
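If only the IDs are needed, the capture group is all that matters. A minimal sketch, assuming the http://domain.com/images/ layout from the question and, like the pattern above, that src is the first attribute of the tag; extractImageIds() is a hypothetical name. It escapes the literal dots and accepts any alphanumeric extension (.jpg, .gif, .png, ...):

```php
<?php

/**
 * Hypothetical helper: returns the numeric IDs from every
 * <img src="http://domain.com/images/<id>.<ext>"> tag in $html.
 *
 * @return int[]
 */
function extractImageIds(string $html): array
{
    // '#' as the delimiter means the slashes in the URL need no escaping;
    // '\.' matches a literal dot and (\d+) captures the ID.
    $pattern = '#<img\s+src="http://domain\.com/images/(\d+)\.[a-z0-9]+"#i';

    preg_match_all($pattern, $html, $matches);

    // $matches[0] holds the whole matches, $matches[1] the captured IDs.
    return array_map('intval', $matches[1]);
}

$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.gif" class="something" />
<img src="http://domain.com/images/1249.png" class="something" />';

print_r(extractImageIds($html));
```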
Solution 3
Use preg_match_all:
preg_match_all('#<img.*?/(\d+)\.#', $str, $m);
print_r($m);
Output:
Array
(
    [0] => Array
        (
            [0] => <img src="http://domain.com/images/59.
            [1] => <img src="http://domain.com/images/549.
            [2] => <img src="http://domain.com/images/1249.
            [3] => <img src="http://domain.com/images/6.
        )

    [1] => Array
        (
            [0] => 59
            [1] => 549
            [2] => 1249
            [3] => 6
        )

)
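Since $m[1] holds every ID at once, the lookup can be done in one round trip instead of one query per image. A minimal sketch building a prepared statement with one ? placeholder per ID; the images table and its id/title columns are hypothetical, buildTitleQuery() is an illustrative name, and $pdo in the trailing comment is assumed to be an existing PDO connection:

```php
<?php

/**
 * Hypothetical helper: builds a parameterized SQL query that fetches the
 * titles for all captured IDs in a single round trip.
 *
 * @param string[] $capturedIds e.g. $m[1] from preg_match_all
 * @return array{0: string, 1: int[]} [SQL, bound parameters]
 */
function buildTitleQuery(array $capturedIds): array
{
    // Cast to int and drop duplicates so each ID is bound only once.
    $ids = array_values(array_unique(array_map('intval', $capturedIds)));
    if ($ids === []) {
        // "IN ()" is not valid SQL, so refuse an empty list.
        throw new InvalidArgumentException('No IDs to look up');
    }

    // One "?" per ID, e.g. "?,?,?" for three IDs.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));

    // NOTE: "images", "id" and "title" are assumed names for illustration.
    $sql = "SELECT id, title FROM images WHERE id IN ($placeholders)";
    return [$sql, $ids];
}

[$sql, $params] = buildTitleQuery(['59', '549', '1249', '6', '59']);
echo $sql, "\n";
print_r($params);

// With a real connection ($pdo is assumed to exist):
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $titles = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // id => title
```

Binding the IDs as parameters rather than concatenating them into the SQL keeps the query safe even if the pattern is later loosened.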
Solution 4
Consider using preg_replace_callback with this regex: (images/([0-9]+)[^"]+")
The outer parentheses act as the pattern delimiters, so ([0-9]+) is capture group 1. Pass an anonymous function as the callback argument. Result:
$output = preg_replace_callback(
    "(images/([0-9]+)[^\"]+\")",
    function ($m) {
        // $m[1] is the number.
        $t = getTitleFromDatabase($m[1]); // do whatever you have to do to get the title
        // Escape the title so quotes in it cannot break the attribute.
        return $m[0] . ' title="' . htmlspecialchars($t, ENT_QUOTES) . '"';
    },
    $input
);
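To try the callback approach without a database, the lookup can be replaced by an array captured with use. A minimal runnable sketch; the $titles array is a hypothetical stand-in for getTitleFromDatabase(), and htmlspecialchars() keeps a title containing quotes from breaking the attribute:

```php
<?php

// Hypothetical stand-in for the database: image ID => title.
$titles = [59 => 'Sunset', 549 => 'Say "cheese"'];

$input = '<img src="http://domain.com/images/59.jpg" class="something" />'
       . '<img src="http://domain.com/images/549.jpg" class="something" />'
       . '<img src="http://domain.com/images/6.jpg" class="something" />';

$output = preg_replace_callback(
    // The parentheses are the pattern delimiters, so $m[1] is the ID.
    '(images/([0-9]+)[^"]+")',
    function (array $m) use ($titles): string {
        $id = (int) $m[1];
        if (!isset($titles[$id])) {
            return $m[0]; // unknown ID: leave the tag untouched
        }
        // Escape so quotes in the title cannot end the attribute early.
        $t = htmlspecialchars($titles[$id], ENT_QUOTES);
        return $m[0] . ' title="' . $t . '" alt="' . $t . '"';
    },
    $input
);

echo $output, "\n";
```

Returning $m[0] unchanged for unknown IDs means images without a database entry pass through as they were.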
Comments
- Surajit Jati (almost 2 years ago):
I can't seem to figure out the proper regular expression for extracting just specific numbers from a string. I have an HTML string that has various img tags in it. There are a bunch of img tags in the HTML that I want to extract a portion of the value from. They follow this format:
<img src="http://domain.com/images/59.jpg" class="something" /> <img src="http://domain.com/images/549.jpg" class="something" /> <img src="http://domain.com/images/1249.jpg" class="something" /> <img src="http://domain.com/images/6.jpg" class="something" />
So, the numbers vary in length and come before what is 'usually' a .jpg (it may be a .gif, .png, or something else too). I only want to extract the number from that string.
The 2nd part of this is that I want to use that number to look up an entry in a database and grab the alt/title tag for that specific id of image. Lastly, I want to add that returned database value into the string and throw it back into the HTML string.
Any thoughts on how to proceed with it would be great...
Thus far, I've tried:
$pattern = '/img src="http://domain.com/images/[0-9]+\/.jpg';
preg_match_all($pattern, $body, $matches);
var_dump($matches);
- jordanm (about 12 years ago): You just need to use a capture group. What have you tried?
- Surajit Jati (about 12 years ago): Post edited with what I've tried thus far.
- Surajit Jati (about 12 years ago): That captures every number in the string, not just the ones in <img> tags.
- Surajit Jati (about 12 years ago): I love this approach, but how should I deal with the warnings about malformed HTML (the img tags are a hodgepodge of XHTML with a trailing />)?
- nickb (about 12 years ago): The HTML parser should be pretty good at handling malformed HTML. Can you post a few examples of what's going wrong in your original post?
- Surajit Jati (about 12 years ago): Figured it out: it was just a warning, and it was parsing correctly, so I just put an @ in front of the loadHTML line. Another question, though: instead of creating a whole HTML document to save, can I save just partial HTML? The string I'm searching isn't a whole document, just a portion enclosed in <p> tags.
- hakre (about 12 years ago): @jpea: See libxml_use_internal_errors, and yes, loadHTML works pretty well with HTML chunks too. Otherwise: sprintf("<body>%s</body>", $htmlChunk); but this is not necessary in your case, I assume. See also my answer, which is similar but different.
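Following hakre's suggestion, libxml_use_internal_errors() buffers the malformed-markup warnings so they can be inspected instead of being silenced with @, and sprintf("<body>%s</body>", ...) is the wrapping fallback mentioned for chunks. A minimal sketch; loadHtmlChunk() is a hypothetical name:

```php
<?php

/**
 * Hypothetical helper: parses an HTML chunk and returns the document plus
 * any libxml warnings, without printing them and without using @.
 *
 * @return array{0: DOMDocument, 1: LibXMLError[]}
 */
function loadHtmlChunk(string $htmlChunk): array
{
    // Buffer libxml errors; remember the old setting so it can be restored.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrapping in <body> is optional; loadHTML() copes with bare chunks too.
    $doc->loadHTML(sprintf('<body>%s</body>', $htmlChunk));

    $errors = libxml_get_errors(); // inspect or log these if needed
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

[$doc, $errors] = loadHtmlChunk(
    '<p><img src="http://domain.com/images/59.jpg" class="something" />'
    . '<img src="http://domain.com/images/6.gif" class="something" /></p>'
);

echo $doc->getElementsByTagName('img')->length, " images, ",
     count($errors), " libxml warnings\n";
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.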