PHP preg_match_all regex to extract only the number from a string


Solution 1

I think this is the best approach:

  1. Use an HTML parser to extract the image tags
  2. Use a regular expression (or perhaps string manipulation) to extract the ID
  3. Query for the data
  4. Use the HTML parser to insert the returned data

Here is an example. There are improvements I can think of, such as using string manipulation instead of a regex.

$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />
<img src="http://domain.com/images/1249.jpg" class="something" />
<img src="http://domain.com/images/6.jpg" class="something" />';
$doc = new DOMDocument;
$doc->loadHTML( $html);

foreach( $doc->getElementsByTagName('img') as $img)
{
    $src = $img->getAttribute('src');
    // Skip images whose src does not contain a numeric ID.
    if( !preg_match( '#/images/([0-9]+)\.#', $src, $matches)) {
        continue;
    }
    $id = $matches[1];
    echo 'Fetching info for image ID ' . $id . "\n";

    // Query stuff here
    $result = 'Got this from the DB';

    $img->setAttribute( 'title', $result);
    $img->setAttribute( 'alt', $result);
}

$newHTML = $doc->saveHTML();
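
The string-manipulation alternative mentioned in this answer might look like the sketch below. The helper name extractImageId is hypothetical (it is not part of the original answer), and the sketch assumes the ID is always the file name without its extension.

```php
<?php
// Hypothetical helper (not from the original answer): pull the numeric ID
// out of an image URL without a regular expression.
function extractImageId(string $src): ?int
{
    // parse_url() isolates the path, e.g. "/images/59.jpg".
    $path = parse_url($src, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // pathinfo() strips the directory and the extension, leaving "59".
    $name = pathinfo($path, PATHINFO_FILENAME);

    // Accept the name only if it consists entirely of digits.
    return ctype_digit($name) ? (int) $name : null;
}

var_dump(extractImageId('http://domain.com/images/549.jpg'));  // int(549)
var_dump(extractImageId('http://domain.com/images/logo.png')); // NULL
```

Inside the foreach loop, a null return value could be used to skip images that do not follow the numeric naming scheme.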

Solution 2

Using regular expressions, you can get the number really easily. The third argument for preg_match_all is a by-reference array that will be populated with the matches that were found.

preg_match_all('/<img src="http:\/\/domain\.com\/images\/(\d+)\.[a-zA-Z]+"/', $html, $matches);
print_r($matches);

With the default flags, $matches[0] contains the full text of each match and $matches[1] contains the captured numbers.
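
As a rough illustration of how the populated array might be consumed, here is a minimal sketch. The sample $html, the "#" delimiter (chosen so the slashes need no escaping), and the variable names $count and $ids are assumptions made for the example.

```php
<?php
$html = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.gif" class="something" />';

// Same idea as the pattern in this answer, with "#" as the delimiter so
// the slashes in the URL do not need escaping.
$pattern = '#<img src="http://domain\.com/images/(\d+)\.[a-zA-Z]+"#';

// preg_match_all() returns the number of full matches it found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$ids = array_map('intval', $matches[1]);

echo $count, " image(s) found\n";
print_r($ids);
```

Each entry in $ids can then be used as the key for the database lookup.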

Solution 3

Use preg_match_all:

preg_match_all('#<img.*?/(\d+)\.#', $str, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => <img src="http://domain.com/images/59.
            [1] => <img src="http://domain.com/images/549.
            [2] => <img src="http://domain.com/images/1249.
            [3] => <img src="http://domain.com/images/6.
        )

    [1] => Array
        (
            [0] => 59
            [1] => 549
            [2] => 1249
            [3] => 6
        )

)

Solution 4

Consider using preg_replace_callback.

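Use this regex: (images/([0-9]+)[^"]+")

The outer parentheses act as the pattern delimiters here, so the number ends up in capture group 1.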

Then, as the callback argument, use an anonymous function. Result:

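$output = preg_replace_callback(
    "(images/([0-9]+)[^\"]+\")",
    function($m) {
        // $m[0] is the whole match, $m[1] is the number.
        $t = getTitleFromDatabase($m[1]); // do whatever you have to do to get the title
        // Escape the title so quotes in it cannot break the attribute.
        return $m[0]." title=\"".htmlspecialchars($t, ENT_QUOTES)."\"";
    },
    $input
);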
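
For experimenting with this approach, a self-contained sketch could look like the following. The body of getTitleFromDatabase is a hypothetical stand-in that reads from an array instead of a database, the sample titles are invented for the example, and the pattern uses "#" delimiters rather than parentheses.

```php
<?php
// Hypothetical stand-in for the database lookup; a real implementation
// would query by image ID instead of reading from an array.
function getTitleFromDatabase(int $id): string
{
    $titles = [59 => 'Sunset', 549 => 'Tom & Jerry'];
    return $titles[$id] ?? '';
}

$input = '<img src="http://domain.com/images/59.jpg" class="something" />
<img src="http://domain.com/images/549.jpg" class="something" />';

$output = preg_replace_callback(
    '#images/([0-9]+)[^"]+"#',
    function (array $m): string {
        // Escape the title so quotes or ampersands cannot break the attribute.
        $title = htmlspecialchars(getTitleFromDatabase((int) $m[1]), ENT_QUOTES);
        // Re-emit the matched text, then append the new attributes.
        return $m[0] . ' title="' . $title . '" alt="' . $title . '"';
    },
    $input
);

echo $output, "\n";
```

Because the match ends at the closing quote of the src value, the new attributes are placed directly after src and before any attributes that follow it.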

Author: Surajit Jati

Full-stack web developer

Updated on June 04, 2022

Comments

  • Surajit Jati
    Surajit Jati almost 2 years

    I can't seem to figure out the proper regular expression for extracting just specific numbers from a string. I have an HTML string containing a number of img tags, and I want to extract part of each src value. They follow this format:

    <img src="http://domain.com/images/59.jpg" class="something" />
    <img src="http://domain.com/images/549.jpg" class="something" />
    <img src="http://domain.com/images/1249.jpg" class="something" />
    <img src="http://domain.com/images/6.jpg" class="something" />
    

    So, varying lengths of numbers before what 'usually' is a .jpg (it may be a .gif, .png, or something else too). I want to only extract the number from that string.

    The second part is that I want to use that number to look up an entry in a database and grab the alt/title text for that image ID. Lastly, I want to add the returned database value to the img tag and put the result back into the HTML string.

    Any thoughts on how to proceed with it would be great...

    Thus far, I've tried:

    $pattern = '/img src="http://domain.com/images/[0-9]+\/.jpg';
    preg_match_all($pattern, $body, $matches);
    var_dump($matches);
    
    • jordanm
      jordanm about 12 years
      You just need to use a capture group. What have you tried?
    • Surajit Jati
      Surajit Jati about 12 years
      post edited with what I've tried thus far
  • Surajit Jati
    Surajit Jati about 12 years
    that captures every number in the string, not just in <img> tags
  • Surajit Jati
    Surajit Jati about 12 years
    I love this approach, but how should I deal with the warnings of malformed HTML (the img tags are a hodge podge of XHTML with a trailing />).
  • nickb
    nickb about 12 years
    The HTML parser should be pretty good with handling malformed HTML - Can you post a few examples of what's going wrong in your original post?
  • Surajit Jati
    Surajit Jati about 12 years
    figured it out - it was just a warning, but it was parsing correctly so I just threw a @ in front of the loadHTML line. Another question though, instead of creating a whole HTML document to save, can I save just partial HTML? The string I'm searching isn't a whole document, but just a portion enclosed in <p> tags.
  • hakre
    hakre about 12 years
    @jpea: See libxml_use_internal_errors. And yes, loadHTML works pretty well with HTML chunks too. Otherwise: sprintf("<body>%s</body>", $htmlChunk); but I assume this is not necessary in your case. See also my answer, which is similar but takes a different approach.
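
The two suggestions in this comment thread (collecting libxml warnings instead of using @, and working with a chunk rather than a whole document) could be combined roughly as follows. This is a sketch under assumptions: the sample $chunk, the placeholder alt text, and the choice to serialize only the children of <body> are illustrative and not taken from the thread.

```php
<?php
$chunk = '<p><img src="http://domain.com/images/59.jpg" class="something" /></p>';

// Collect libxml parse warnings internally instead of emitting PHP warnings,
// as an alternative to the @ suppression operator.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument;
// Wrap the chunk in <body> so the fragment has a single known parent.
$doc->loadHTML(sprintf('<body>%s</body>', $chunk));

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('alt', 'Title from the DB'); // placeholder value
}

// Serialize only the children of <body>, leaving out the doctype and
// the <html>/<body> wrappers that saveHTML() would otherwise add.
$body = $doc->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $doc->saveHTML($child);
}

echo $fragment, "\n";
```

Passing a node to saveHTML() serializes just that node and its descendants, which is what keeps the output limited to the original chunk.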