Understanding Pattern in preg_match_all() Function Call


Solution 1

You are looking for these two references:

  1. PHP PCRE Pattern Syntax
  2. PCRE Standard syntax

Note that the first one is a subset of the second one.

Solution 2

Those aren't "PHP patterns", those are Regular Expressions. Instead of trying to explain what has been explained before a thousand times in this answer, I'll point you to http://regular-expressions.info for information and tutorials.

Solution 3

Also have a look at YAPE, which for example gives this nice textual explanation for your first regex:

(?x-ims:\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4})

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \(?                      '(' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1 (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )?                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
  \)?                      ')' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?(1)                    if back-reference \1 matched, then:
----------------------------------------------------------------------
    [\-\s]                   any character of: '\-', whitespace (\n,
                             \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        else:
----------------------------------------------------------------------
                             succeed
----------------------------------------------------------------------
  )                        end of conditional on \1
----------------------------------------------------------------------
  \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
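
To see what the explanation above means in practice, here is a minimal sketch that runs the same pattern on the sample string from the question. The variable names `$pattern` and `$subject` are only illustrative. With the default `PREG_PATTERN_ORDER`, `$phones[0]` holds the full matches and `$phones[1]` holds what the first capture group (the optional area code) matched.

```php
<?php
// Pattern and subject taken from the question; /x makes the regex
// engine ignore the literal whitespace inside the pattern.
$pattern = '/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x';
$subject = 'Call 555-1212 or 1-800-555-1212';

$count = preg_match_all($pattern, $subject, $phones);

echo "Matches found: $count\n";
print_r($phones[0]); // full matches: 555-1212 and 800-555-1212
print_r($phones[1]); // group 1 (area code): empty string and 800
```

For the first number the area-code group does not participate in the match, so the conditional `(?(1) ...)` takes its empty "else" branch and no separator is required. For the second number the group captures `800`, so a hyphen or whitespace character must follow it.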

Solution 4

The pattern you are asking about is written in a mini-language of its own, called Regular Expressions. It specializes in finding patterns in strings, performing replacements, and similar tasks, for anything that follows some sort of pattern.

More specifically it's a Perl Compatible Regular Expression (PCRE).

The handbook for that language is not available on the PHP manual website; you can find it here: PCRE Manpage.

A well-made step-by-step introduction is on the Regular Expressions Info Website.
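
As a small illustration of such a pattern at work, here is a hedged sketch using the second example from the question. It is written with a single-quoted string, so the back-reference can be spelled `\2` instead of `\\2`. The loop and its output format are my own addition, not part of the PHP manual example.

```php
<?php
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

// Group 1: the whole opening tag, group 2: the tag name,
// group 3: the content (lazy), group 4: the closing tag, which must
// repeat the name captured by group 2 (back-reference \2).
$pattern = '/(<([\w]+)[^>]*>)(.*?)(<\/\2>)/';

// PREG_SET_ORDER gives one array per match, each holding all its groups.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $set) {
    echo "tag: {$set[2]}, content: {$set[3]}\n";
}
```

With `PREG_SET_ORDER`, `$matches[0]` describes the `<b>` element and `$matches[1]` the `<a>` element, which is usually easier to loop over than the default grouping by capture group.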

Author: Kevin_TA

Updated on June 16, 2022

Comments

  • Kevin_TA
    Kevin_TA almost 2 years

    I am trying to understand how preg_match_all() works and when looking at the documentation on the php.net site, I see some examples but am baffled by the strings sent as the pattern parameter. Is there a really thorough, clear explanation out there? For example, I don't understand what the pattern in this example means:

    preg_match_all("/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x",
                "Call 555-1212 or 1-800-555-1212", $phones);
    

    or this:

    $html = "<b>bold text</b><a href=howdy.html>click me</a>";
    preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);
    

    I've taken an introductory class on PHP, but never saw anything like this. Some clarification would be appreciated.

    Thanks!

  • Kevin_TA
    Kevin_TA over 12 years
    Where exactly did you get that output from? I checked out the link you provided but it was just a Google search and I didn't really find something that may have produced this output.
  • mario
    mario over 12 years
    Yes, it's that first link, the Perl module. I made myself a tiny shell script for that. It's just perl -e " use YAPE::Regex::Explain; my \$re = qr{$1}$2; print YAPE::Regex::Explain->new(\$re)->explain(); " -- But you can also just keep rewriting that small example script as seen on its CPAN page.
  • Alan Moore
    Alan Moore over 12 years
    That isn't right. The OP's regex uses the /x modifier, so the first node should be (?x-ims: and those pure whitespace nodes shouldn't be listed. But that list is incomplete anyway. According to this bug report, the module hasn't been updated since Perl 5.6, and PCRE always supported a slightly different set of modifiers to begin with.
  • mario
    mario over 12 years
    @AlanMoore: True, updated with actually specifying x. It's only useful for illustrative purposes anyway. It seems to work with many PCRE patterns still, but obviously it's not the prettiest tool.