What is the use of the FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags for FILTER_VALIDATE_URL?

Solution 1

PHP 7.3.0

FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED are deprecated as of PHP 7.3.0, because FILTER_VALIDATE_URL has always implied them anyway.


You are right. These two flags are turned on by default and there is no way to disable them, despite what the official documentation says (see below). And as you wrote in the question, they are useless. I think it's a bug. There is a related bug report: bugs.php.net/bug.php?id=72301

From PHP docs:

Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:.
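The "further validation" mentioned in that note could look roughly like the sketch below. It first runs FILTER_VALIDATE_URL and then compares the scheme against an allowlist. The function name `is_url_with_allowed_scheme` and the default allowlist are assumptions made for illustration; they are not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): validate the URL first,
// then make sure its scheme is one the application expects.
function is_url_with_allowed_scheme(string $candidate, array $allowedSchemes = ['http', 'https']): bool
{
    // FILTER_VALIDATE_URL accepts any syntactically valid scheme (ssh, ftp, ...).
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Extract only the scheme component; schemes are case-insensitive.
    $scheme = parse_url($candidate, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), $allowedSchemes, true);
}

var_dump(is_url_with_allowed_scheme('https://example.com/'));  // bool(true)
var_dump(is_url_with_allowed_scheme('ssh://example.com/'));    // bool(false)
```

The allowlist is passed as a parameter so that callers who do expect ssh:// or mailto: can opt in explicitly.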

From RFC2396:

  URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
  absoluteURI   = scheme ":" ( hier_part | opaque_part )
  relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

  hier_part     = ( net_path | abs_path ) [ "?" query ]
  opaque_part   = uric_no_slash *uric

  uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                  "&" | "=" | "+" | "$" | ","

  net_path      = "//" authority [ abs_path ]
  abs_path      = "/"  path_segments
  rel_path      = rel_segment [ abs_path ]

  rel_segment   = 1*( unreserved | escaped |
                      ";" | "@" | "&" | "=" | "+" | "$" | "," )

  scheme        = alpha *( alpha | digit | "+" | "-" | "." )

  authority     = server | reg_name

  reg_name      = 1*( unreserved | escaped | "$" | "," |
                      ";" | ":" | "@" | "&" | "=" | "+" )

  server        = [ [ userinfo "@" ] hostport ]
  userinfo      = *( unreserved | escaped |
                     ";" | ":" | "&" | "=" | "+" | "$" | "," )

  hostport      = host [ ":" port ]
  host          = hostname | IPv4address
  hostname      = *( domainlabel "." ) toplabel [ "." ]
  domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
  toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
  IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
  port          = *digit

  path          = [ abs_path | opaque_part ]
  path_segments = segment *( "/" segment )
  segment       = *pchar *( ";" param )
  param         = *pchar
  pchar         = unreserved | escaped |
                  ":" | "@" | "&" | "=" | "+" | "$" | ","

  query         = *uric

  fragment      = *uric

  uric          = reserved | unreserved | escaped
  reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                  "$" | ","
  unreserved    = alphanum | mark
  mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                  "(" | ")"

  escaped       = "%" hex hex
  hex           = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                          "a" | "b" | "c" | "d" | "e" | "f"

  alphanum      = alpha | digit
  alpha         = lowalpha | upalpha

  lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
             "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
             "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
  upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
             "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
             "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
  digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
             "8" | "9"

As you can see, a valid URL does not need a scheme part or a hostname part if it is relative. Presumably FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED were designed to make those two checks optional, but in practice the checks are always applied and can never be switched off.
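A quick way to see this behaviour is to run a few inputs through FILTER_VALIDATE_URL without any flags. The sample strings below are made up for illustration; the deprecated flags are deliberately not used, so the snippet should behave the same on PHP 7 and PHP 8.

```php
<?php
// Made-up sample inputs: one absolute URL and three scheme-less references.
$samples = [
    'http://example.com/path',
    'example.com/path',
    '/relative/path',
    'test1/2.html',
];

$results = [];
foreach ($samples as $url) {
    // filter_var() returns the input string on success and false on failure.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// Only the first entry is true; everything without a scheme is rejected.
var_export($results);
```

No flags were passed, yet the scheme and host are still required, which is why the relative references fail.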

Solution 2

The flags FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED have not had any effect since 5.2.1 in that they are always on, there is no way to disable them, and they're not actually used anywhere in the PHP source.

https://bugs.php.net/bug.php?id=75442

The docs have just been clarified regarding the use of these constants, but they are not likely to be removed until the next major PHP release (e.g. PHP 8), for the sake of preserving backwards compatibility.

There is a narrow set of hard-coded schemes that don't require a hostname portion (mailto:, news:, and file:), but if you want to validate other URLs that lack a host or scheme part, you'll need to write something that calls parse_url() and operates on its return value.
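As a minimal sketch of that parse_url() approach: the helper below accepts anything FILTER_VALIDATE_URL accepts, and additionally accepts scheme-less references whose path contains only characters allowed by RFC 2396. The function name `is_url_or_relative_reference` and the acceptance policy are assumptions for illustration, not PHP's own behaviour; adjust the policy to your needs.

```php
<?php
// Hypothetical helper (not a PHP built-in): accept absolute URLs and
// relative references such as "test1/2.html".
function is_url_or_relative_reference(string $candidate): bool
{
    // Absolute URLs: defer to the built-in filter (scheme and host required).
    if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
        return true;
    }

    $parts = parse_url($candidate);
    if ($parts === false || isset($parts['scheme'])) {
        // Seriously malformed, or it has a scheme but failed the built-in filter.
        return false;
    }

    // Assumed policy: require a non-empty path built only from RFC 2396
    // path characters (unreserved, a few reserved ones, or %XX escapes).
    $path = $parts['path'] ?? '';

    return $path !== ''
        && preg_match('#^(?:[A-Za-z0-9\-_.!~*\'()/;:@&=+$,]|%[0-9A-Fa-f]{2})+\z#', $path) === 1;
}

var_dump(is_url_or_relative_reference('test1/2.html'));        // bool(true)
var_dump(is_url_or_relative_reference('http://example.com/')); // bool(true)
var_dump(is_url_or_relative_reference('has space/x'));         // bool(false)
```

Note that parse_url() is lenient by design and does not validate its input, which is why the path is checked separately here.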

You can have a peek at the current URL validation source here: https://github.com/php/php-src/blob/master/ext/filter/logical_filters.c#L517-L574

Author by

Sergey Novikov

Updated on June 24, 2022

Comments

  • Sergey Novikov
    Sergey Novikov almost 2 years

    We can use filter_var() with FILTER_VALIDATE_URL and these flags:

    FILTER_FLAG_SCHEME_REQUIRED
    FILTER_FLAG_HOST_REQUIRED
    FILTER_FLAG_PATH_REQUIRED
    FILTER_FLAG_QUERY_REQUIRED
    

    FILTER_VALIDATE_URL validates a value as a URL according to RFC 2396; internally it uses parse_url() and requires the scheme (protocol) and host parts.

    If I want to check the path and query parts as well, I can use filter_var() like this:

    filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED);
    

    But what is the use of the FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags? It seems that the scheme and host parts are checked regardless of whether we specify these flags.

    For example, filtering a valid relative URL like this:

    filter_var('test1/2.html', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
    

    returns false.

  • Sergey Novikov
    Sergey Novikov almost 7 years
    Can you provide an example of how I can check for a relative URL with FILTER_VALIDATE_URL, please? filter_var('test1/2.html', FILTER_VALIDATE_URL); returns false, but it's a valid relative URL as I understand it...
  • German Lashevich
    German Lashevich almost 7 years
    You are right. These two flags are turned on by default and there is no way to disable them. And as you wrote in the question, they are useless. I think it's a bug. There is a related bug report: bugs.php.net/bug.php?id=72301
  • Sergey Novikov
    Sergey Novikov almost 7 years
    Yup, I think it's a bug; I was just curious whether I had missed something. Update your answer and I'll be ready to accept it.
  • German Lashevich
    German Lashevich almost 7 years
    Done. I hope it's clear enough for people who may face this problem in the future.
  • gingerCodeNinja
    gingerCodeNinja almost 4 years
    I believe they were implicit from PHP 5.2.1. From php.net/manual/en/filter.filters.validate.php - 5.2.1 FILTER_VALIDATE_URL now implicitly uses FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED.