What is the use of the FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags for FILTER_VALIDATE_URL?
Solution 1
PHP 7.3.0
FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED are deprecated as of PHP 7.3.0, because FILTER_VALIDATE_URL implies them anyway.
You are right. These two flags are turned on by default and there is no way to disable them, despite what the official documentation says (see below). And as you wrote in the question, they are useless. I think it's a bug. There is a related bug report.
Beware: a valid URL may not specify the HTTP protocol (http://), so further validation may be required to determine whether the URL uses an expected protocol; it could, for example, use ssh:// or mailto: instead.
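The warning above can be sketched as a two-step check: validate first, then compare the scheme against an allow-list. The helper name is_url_with_allowed_scheme() and its default allow-list are assumptions made for illustration; they are not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper (not part of PHP): validate a URL, then check that
// its scheme is on an allow-list chosen by the caller.
function is_url_with_allowed_scheme(string $url, array $allowed = ['http', 'https']): bool
{
    // FILTER_VALIDATE_URL returns the URL on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Schemes are case-insensitive, so normalise before comparing.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme) && in_array(strtolower($scheme), $allowed, true);
}

var_dump(is_url_with_allowed_scheme('https://example.com/path')); // bool(true)
var_dump(is_url_with_allowed_scheme('ssh://example.com'));        // bool(false): scheme not on the list
var_dump(is_url_with_allowed_scheme('test1/2.html'));             // bool(false): not an absolute URL
```

Keeping the allow-list as a parameter lets the caller decide which protocols count as "expected" in a given context.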
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
absoluteURI = scheme ":" ( hier_part | opaque_part )
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
hier_part = ( net_path | abs_path ) [ "?" query ]
opaque_part = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
"&" | "=" | "+" | "$" | ","
net_path = "//" authority [ abs_path ]
abs_path = "/" path_segments
rel_path = rel_segment [ abs_path ]
rel_segment = 1*( unreserved | escaped |
";" | "@" | "&" | "=" | "+" | "$" | "," )
scheme = alpha *( alpha | digit | "+" | "-" | "." )
authority = server | reg_name
reg_name = 1*( unreserved | escaped | "$" | "," |
";" | ":" | "@" | "&" | "=" | "+" )
server = [ [ userinfo "@" ] hostport ]
userinfo = *( unreserved | escaped |
";" | ":" | "&" | "=" | "+" | "$" | "," )
hostport = host [ ":" port ]
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
port = *digit
path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped |
":" | "@" | "&" | "=" | "+" | "$" | ","
query = *uric
fragment = *uric
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
"(" | ")"
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
alphanum = alpha | digit
alpha = lowalpha | upalpha
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
"j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
"s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9"
As you can see, a valid URL may lack a scheme part or a hostname part if it is relative. FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED were probably designed to let you require those parts explicitly. In practice that choice never exists, because both checks are always applied.
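A minimal sketch of the behaviour described above: the relative URL from the question is rejected whether or not the two flags are passed. The defined() guard is an assumption added so the snippet also runs on PHP versions where the deprecated constants are no longer defined; on PHP 7.3 and 7.4 the flagged call may emit a deprecation notice.

```php
<?php
// Build the flag value defensively: both constants are deprecated since
// PHP 7.3.0 and may be undefined on newer versions, so fall back to 0.
$flags = 0;
foreach (['FILTER_FLAG_SCHEME_REQUIRED', 'FILTER_FLAG_HOST_REQUIRED'] as $name) {
    if (defined($name)) {
        $flags |= constant($name);
    }
}

$relative = 'test1/2.html';

// Same input, once without flags and once with whatever flags are available.
$withoutFlags = filter_var($relative, FILTER_VALIDATE_URL);
$withFlags    = filter_var($relative, FILTER_VALIDATE_URL, $flags);

var_dump($withoutFlags); // bool(false)
var_dump($withFlags);    // bool(false)

// The absolute form of the same path passes validation.
var_dump(filter_var('http://example.com/test1/2.html', FILTER_VALIDATE_URL) !== false); // bool(true)
```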
Solution 2
The flags FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED have had no effect since PHP 5.2.1: they are always on, there is no way to disable them, and they are not actually used anywhere in the PHP source.
https://bugs.php.net/bug.php?id=75442
The docs have just been clarified regarding the use of these constants, but the constants themselves are unlikely to be removed until the next major PHP release (e.g. PHP 8), for the sake of backwards compatibility.
There is a narrow set of hard-coded schemes that don't require a hostname portion (mailto:, news:, and file:), but if you want to validate other URLs that have no host or scheme part, you'll need to write something that calls parse_url() and operates on its return value.
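One possible shape for such a check is sketched below. The function name is_absolute_or_relative_url() is hypothetical, and the rule it applies (no scheme, no host, non-empty path) is only an assumption about what "relative" should mean here. Note that parse_url() splits a string into parts without validating them, so the relative branch is deliberately permissive.

```php
<?php
// Hypothetical helper (not part of PHP): accept absolute URLs via
// FILTER_VALIDATE_URL and fall back to parse_url() for relative references.
function is_absolute_or_relative_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        return true; // absolute URL accepted by the built-in filter
    }

    $parts = parse_url($url);
    if ($parts === false) {
        return false; // parse_url() gives up on seriously malformed input
    }

    // Treat the input as a relative reference only if it has neither a
    // scheme nor a host, but does have a non-empty path.
    return !isset($parts['scheme'])
        && !isset($parts['host'])
        && isset($parts['path'])
        && $parts['path'] !== '';
}

var_dump(is_absolute_or_relative_url('http://example.com/a')); // bool(true)
var_dump(is_absolute_or_relative_url('test1/2.html'));         // bool(true)
var_dump(is_absolute_or_relative_url('http:///example.com'));  // bool(false)
```

Stricter rules, such as checking the path against the RFC 2396 character classes, would have to be layered on top of this.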
You can have a peek at the current URL validation source here: https://github.com/php/php-src/blob/master/ext/filter/logical_filters.c#L517-L574
Sergey Novikov
Updated on June 24, 2022
Comments
-
Sergey Novikov almost 2 years
We can use filter_var() with FILTER_VALIDATE_URL and these flags: FILTER_FLAG_SCHEME_REQUIRED, FILTER_FLAG_HOST_REQUIRED, FILTER_FLAG_PATH_REQUIRED, FILTER_FLAG_QUERY_REQUIRED.
FILTER_VALIDATE_URL validates a value as a URL according to RFC 2396; internally it uses parse_url() and requires the scheme (protocol) and host parts.
If I want to check the path and query parts as well, I can use filter_var() like this:
filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED);
But what is the use of the FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags? It seems that the scheme and host parts are checked whether or not we specify these flags.
For example, filtering a valid relative URL like this:
filter_var('test1/2.html', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
returns false.
-
Sergey Novikov almost 7 years
Can you provide an example of how I can check for a relative URL with FILTER_VALIDATE_URL, please? filter_var('test1/2.html', FILTER_VALIDATE_URL); returns false, but it's a valid relative URL as I understand...
-
German Lashevich almost 7 years
You are right. These two flags are turned on by default and there is no way to disable them. And as you wrote in the question, they are useless. I think it's a bug. There is a related bug report: bugs.php.net/bug.php?id=72301
-
Sergey Novikov almost 7 years
Yup, I think it's a bug; I was just curious whether I had missed something. Update your answer and I'll be ready to accept it.
-
German Lashevich almost 7 years
Done. I hope it's clear enough for people who may face this problem in the future.
-
gingerCodeNinja almost 4 years
I believe they have been implicit since PHP 5.2.1. From php.net/manual/en/filter.filters.validate.php: "5.2.1: FILTER_VALIDATE_URL now implicitly uses FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED."