What is a valid URL query string?
Solution 1
Per https://www.rfc-editor.org/rfc/rfc3986
In section 2.2 Reserved Characters, the following characters are listed:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
The spec then says:
If data for a URI component would conflict with a reserved character’s purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
Next, in section 2.3 Unreserved Characters, the following are listed:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
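For the query component specifically, RFC 3986 section 3.4 defines query = *( pchar / "/" / "?" ), where pchar = unreserved / pct-encoded / sub-delims / ":" / "@". A minimal sketch of a syntax check built from that grammar follows. The name is_valid_query is hypothetical, and the check covers only the characters, not whether the name/value structure makes sense to a given server.

```python
import re

# Character sets copied from the RFC 3986 grammar quoted above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_RE = re.compile(
    r"(?:[" + UNRESERVED + SUB_DELIMS + r":@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_query(query: str) -> bool:
    """Return True if `query` (the part after '?') matches the RFC 3986 query rule."""
    return QUERY_RE.fullmatch(query) is not None
```

Under this check a space, "#", "[" or "]", a bare "%", and any non-ASCII character make the query invalid until they are percent-encoded.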
Solution 2
Wikipedia has your answer: http://en.wikipedia.org/wiki/Query_string
"URL Encoding: Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document; the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a schema known as URL encoding.
In particular, encoding the query string uses the following rules:
- Letters (A-Z and a-z), numbers (0-9) and the characters '.','-','~' and '_' are left as-is
- SPACE is encoded as '+' or %20[citation needed]
- All other characters are encoded as %HH hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
The octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation. The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 1738."
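The encoding rules quoted above can be tried out with Python's standard urllib.parse module. This is only a sketch: encode_component and its form_style flag are hypothetical names, while quote, quote_plus and unquote_plus are the real library functions.

```python
from urllib.parse import quote, quote_plus, unquote_plus


def encode_component(value: str, form_style: bool = False) -> str:
    """Percent-encode one query name or value.

    form_style=True encodes a space as '+' (form encoding);
    otherwise a space becomes %20.
    """
    if form_style:
        return quote_plus(value, safe="")
    # safe="" so that "/" is encoded too (quote() leaves it as-is by default).
    return quote(value, safe="")


print(encode_component("a b&c=d"))                   # a%20b%26c%3Dd
print(encode_component("a b&c=d", form_style=True))  # a+b%26c%3Dd
print(encode_component("é"))                         # %C3%A9 (UTF-8 bytes, then %HH)
print(unquote_plus("a+b%26c%3Dd"))                   # a b&c=d
```

Letters, digits and '.', '-', '_' pass through unchanged; recent Python versions (3.7 and later) also leave '~' as-is, matching the note about "%7E" above.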
Regarding the format, query strings are name-value pairs. The ? separates the query string from the rest of the URL. Name-value pairs are separated from each other by an ampersand (&), while each name (key) and its value are separated by an equals sign (=), e.g. http://domain.com?key=value&secondkey=secondvalue
Under Structure in the Wikipedia reference I provided:
- The question mark is used as a separator and is not part of the query string.
- The query string is composed of a series of field-value pairs
- Within each pair, the field name and value are separated by an equals sign, '='.
- The series of pairs is separated by the ampersand, '&' (or semicolon, ';' for URLs embedded in HTML and not generated by a ...; see below).
- W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands.
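The structure described above can be sketched with Python's urllib.parse, using the example URL from this answer. The name/value data passed to urlencode is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

url = "http://domain.com?key=value&secondkey=secondvalue"

# The '?' is a separator, so it is not part of the query string itself.
query = urlsplit(url).query
print(query)   # key=value&secondkey=secondvalue

# Split on '&' into pairs, then on '=' into name and value.
pairs = parse_qsl(query)
print(pairs)   # [('key', 'value'), ('secondkey', 'secondvalue')]

# Building a query: urlencode percent-encodes '&' and '=' inside the data
# so they cannot be mistaken for delimiters (spaces become '+').
rebuilt = urlencode([("name", "J&J Co"), ("q", "a=b")])
print(rebuilt)  # name=J%26J+Co&q=a%3Db
```

Parsing the rebuilt string with parse_qsl returns the original names and values, which is the point of encoding reserved characters that appear in the data.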
Solution 3
This link has the answer, with the percent-encoded values you need:
https://perishablepress.com/url-character-codes/
For your convenience, this is the list:
< %3C
> %3E
# %23
% %25
{ %7B
} %7D
| %7C
\ %5C
^ %5E
~ %7E
[ %5B
] %5D
` %60
; %3B
/ %2F
? %3F
: %3A
@ %40
= %3D
& %26
$ %24
+ %2B
" %22
space %20
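The codes in this list are simply the character's byte value written as two hex digits after a '%'. A small sketch that reproduces them is below; percent_code is a hypothetical helper, not a library function. It encodes every character it is given, including '~', which per the other answers does not actually need encoding.

```python
def percent_code(ch: str) -> str:
    """Return the percent-encoded form of `ch`, one %HH triplet per UTF-8 byte."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


for ch in ['<', '>', '#', '%', '&', '+', '"', ' ']:
    print(repr(ch), percent_code(ch))
```

A non-ASCII character yields several triplets, for example percent_code("é") gives %C3%A9.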
Aran Mulholland
Updated on July 22, 2020
Comments
-
Aran Mulholland almost 4 years
What characters are allowed in a URL query string?
Do query strings have to follow a particular format?
-
Lightness Races in Orbit over 10 years
Can you provide a citation for the final paragraph?
-
Clarice Bouwer over 10 years
I added that paragraph based on personal experience, but I've updated it and added more information that I could find to back it up. In doing so, I noticed that key-value pairs are not only separated by an ampersand but can also be separated by a semicolon, although I've never come across that before. Also, the question mark is not part of the query string but is rather a separator.
-
laune almost 10 years
In the text of the answer: "each name value pair is prefixed with an ampersand" the wording ("prefixed") is misleading. Farther down, there is the correct "...pairs is separated...".
-
MrWhite about 9 years
RFC 3986 - Section 3.4 specifically describes the query string and notably includes the sub-delims and a handful of others. In summary: A-Z, a-z, 0-9, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, /, ?
-
kleopatra almost 9 years
Note that link-only answers are discouraged, SO answers should be the end-point of a search for a solution (vs. yet another stopover of references, which tend to get stale over time). Please consider adding a stand-alone synopsis here, keeping the link as a reference.
-
Abhijit Sarkar over 2 years
@MrWhite It's been a while since your comment, but what does your summary mean in plain English? Do these characters need to be encoded or not? I've looked at section 3.4 but didn't see a list.