What is a valid URL query string?

Solution 1

Per RFC 3986 (https://www.rfc-editor.org/rfc/rfc3986):

In section 2.2 Reserved Characters, the following characters are listed:

reserved = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

The spec then says:

If data for a URI component would conflict with a reserved character’s purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

Next, in section 2.3 Unreserved Characters, the following are listed:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
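
To make the two character classes concrete, here is a minimal Python sketch using the standard library's urllib.parse.quote. The sample strings are made up for illustration, and passing safe="" is an assumption on my part so that "/" (which quote() leaves alone by default) is treated as data too.

```python
from urllib.parse import quote, unquote

# Unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~")
# never need encoding, so they pass through unchanged.
unreserved_sample = "AZaz09-._~"
encoded_unreserved = quote(unreserved_sample, safe="")

# Reserved characters that appear as data (not as delimiters)
# are percent-encoded. safe="" disables quote()'s default
# exemption for "/".
value = "a&b=c/d?e#f"
encoded_value = quote(value, safe="")

print(encoded_unreserved)      # AZaz09-._~
print(encoded_value)           # a%26b%3Dc%2Fd%3Fe%23f
print(unquote(encoded_value))  # a&b=c/d?e#f
```

Decoding with unquote() restores the original data, which is why encoding a reserved character only where it would otherwise act as a delimiter is safe.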

Solution 2

Wikipedia has your answer: http://en.wikipedia.org/wiki/Query_string

"URL Encoding: Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document; the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a schema known as URL encoding.

In particular, encoding the query string uses the following rules:

  • Letters (A-Z and a-z), numbers (0-9) and the characters '.','-','~' and '_' are left as-is
  • SPACE is encoded as '+' or %20
  • All other characters are encoded as %HH (two hexadecimal digits per byte), with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)

The octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation. The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 1738."
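
The rules quoted above can be tried out with Python's standard library. This is only a sketch: the sample text is invented, and quote() and quote_plus() are used here as two convenient stand-ins for the "%20" and "+" conventions, not as the only way to encode a query string.

```python
from urllib.parse import quote, quote_plus, unquote

text = "café au lait"

# Space as %20; "é" is first encoded as UTF-8 (bytes C3 A9).
percent_form = quote(text, safe="")

# Space as "+", the form-style convention for query strings.
plus_form = quote_plus(text)

# A literal "+" in the data must itself be encoded, otherwise it
# would be read back as a space.
literal_plus = quote_plus("a+b")

# "%7E" and "~" are equivalent after decoding.
tilde = unquote("%7E")

print(percent_form)  # caf%C3%A9%20au%20lait
print(plus_form)     # caf%C3%A9+au+lait
print(literal_plus)  # a%2Bb
print(tilde)         # ~
```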

Regarding the format, query strings are made up of name-value pairs. The ? separates the query string from the rest of the URL. Pairs are separated from each other by an ampersand (&), while the name (key) and value within a pair are separated by an equals sign (=), e.g. http://domain.com?key=value&secondkey=secondvalue

Under Structure in the Wikipedia reference I provided:

  • The question mark is used as a separator and is not part of the query string.
  • The query string is composed of a series of field-value pairs
  • Within each pair, the field name and value are separated by an equals sign, '='.
  • The series of pairs is separated by the ampersand, '&' (or semicolon, ';' for URLs embedded in HTML and not generated by a ...; see below).
  • W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands.
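
As a rough illustration of that structure, the sketch below splits the example URL from this answer with Python's urllib.parse and then builds a query string back up. The second dictionary ("q": "a b&c") is a made-up value, included only to show that reserved characters inside a value get encoded.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

url = "http://domain.com?key=value&secondkey=secondvalue"

# The "?" is a separator; it is not part of the query itself.
query = urlsplit(url).query

# Split on "&" into pairs, then on "=" into (name, value).
pairs = parse_qsl(query)

# Going the other way: join pairs with "=" and "&".
rebuilt = urlencode(pairs)

# Reserved characters inside a value are encoded so they cannot
# be mistaken for separators (space becomes "+", "&" becomes %26).
encoded = urlencode({"q": "a b&c"})

print(query)    # key=value&secondkey=secondvalue
print(pairs)    # [('key', 'value'), ('secondkey', 'secondvalue')]
print(rebuilt)  # key=value&secondkey=secondvalue
print(encoded)  # q=a+b%26c
```

Note that parse_qsl() splits on "&" by default, so semicolon-separated pairs would need separate handling.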

Solution 3

This page lists the characters that are commonly percent-encoded, together with their encoded values:

https://perishablepress.com/url-character-codes/

For your convenience, this is the list:

<     %3C
>     %3E
#     %23
%     %25
{     %7B
}     %7D
|     %7C
\     %5C
^     %5E
~     %7E
[     %5B
]     %5D
`     %60
;     %3B
/     %2F
?     %3F
:     %3A
@     %40
=     %3D
&     %26
$     %24
+     %2B
"     %22
space     %20
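
Each code in the list is just "%" followed by the character's ASCII value in two hexadecimal digits, so the list can be regenerated rather than memorised. The following is a small sketch of that idea; the variable names are my own, and the formula only holds for single-byte ASCII characters like the ones listed (non-ASCII characters are encoded byte by byte after conversion to UTF-8).

```python
from urllib.parse import unquote

# The characters from the list above, with space last.
chars = '<>#%{}|\\^~[]`;/?:@=&$+" '

# "%" + two uppercase hex digits of the ASCII code point.
table = {ch: "%{:02X}".format(ord(ch)) for ch in chars}

for ch, code in table.items():
    label = "space" if ch == " " else ch
    print(f"{label:<6}{code}")

# Decoding each code gives back the original character.
round_trip_ok = all(unquote(code) == ch for ch, code in table.items())
```

Keep in mind that "~" appears in the list but is an unreserved character under RFC 3986, so "%7E" and "~" are interchangeable.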
Author by

Aran Mulholland

Full stack, Javascript, HTML, .NET, .NET Core, ASP.NET MVC, Project Coding Infrastructure, iOS, neo4j development, analogue synthesis, toilet photographer.

Updated on July 22, 2020

Comments

  • Aran Mulholland
    Aran Mulholland almost 4 years

    What characters are allowed in a URL query string?

    Do query strings have to follow a particular format?

  • Lightness Races in Orbit
    Lightness Races in Orbit over 10 years
    Can you provide a citation for the final paragraph?
  • Clarice Bouwer
    Clarice Bouwer over 10 years
    I added that paragraph based on personal experience but I've updated and added more information that I could find to back it up. In doing so, I noticed that key-values are not only separated by an ampersand but can be by a semi-colon although I've never come across it before. Also, the question mark is not part of the QS but is rather a separator.
  • laune
    laune almost 10 years
    In the text of the answer: "each name value pair is prefixed with an ampersand" the wording ("prefixed") is misleading. Farther down, there is the correct "...pairs is separated...".
  • MrWhite
    MrWhite about 9 years
    RFC 3986 - Section 3.4 specifically describes the query string and notably includes the sub-delims and a handful of others. In summary: A-Z, a-z, 0-9, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, /, ?
  • kleopatra
    kleopatra almost 9 years
    Note that link-only answers are discouraged, SO answers should be the end-point of a search for a solution (vs. yet another stopover of references, which tend to get stale over time). Please consider adding a stand-alone synopsis here, keeping the link as a reference.
  • Abhijit Sarkar
    Abhijit Sarkar over 2 years
    @MrWhite It's been a while since your comment, but what does your summary mean in plain English? Do these characters need to be encoded or not? I've looked at section 3.4 but didn't see a list.