UTF-8 characters mangled in HTTP Basic Auth username

ruby-on-rails http utf-8 http-headers

Solution 1

I want to allow any valid UTF-8 characters in usernames and passwords.

Abandon all hope. Basic Authentication and Unicode don't mix.

There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:

Opera uses UTF-8;
IE uses the system's default codepage (which you have no way of knowing, other than that it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
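To make the list above concrete, here is a minimal Python sketch (Python is used only for illustration, and the helper names `basic_token`, `lower_byte_encode` and `strict_latin1` are hypothetical). It builds the base64 part of a Basic `Authorization` header under different byte encodings, and mimics two of the listed behaviors: keeping only the low byte of each code point, and refusing to encode anything outside ISO-8859-1. IE's codepage "best fit" mapping is not modeled.

```python
import base64
from typing import Optional


def basic_token(username: str, password: str, encoding: str) -> str:
    # Build the base64 token for "user:pass". Before RFC 7617 nothing said
    # which byte encoding to use, so the caller has to pick one.
    raw = f"{username}:{password}".encode(encoding)
    return base64.b64encode(raw).decode("ascii")


def lower_byte_encode(text: str) -> bytes:
    # Keep only the low 8 bits of each code point (the Mozilla behavior
    # described above). Characters above U+00FF are mangled irretrievably.
    return bytes(ord(ch) & 0xFF for ch in text)


def strict_latin1(text: str) -> Optional[bytes]:
    # Strict ISO-8859-1. None stands in for "no Authorization header sent",
    # as described above for Safari and Chrome.
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


name = "Łukasz"  # 'Ł' is U+0141, outside ISO-8859-1
print(basic_token("José", "pw", "utf-8"))       # UTF-8 (Opera-style) token
print(basic_token("José", "pw", "iso-8859-1"))  # a different token for the same credentials
print(lower_byte_encode(name))  # b'Aukasz': U+0141 & 0xFF == 0x41 == 'A'
print(strict_latin1(name))      # None
```

Pure-ASCII credentials produce the same token either way, which is why the problem only shows up once users pick non-ASCII names or passwords.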

*: some people interpret the standard to say that either:

it should always be ISO-8859-1, since that is the default encoding for raw 8-bit characters included directly in headers;
it should be encoded using RFC2047 rules, somehow.

But neither of these proposals is really applicable to a base64-encoded auth token, and the RFC2047 reference in the HTTP spec really doesn't work at all, since every place it might potentially be used is explicitly disallowed by the ‘atom context’ rules of RFC2047 itself, even if HTTP headers honoured the rules and extensions of the RFC822 family, which they don't.
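For reference, this is roughly what an RFC 2047 "encoded-word" looks like, produced here with Python's standard `email.header` module. The point of the sketch (an illustration under that assumption, not a proposed fix) is that in Basic auth the encoded-word would end up base64-encoded a second time inside `user:pass`, and nothing tells the server that it should look for one.

```python
import base64
from email.header import Header, decode_header

# Encode a non-ASCII username as an RFC 2047 encoded-word: =?utf-8?...?=
word = Header("José", "utf-8").encode()
print(word)

# Inside Basic auth, that ASCII string would itself be base64-encoded as part
# of "user:pass", so the server would have to unwrap two layers and guess
# that the inner one is an encoded-word at all.
token = base64.b64encode(f"{word}:secret".encode("ascii")).decode("ascii")

# Decoding the encoded-word recovers the bytes plus the declared charset.
decoded_bytes, charset = decode_header(word)[0]
print(decoded_bytes.decode(charset))  # José
```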

In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.

Solution 2

It's a known shortcoming that Basic authentication does not provide support for non-ISO-8859-1 characters.

Some UAs are known to use UTF-8 instead (Opera comes to mind), but there's no interoperability for that either.

As far as I can tell, there's no way to fix this, except by defining a new authentication scheme that handles all of Unicode. And getting it deployed.
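Short of a new scheme, some servers fall back on a heuristic. The sketch below (hypothetical helper name; a guess, not a fix) tries UTF-8 first and falls back to ISO-8859-1. It cannot undo IE's codepage mangling or Mozilla's low-byte truncation, and it will occasionally misread ISO-8859-1 bytes that happen to form valid UTF-8 (for example "Ã©" sent as ISO-8859-1 decodes as "é").

```python
import base64
import binascii


def decode_basic_credentials(header_value: str) -> tuple:
    """Heuristically decode an 'Authorization: Basic ...' header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    try:
        raw = base64.b64decode(token.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed base64 token") from exc
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte maps to some ISO-8859-1 character, so this never fails,
        # but it may not be what the client actually meant.
        text = raw.decode("iso-8859-1")
    username, sep, password = text.partition(":")  # password may contain ':'
    if not sep:
        raise ValueError("missing ':' separator")
    return username, password


utf8_header = "Basic " + base64.b64encode("José:pw".encode("utf-8")).decode("ascii")
latin1_header = "Basic " + base64.b64encode("José:pw".encode("iso-8859-1")).decode("ascii")
print(decode_basic_credentials(utf8_header))    # ('José', 'pw')
print(decode_basic_credentials(latin1_header))  # ('José', 'pw')
```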

Solution 3

HTTP Digest authentication is no solution for this problem, either. It suffers from the same problem of the client being unable to tell the server what character set it's using and the server being unable to correctly assume what the client used.
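A quick way to see why: in RFC 2617 Digest with `algorithm=MD5`, HA1 is the MD5 of `username:realm:password` taken over bytes, so the server can only reproduce the client's hash if it guesses the same byte encoding. A minimal sketch (the function name is hypothetical):

```python
import hashlib


def digest_ha1(username: str, realm: str, password: str, encoding: str) -> str:
    # HA1 = MD5(username ":" realm ":" password), computed over *bytes*;
    # which encoding turns the text into bytes is the unspecified part.
    data = f"{username}:{realm}:{password}".encode(encoding)
    return hashlib.md5(data).hexdigest()


a = digest_ha1("José", "example", "secret", "utf-8")
b = digest_ha1("José", "example", "secret", "iso-8859-1")
print(a == b)  # False: the server must guess the client's encoding to match
```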

Author

Bogdan

Updated on June 19, 2022

Comments

Bogdan almost 2 years
I want to request data from SAP based on a date. I use a SOAP message to do that. In the XML of the SOAP message, the date variable is defined by this piece of code, which I did not write:
```
<xsd:simpleType name="date10">
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="10"/>
    <xsd:pattern value="\d\d\d\d-\d\d-\d\d"/>
  </xsd:restriction>
</xsd:simpleType>
```
I am not sure that the way it's written is OK. It should be a date, not a string.

Please tell me whether the way it's written may be correct. From my point of view it should be xsd:date with the pattern value "\y\y\y\y-\m\m-\d\d".

Thank you.
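One way to see what the `date10` type does and does not check: the pattern only constrains the lexical shape, so a value like `2022-13-45` passes it, whereas `xsd:date` would reject it. (Also, `\y` and `\m` are not escapes defined by XML Schema regular expressions, and `xsd:date` needs no pattern facet.) The Python sketch below is an approximation: it assumes `re.fullmatch` mirrors XSD's implicitly anchored patterns and uses `date.fromisoformat` as a stand-in for `xsd:date` validation, ignoring the optional timezone that `xsd:date` allows.

```python
import re
from datetime import date

# The facet from the schema above; XSD patterns match the whole value,
# so re.fullmatch is the closest Python equivalent.
DATE10 = re.compile(r"\d\d\d\d-\d\d-\d\d")


def matches_date10(value: str) -> bool:
    # pattern facet plus the maxLength="10" facet
    return DATE10.fullmatch(value) is not None and len(value) <= 10


def is_calendar_date(value: str) -> bool:
    # Rough stand-in for xsd:date: the value must be a real calendar date.
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


for v in ("2022-06-19", "2022-13-45"):
    print(v, matches_date10(v), is_calendar_date(v))
```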
Mohsen Rashidi about 15 years

There's a colon there, once you base64-decode it. It ends up being 32 16-bit characters (at least Emacs thinks they're characters), a colon, then the same 16-bit characters (I used the same string for the password). I tried it with IE and got the same thing, so it's not just a Firefox thing.
Hank Gay about 15 years

I was just using some OS X dashboard widget to do the conversion, but it definitely wasn't finding a colon after base64 decoding. It must have been trying to use MacRoman or something.
Julian Reschke about 15 years

I happen to disagree that Opera does it somehow right. You can't change the encoding unilaterally.
Amit Patil about 15 years

Not so much ‘right’ as “what the OP wanted it to do”. Although since none of the alternatives are ‘right’, UTF-8 is at least as good as any other possible option.
Mohsen Rashidi about 15 years

At least UTF-8 won't mangle some characters :) Thanks very much for this answer (it expands on Julian's - they both answer the question nicely). I did a lot of Googling and couldn't find a solid discussion of this. Time to go change my specs.
chirlu over 8 years

There is A New Hope: The new RFC 7617 allows servers to request UTF-8 encoding, resolving the ambiguity. A compliant client will then respond accordingly. – Of course, this doesn’t mean all client software will immediately implement RFC 7617; it’s likely to take years before this issue can be called “mostly resolved”.
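A minimal sketch of the RFC 7617 exchange: the server adds `charset="UTF-8"` to its challenge (the only value RFC 7617 allows), and a compliant client then sends UTF-8 credentials. Helper names are hypothetical, and NFC normalization is a simplification: RFC 7617 refers to the PRECIS UsernameCasePreserved and OpaqueString profiles, which do more than NFC. The realm is assumed to contain no double quotes or backslashes, so no quoted-string escaping is done.

```python
import base64
import unicodedata


def basic_challenge(realm: str) -> str:
    # WWW-Authenticate value asking the client to use UTF-8.
    return f'Basic realm="{realm}", charset="UTF-8"'


def utf8_basic_authorization(username: str, password: str) -> str:
    # Authorization value a client would send after seeing charset="UTF-8".
    user = unicodedata.normalize("NFC", username)
    pw = unicodedata.normalize("NFC", password)
    if ":" in user:
        raise ValueError("user-id must not contain ':'")
    token = base64.b64encode(f"{user}:{pw}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_challenge("example"))
# Decomposed 'e' + U+0301 and precomposed 'é' give the same header after NFC.
print(utf8_basic_authorization("Jose\u0301", "pw") == utf8_basic_authorization("José", "pw"))  # True
```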
Amit Patil over 8 years

@chirlu: Indeed! We have Julian to thank for that. Crossing fingers for implementation now...
chirlu over 8 years

Oh right, I hadn’t made the connection – thank you, @Julian!