Chrome form POST shows "(unable to decode value)" and database stores it as a question mark


Solution 1

U+03A9 Ω Greek capital letter omega is not part of Windows code page 1252.

U+00B5 µ Micro sign (which is not the exact same character as Greek mu) is part of 1252 (byte 181).

The Alt+keypad shortcut numbers don't align with code page 1252, or with the current ANSI code page in general, so being able to type a character with such a shortcut doesn't imply that the character is in those code pages. Instead, the numbers come from DOS code page 437.
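
To make the mismatch concrete, here is a minimal sketch comparing what the two Alt codes from the question mean in code page 437 with what the same byte numbers mean in windows-1252. The `CP437_EXCERPT` table and the `cp1252HighByte` helper are hypothetical names made up for this illustration, and the table holds only the two entries discussed here.

```javascript
// Hypothetical two-entry excerpt of DOS code page 437, for illustration only.
const CP437_EXCERPT = { 230: "\u00B5", 234: "\u03A9" }; // µ and Ω

// In windows-1252, bytes 0xA0-0xFF map to the Unicode code point with the
// same number, so String.fromCharCode gives the 1252 character for them.
function cp1252HighByte(byte) {
  if (byte < 0xa0 || byte > 0xff) {
    throw new RangeError("this sketch only handles bytes 0xA0-0xFF");
  }
  return String.fromCharCode(byte);
}

// What each Alt code types (CP437) versus what that byte means in 1252.
const altCodes = [230, 234].map((n) => ({
  alt: n,
  cp437: CP437_EXCERPT[n],
  cp1252: cp1252HighByte(n),
}));

console.log(altCodes);
```

Alt+230 types µ, but byte 230 in windows-1252 is æ; Alt+234 types Ω, but byte 234 in windows-1252 is ê. The Alt number says nothing about which byte, if any, the character has in 1252.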

And when I submit the form it posts and stores it as &#937;. I'm assuming this is the browser saying "hey, this isn't in the specified charset but I do know of an html equivalent, so I'll post that instead"

Yes, this is a long-standing weird unrecoverable mangling that HTML5 finally standardised, for when a character is not encodable in the encoding the page has requested.
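
As a rough illustration of that behaviour, the sketch below simulates what a browser does to a form value on a windows-1252 page: characters the encoding can represent are kept, and everything else becomes a decimal numeric character reference. The function names are hypothetical, and the encodability check is a simplification that ignores the C1 control codes; it is not the browser's actual implementation.

```javascript
// Code points that windows-1252 places in bytes 0x80-0x9F (where it differs
// from Latin-1). Simplified: unassigned/C1 control positions are left out.
const CP1252_EXTRAS = new Set([
  0x20ac, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021, 0x02c6, 0x2030,
  0x0160, 0x2039, 0x0152, 0x017d, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022,
  0x2013, 0x2014, 0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x017e, 0x0178,
]);

// Hypothetical helper: can this single character be encoded in windows-1252?
function isCp1252Encodable(ch) {
  const cp = ch.codePointAt(0);
  return cp <= 0x7f || (cp >= 0xa0 && cp <= 0xff) || CP1252_EXTRAS.has(cp);
}

// Hypothetical helper: replace each unencodable character with &#N;
function simulateFormValue(value) {
  let out = "";
  for (const ch of value) {
    out += isCp1252Encodable(ch) ? ch : `&#${ch.codePointAt(0)};`;
  }
  return out;
}

console.log(simulateFormValue("\u03A9")); // Ω is not in 1252
console.log(simulateFormValue("\u00B5")); // µ is in 1252 (byte 181)
```

Under these assumptions Ω comes out as `&#937;` while µ passes through unchanged. The mangling is unrecoverable because a user who literally typed the six characters `&#937;` produces exactly the same submitted value, so the server cannot tell the two cases apart.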

Instead I see "(unable to decode value)" when viewing the POST in the Chrome DevTool window. And it ends up being stored in the database as a question mark.

The browser will be sending that character as code page 1252 byte 181. DevTools and your application aren't expecting code page 1252 bytes; they are probably expecting UTF-8. Because byte 181 on its own is not a valid UTF-8 sequence, they can't decode it.
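
The claim about byte 181 can be checked with the standard `TextDecoder` and `TextEncoder` APIs. This is a small sketch; `isValidUtf8` is a hypothetical helper name, and the replacement with a question mark in the database is assumed to be the storage layer's equivalent of the lenient decode shown here.

```javascript
// Byte 181 (0xB5) is how windows-1252 encodes the micro sign.
const cp1252Micro = new Uint8Array([0xb5]);

// Hypothetical helper: true if the bytes form valid UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw instead of substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// The same character encoded as UTF-8 takes two bytes: 0xC2 0xB5.
const utf8Micro = new TextEncoder().encode("\u00B5");

// A lenient decoder substitutes the replacement character for the bad byte.
const lenient = new TextDecoder("utf-8").decode(cp1252Micro);

console.log(isValidUtf8(cp1252Micro), isValidUtf8(utf8Micro), lenient === "\uFFFD");
```

The lone byte 0xB5 is a UTF-8 continuation byte with no lead byte before it, which is why a strict decoder rejects it and a lenient one substitutes a placeholder.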

Solution 2

Using encodeURIComponent to wrap the value fixed the problem.

Broken:

`?value=${myValue}`

Working:

`?value=${encodeURIComponent(myValue)}`
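
A short sketch of why the wrapped version behaves better, assuming the value ends up in a query string that the receiving side parses as UTF-8. The sample value is hypothetical; `encodeURIComponent` always percent-encodes the UTF-8 bytes of the string, regardless of the page's charset.

```javascript
// Hypothetical sample value containing both symbols and a reserved "&".
const myValue = "5 µΩ & more";

const broken = `?value=${myValue}`;
const working = `?value=${encodeURIComponent(myValue)}`;

// Parse both query strings back the way a UTF-8 aware server might.
const fromBroken = new URLSearchParams(broken).get("value");
const fromWorking = new URLSearchParams(working).get("value");

console.log(working);
console.log(fromBroken === myValue, fromWorking === myValue);
```

In the working version µ becomes `%C2%B5` and Ω becomes `%CE%A9`, so only ASCII travels over the wire and the value round-trips intact. In the broken version the raw characters are left to whatever encoding the transport picks, and the unescaped `&` also cuts the value short.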
Author: gfrobenius

http://www.linkedin.com/in/gfrobenius

Updated on July 18, 2022

Comments

  • gfrobenius
    gfrobenius almost 2 years

    I have a test site and test DB both set to windows-1252. When I type Alt+234 into Chrome it puts this symbol in the field: Ω. And when I submit the form it posts and stores it as &#937;. I'm assuming this is the browser saying "hey, this isn't in the specified charset but I do know of an html equivalent, so I'll post that instead". Fine. The symbol appears properly after saving, I can save, save, save, and it always appears fine. But if I try the same thing with Alt+230 the browser does not submit its html entity value of µ. Instead I see "(unable to decode value)" when viewing the POST in the Chrome DevTools window. And it ends up being stored in the database as a question mark.

    Why does it treat Alt+234 (Ω) differently than Alt+230 (µ)?

    I know I should switch to UTF8 but I still would like to know why it is functioning this way. Thanks!

  • gfrobenius
    gfrobenius over 9 years
    Thanks. The water is a little clearer now. I overlooked the fact that µ is part of windows-1252. Because it was not being stored properly, I assumed it wasn't. But I have discovered that some form posts in my app submit it and store it in the DB correctly as µ. But some posts change it to an uppercase M right before the update statement, then after the update it appears as a ? in the DB. So it must be some of the ColdFusion code doing this. We have a custom tag for doing all updates, it's based off <cfupdate>. Only code that calls it has this behavior. I will look there.
  • gfrobenius
    gfrobenius over 9 years
    Found it. Turns out if you do a ColdFusion #UCase()# on a value with that symbol it screws it up. So instead of doing a #UCase()# on the entire value I will do 26 replace statements on only a-z.
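
The uppercase problem described in the comments can be reproduced outside ColdFusion. The sketch below uses JavaScript as a stand-in: Unicode defines the uppercase of the micro sign (U+00B5) as Greek capital mu (U+039C), which looks like a Latin M but is not in windows-1252. It is an assumption that ColdFusion's UCase() applies the same Unicode mapping, though that would explain the "uppercase M" followed by a question mark. `upperAsciiOnly` is a hypothetical helper showing the a-z-only workaround in one step.

```javascript
// Full Unicode uppercasing turns the micro sign into Greek capital mu.
const micro = "\u00B5";
const uppercased = micro.toUpperCase();
console.log(uppercased === "\u039C"); // looks like "M", but is U+039C

// Hypothetical helper: uppercase only the ASCII letters a-z and leave
// every other character (including the micro sign) untouched.
function upperAsciiOnly(s) {
  return s.replace(/[a-z]/g, (c) => c.toUpperCase());
}

console.log(upperAsciiOnly("5 \u00B5m resistor"));
```

A single character-class replacement like this does the same job as 26 separate replace statements, and the micro sign stays encodable in windows-1252.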