Classic ASP text substitution and UTF-8 encoding

Solution 1

UTF-8 does not use BOMs; it is an annoying misfeature of some Microsoft software that puts them there. You need to find which step of your release process is putting a UTF-8-encoded BOM in your files and fix it. You should stop that even if you do move to UTF-8, which really is the best choice these days.
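
As a rough sketch of the fix: stripping the BOM amounts to dropping the first three bytes of the file when they are EF BB BF. Python is only an assumption here, since the question doesn't say what the release tooling is written in, and `strip_utf8_bom` and the sample file content are made-up names for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file body as a release step might have written it:
# a BOM followed by ordinary ASP source.
with_bom = codecs.BOM_UTF8 + b'<% Response.Write "hello" %>'
clean = strip_utf8_bom(with_bom)
print(clean)
```

Working on raw bytes rather than decoded text means the rest of the file is left exactly as it was, whatever encoding it is really in.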

But I doubt it's IIS causing the display problem. More likely the browser is guessing the charset of the final displayed page, and when it sees bytes that look like they're UTF-8 encoded it guesses the whole page is UTF-8. You should be able to stop it doing that by stating a definitive charset in an HTTP header:

Content-Type: text/html;charset=iso-8859-1

and/or a meta element in the HTML:

<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />

Now (assuming ISO-8859-1 is actually the character set your data are in) it should display OK. However if your file really does have a UTF-8-encoded BOM at the start, you'll now see that as ‘ï»¿’ in your page, which is what those bytes look like in ISO-8859-1. So you still need to get rid of that misBOM.
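
To see both effects at the byte level, here is a small Python sketch. The sample text, and the assumption that the original data contain a Windows-1252 curly apostrophe, are mine and not from the question.

```python
import codecs

# The three BOM bytes (EF BB BF), read as ISO-8859-1, become three
# visible characters at the top of the page.
bom_as_latin1 = codecs.BOM_UTF8.decode("iso-8859-1")
print(bom_as_latin1)  # ï»¿

# Hypothetical sample: a curly apostrophe stored as the single
# Windows-1252 byte 0x92. If the page is guessed to be UTF-8, that byte
# is not valid UTF-8 on its own and is shown as a replacement character.
legacy_bytes = "it’s".encode("cp1252")
misread = legacy_bytes.decode("utf-8", errors="replace")
print(misread)
```

The second case is one plausible way apostrophes end up as garbage when the declared or guessed charset does not match the bytes actually sent.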

Solution 2

I was searching on the same exact issue yesterday and came across:

http://blog.inspired.no/utf-8-with-asp-71/

Important part from that page, in case it goes away...

ASP CODE:

Response.ContentType = "text/html"
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
Response.CodePage = 65001
Response.CharSet = "UTF-8"

and the following HTML META tag:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />

We were using the meta tag and the ASP CharSet property, yet the page still didn't render correctly. After adding the other three lines to the ASP file, everything just worked.

Hope this helps!

Author by

Phrygian Moon

Updated on April 28, 2020

Comments

  • Phrygian Moon
    Phrygian Moon about 4 years

    We have a website that uses Classic ASP.

    Part of our release process substitutes values in a file and we found a bug in it where it will write the file out as UTF-8.

    This then causes our application to start spitting out garbage. Apostrophes get returned as some encoded characters.

    If we then go and remove the BOM that says this file is UTF-8, the text that was previously rendered as garbage is displayed correctly.

    Is there something that IIS does differently when it encounters a UTF-8 file?

  • Phrygian Moon
    Phrygian Moon over 14 years
    Right this makes sense. It was actually a bug in some code that was written specifically to handle this kind of issue. Thanks.
  • AnthonyWJones
    AnthonyWJones over 14 years
    I must admit this answer confuses me. "UTF-8 does not use BOMs": could you elaborate? In what way is this a "misfeature"? I've never come across a problem using UTF-8 files that include this zero width space character. What problems have you encountered?
  • Amit Patil
    Amit Patil over 14 years
    Any bytes-based text tool (such as shells, config file loaders etc.) will immediately fall over when presented with “ï»¿” at the start of a file; it is the explicit aim of UTF-8 to be compatible with tools that know nothing about Unicode, but UTF-8+BOM breaks this. Even some Unicode-aware tools will trip over it because a BOM is only expected to be present and automatically removed by the Unicode decoding process for UTF-16. UTF-8+BOM breaks applications and there is no justification for using it in the Unicode specs; and there isn't even any benefit to it as UTF-8 has no byte order issues.
  • Áxel Costas Pena
    Áxel Costas Pena over 10 years
    I'm also confused by "UTF-8 does not use BOMs". No clarification is needed; it's simply an incorrect statement.
  • user692942
    user692942 over 10 years
    You don't need both the meta tag and Response.CharSet = "UTF-8", as they both serve the same purpose. Personally I prefer to use Response.CharSet = "UTF-8" rather than explicitly setting it as a meta tag in the HTML. Also, Response.AddHeader "Content-Type", "text/html;charset=UTF-8" is just an explicit form of writing Response.ContentType = "text/html" and Response.CharSet = "UTF-8". What you are suggesting is pointless; stick to using Response.ContentType and Response.CharSet.
  • MistyDawn
    MistyDawn over 4 years
    Explicitly declaring your charset and content type in a meta tag meets W3C standards of acceptable practice. Regardless of how you decide to declare the headers in your ASP, redundant or not, you should still include a meta tag that declares the content type and charset. If you run a page through the W3C validation checker at validator.w3.org/i18n-checker it will fail without the meta tag for type declaration. It's better, in this particular case, to have too many declarations than too few.