Convert UTF-8 String Classic ASP to SQL Database


Solution 1

Paul's answer isn't wrong, but it is not the only thing to consider.

You will need to go through each of these steps to make sure that you get consistent results:

IMPORTANT: These steps have to be performed on each and every page in your web application or you will have problems (emphasized by Paul's comment).

  1. Each page needs to be saved using UTF-8 encoding. Double-check this, as some IDEs default to Windows-1252 (often misnamed "ANSI").

  2. Each page needs the following line added as its very first line. To make this easier, I put it, along with some other values, in an include file that I can include in each page as I go.

    Include File - page_encoding.asp
    <%@Language="VBScript" CodePage = 65001 %>
    <% 
      Response.CharSet = "UTF-8"
      Response.CodePage = 65001
    %>
    

    Usage at the top of an ASP page (I prefer to keep the include file in a config folder at the root of the web)

    <!-- #include virtual="/config/page_encoding.asp" -->
    

    Response.CharSet = "UTF-8" is the equivalent of setting ;charset in the HTTP Content-Type header. Response.CodePage = 65001 tells ASP to process all dynamic strings as UTF-8.

  3. Include files in the page will also have to be saved using UTF-8 encoding (double check these also).
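The "double check" in steps 1 and 3 can be partly automated. What follows is only a rough sketch in Python (chosen because it runs outside IIS); the function names and the `*.asp` / `*.inc` patterns are assumptions for illustration, not part of the original answer. It flags files whose bytes are not valid UTF-8, which is what typically happens when a page containing accented characters was saved as Windows-1252.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_files(root, patterns=("*.asp", "*.inc")):
    """List files under root (hypothetical patterns) that are not valid UTF-8."""
    bad = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if not is_valid_utf8(path.read_bytes()):
                bad.append(path)
    return bad
```

Calling `find_non_utf8_files(".")` from the web root returns the suspicious files. A file that passes is not proven to be UTF-8 (pure ASCII is valid in both encodings), so treat this as a first filter rather than a guarantee.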

Follow these steps and your page will work. Your problem at the moment is that some pages are being interpreted as Windows-1252 while others are being treated as UTF-8, so you end up with an encoding mismatch.
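The effect of that mismatch can be reproduced outside ASP. Here is a minimal Python sketch (the function name `misread_as_cp1252` is made up for illustration) of what happens when text written as UTF-8 bytes is read back as Windows-1252:

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


print(misread_as_cp1252("é"))         # Ã©
print(misread_as_cp1252("Français"))  # FranÃ§ais
```

Each accented character becomes two unrelated characters, which matches the kind of corruption the question describes.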

Solution 2

Normally (and that word covers a lot of ground) you do not need to convert by hand; it is even discouraged. At the top of your ASP page you write:

<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>

That tells ASP to send and receive (from the server's point of view) UTF-8. Internally the interpreter works with 2-byte (Unicode) strings, so writing to or reading from a database happens automatically: whether your database column uses 1-byte char or 2-byte nchar, the conversion is taken care of. And that's really about it. You can test whether everything works with this set:

áäÇçéčëíďńóöçÖöÚü

This set contains some Western European characters that exist in Windows-1252, but also some (č, ď, ń) that do not. Those will always fail if you use code page 1252, so it's a nice test set.
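To see which characters of the test set fall outside Windows-1252, a small Python check like the one below can help. It is only an illustration: the function name `not_in_cp1252` is hypothetical, and the string is normalized to NFC first on the assumption that the set is meant as precomposed characters.

```python
import unicodedata

TEST_SET = unicodedata.normalize("NFC", "áäÇçéčëíďńóöçÖöÚü")


def not_in_cp1252(text: str) -> list:
    """Return the distinct characters that Windows-1252 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


print(not_in_cp1252(TEST_SET))  # ['č', 'ď', 'ń']
```

If those three characters survive a round trip through your form, database, and display page, the pages are most likely being handled as UTF-8 throughout.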

Author: user1744228

Updated on June 06, 2022

Comments

  • user1744228 almost 2 years

    So I was having an issue with converting French characters correctly. Basically, I have a form which sends data to an SQL database. Then, on another page, data from this DB is retrieved and displayed to the user. But the data (strings) were being displayed with weird corrupt characters because the input in the form on the other page was in French. I overcame this problem by using the following function, which converts a string to the correct charset. HOWEVER, obviously the better solution is to convert it FIRST and then send it to the database. Now here's the code to convert a string retrieved from a DB to the appropriate charset:

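    ' Repairs a string whose UTF-8 bytes were misread as Windows-1252.
    Function ConvertFromUTF8(sIn)
    
        Dim oIn: Set oIn = CreateObject("ADODB.Stream")
    
        oIn.Open
        oIn.CharSet = "Windows-1252"   ' write the text out as Windows-1252 bytes
        oIn.WriteText sIn
        oIn.Position = 0               ' rewind before changing the charset
        oIn.CharSet = "UTF-8"          ' re-read the same bytes as UTF-8
        ConvertFromUTF8 = oIn.ReadText
        oIn.Close
    
    End Function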
    

    I got this function from here: Classic ASP - How to convert a UTF-8 string to UCS-2?
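For readers who want to see what that round trip does at the byte level, here is a rough Python equivalent. It is a sketch only: `convert_from_utf8` is a hypothetical name mirroring the VBScript function, and it assumes the damaged text contains only characters that exist in Windows-1252.

```python
def convert_from_utf8(damaged: str) -> str:
    """Turn the text back into Windows-1252 bytes, then read them as UTF-8."""
    raw = damaged.encode("cp1252")
    return raw.decode("utf-8")


print(convert_from_utf8("FranÃ§ais"))  # Français
```

This only repairs text that was already misread once. It does not address the underlying cause, which is pages disagreeing about the encoding.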

    Now my question is, what function do I use to convert strings beforehand and then send them to the database, so that when I retrieve them they will be good-to-go?

    Tried Paul's Method:

    So there's Page 1 and Page 2. Page 1 contains a form which, when submitted, sends the string to the DB, which is then retrieved in Page 2. I tried Paul's solution by removing the function ConvertFromUTF8 and leaving everything as it was before (it returned weird Mongolian-looking characters). After that, I added the following line at the top of Page 1 as well as Page 2.

    <%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
    

    I also have the following on both of the pages:

    Response.CodePage = 65001 
    Response.CharSet = "UTF-8" 
    

    But it didn't work :(

    Edit: It works! Thank you so much, everyone, for your help! All I needed to do was add "CodePage = 65001" at the top of Page 3 (which I didn't even mention), where the writing to the DB was happening.