How to solve "unable to switch the encoding" error when inserting XML into SQL Server
Solution 1
Although a .net string is always UTF-16
you need to serialize the object using UTF-16
encoding.
That sould be something like this:
public static string ToString(object source, Type type, Encoding encoding)
{
// The string to hold the object content
String content;
// Create a memoryStream into which the data can be written and readed
using (var stream = new MemoryStream())
{
// Create the xml serializer, the serializer needs to know the type
// of the object that will be serialized
var xmlSerializer = new XmlSerializer(type);
// Create a XmlTextWriter to write the xml object source, we are going
// to define the encoding in the constructor
using (var writer = new XmlTextWriter(stream, encoding))
{
// Save the state of the object into the stream
xmlSerializer.Serialize(writer, source);
// Flush the stream
writer.Flush();
// Read the stream into a string
using (var reader = new StreamReader(stream, encoding))
{
// Set the stream position to the begin
stream.Position = 0;
// Read the stream into a string
content = reader.ReadToEnd();
}
}
}
// Return the xml string with the object content
return content;
}
By setting the encoding to Encoding.Unicode not only the string will be UTF-16
but you should also get the xml string as UTF-16
.
<?xml version="1.0" encoding="utf-16"?>
Solution 2
This question is a near-duplicate of 2 others, and surprisingly - while this one is the most recent - I believe it is missing the best answer.
The duplicates, and what I believe to be their best answers, are:
- Using StringWriter for XML Serialization (2009-10-14)
- https://stackoverflow.com/a/1566154/751158
- Trying to store XML content into SQL Server 2005 fails (encoding problem) (2008-12-21)
- https://stackoverflow.com/a/1091209/751158
In the end, it doesn't matter what encoding is declared or used, as long as the XmlReader
can parse it locally within the application server.
As was confirmed in Most efficient way to read XML in ADO.net from XML type column in SQL server?, SQL Server stores XML in an efficient binary format. By using the SqlXml
class, ADO.net can communicate with SQL Server in this binary format, and not require the database server to do any serialization or de-serialization of XML. This should also be more efficient for transport across the network.
By using SqlXml
, XML will be sent pre-parsed to the database, and then the DB doesn't need to know anything about character encodings - UTF-16 or otherwise. In particular, note that the XML declarations aren't even persisted with the data in the database, regardless of which method is used to insert it.
Please refer to the above-linked answers for methods that look very similar to this, but this example is mine:
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;
static class XmlDemo {
static void Main(string[] args) {
using(SqlConnection conn = new SqlConnection()) {
conn.ConnectionString = "...";
conn.Open();
using(SqlCommand cmd = new SqlCommand("Insert Into TestData(Xml) Values (@Xml)", conn)) {
cmd.Parameters.Add(new SqlParameter("@Xml", SqlDbType.Xml) {
// Works.
// Value = "<Test/>"
// Works. XML Declaration is not persisted!
// Value = "<?xml version=\"1.0\"?><Test/>"
// Works. XML Declaration is not persisted!
// Value = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><Test/>"
// Error ("unable to switch the encoding" SqlException).
// Value = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>"
// Works. XML Declaration is not persisted!
Value = new SqlXml(XmlReader.Create(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>")))
});
cmd.ExecuteNonQuery();
}
}
}
}
Note that I would not consider the last (non-commented) example to be "production-ready", but left it as-is to be concise and readable. If done properly, both the StringReader
and the created XmlReader
should be initialized within using
statements to ensure that their Close()
methods are called when complete.
From what I've seen, the XML declarations are never persisted when using an XML column. Even without using .NET and just using this direct SQL insert statement, for example, the XML declaration is not saved into the database with the XML:
Insert Into TestData(Xml) Values ('<?xml version="1.0" encoding="UTF-8"?><Test/>');
Now in terms of the OP's question, the object to be serialized still needs to be converted into an XML structure from the MyMessage
object, and XmlSerializer
is still needed for this. However, at worst, instead of serializing to a String, the message could instead be serialized to an XmlDocument
- which can then be passed to SqlXml
through a new XmlNodeReader
- avoiding a de-serialization/serialization trip to a string. (See http://blogs.msdn.com/b/jongallant/archive/2007/01/30/how-to-convert-xmldocument-to-xmlreader-for-sqlxml-data-type.aspx for details and an example.)
Everything here was developed against and tested with .NET 4.0 and SQL Server 2008 R2.
Please don't make waste by running XML through extra conversions (de-deserializations and serializations - to DOM, strings, or otherwise), as shown in other answers here and elsewhere.
Solution 3
It took me forever to re-solve this problem.
I was doing an INSERT
statement into SQL Server as something like:
UPDATE Customers
SET data = '<?xml version="1.0" encoding="utf-16"?><MyMessage>Teno</MyMessage>';
and this gives the error:
Msg 9402, Level 16, State 1, Line 2
XML parsing: line 1, character 39, unable to switch the encoding
And the really, very simple fix is to:
UPDATE Customers
SET data = N'<?xml version="1.0" encoding="utf-16"?><MyMessage>Teno</MyMessage>';
The difference is prefixing the Unicode string with N
:
N'<?xml version="1.0" encoding="utf-16"?>Teno</MyMessage>'
In the former case, an unprefixed string is assumed to be varchar (e.g. Windows-1252 code-page). When it encounters the encoding="utf-16"
inside the string, there is a conflict (and rightly so, since the string isn't utf-16).
The fix is to pass the string to SQL server as an nvarchar (i.e. UTF-16):
N'<?xml version="1.0" encoding="utf-16"?>'
That way the string is UTF-16, which matches the utf-16 encoding that the XML says it is. The carpet matches the curtains, so to speak.
Solution 4
Isn't the easiest solution to tell the serializer not to ouput the XML declaration? .NET and SQL should sort the rest out between them.
XmlSerializer serializer = new XmlSerializer(typeof(MyMessage));
StringWriter str = new StringWriter();
using (XmlWriter writer = XmlWriter.Create(str, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
serializer.Serialize(writer, message);
}
string messageToLog = str.ToString();
Solution 5
@ziesemer's answer (above) is the only fully correct answer to this question and the linked duplicates of this question. However, it could still use a little more explanation and some clarification. Consider this as an extension of @ziesemer's answer.
Even if they produce the desired result, most answers to this question (including the duplicate question) are convoluted and go through many unnecessary steps. The main issue here is the overall lack of understanding regarding how the XML
datatype actually works in SQL Server (not surprising given that it isn't well documented). The XML
type:
- Is a highly optimized (for storage) type that converts the incoming XML into a binary format (which is documented somewhere in the
msdn
site). The optimizations include:- Converting numbers and dates from string (as they are in the XML) into binary representations IF the element or attribute is tagged with the type info (this might require specifying an XML Schema Collection). Meaning, the number "1234567" is stored as a 4-byte "int" instead of a 14-byte UTF-16 string of 7 digits.
- Element and Attribute names are stored in a dictionary and given a numeric ID. That numeric ID is used in the XML tree structure. Meaning, "
<ElementName>...</ElementName>
" takes up 27 character (i.e. 54 bytes) in string form, but only 11 characters (i.e. 22 bytes) when stored in theXML
type. And that is for a single instance of it. Multiple instances take up additional multiples of the 54 bytes. But in the XML type, each instance only takes up the space of that numeric ID, most likely a 4-byte int.
- Stores strings as UTF-16 Little Endian, always. This is most likely why the XML declaration is not stored: it is entirely unnecessary as it is always the same since the "Encoding" attribute cannot ever change.
- No XML declaration assumes the encoding to be UTF-16, not UTF-8.
-
Can have 8-bit / non-UTF-16 data passed in. In this case, you need to make sure that the string is not an
NVARCHAR
string (i.e. not prefixed with an upper-case "N" for literals, not declared asNVARCHAR
when dealing with T-SQL variables, and not declared asSqlDbType.NVarChar
in .NET). AND, you need to make sure that you do have theXML
declaration, and that it specifies the correct encoding.PRINT 'VARCHAR / UTF-8:'; DECLARE @XML_VC_8 XML; SET @XML_VC_8 = '<?xml version="1.0" encoding="utf-8"?><test/>'; PRINT 'Success!' -- Success! GO PRINT ''; PRINT 'NVARCHAR / UTF-8:'; DECLARE @XML_NVC_8 XML; SET @XML_NVC_8 = N'<?xml version="1.0" encoding="utf-8"?><test/>'; PRINT 'Success!' /* Msg 9402, Level 16, State 1, Line XXXXX XML parsing: line 1, character 38, unable to switch the encoding */ GO PRINT ''; PRINT 'VARCHAR / UTF-16:'; DECLARE @XML_VC_16 XML; SET @XML_VC_16 = '<?xml version="1.0" encoding="utf-16"?><test/>'; PRINT 'Success!' /* Msg 9402, Level 16, State 1, Line XXXXX XML parsing: line 1, character 38, unable to switch the encoding */ GO PRINT ''; PRINT 'NVARCHAR / UTF-16:'; DECLARE @XML_NVC_16 XML; SET @XML_NVC_16 = N'<?xml version="1.0" encoding="utf-16"?><test/>'; PRINT 'Success!' -- Success!
As you can see, when the input string is
NVARCHAR
, then the XML declaration can be included, but it needs to be "UTF-16". -
When the input string is
VARCHAR
then the XML declaration can be included, but it cannot be "UTF-16". It can, however, be any valid 8-bit encoding, in which case the bytes for that encoding will be converted into UTF-16, as shown below:DECLARE @XML XML; SET @XML = '<?xml version="1.0" encoding="utf-8"?><test attr="' + CHAR(0xF0) + CHAR(0x9F) + CHAR(0x98) + CHAR(0x8E) + '"/>'; SELECT @XML; -- <test attr="😎" /> SET @XML = '<?xml version="1.0" encoding="Windows-1255"?><test attr="' + CONVERT(VARCHAR(10), 0xF9ECE5ED) + '"/>'; SELECT @XML AS [XML from Windows-1255], CONVERT(VARCHAR(10), 0xF9ECE5ED) AS [Latin1_General / Windows-1252]; /* XML from Windows-1255 Latin1_General / Windows-1252 <test attr="שלום" /> ùìåí */
The first example specifies the 4-byte UTF-8 sequence for Smiling Face with Sunglasses and it get converted correctly.
The second example uses 4 bytes to represent 4 Hebrew letters making up the word "Shalom", which is converted correctly, and displayed correctly given that the "F9" byte, which is first, is theש
character, which is on the right-side of the word (since Hebrew is a right-to-left language). Yet those same 4 bytes display asùìåí
when selected directly since the default Collation for the current DB isLatin1_General_100_CS_AS_SC
.
veljkoz
Programming usually in .NET, with SQL Server as backend, but always open to new solutions. Interested in software architecture in general
Updated on July 09, 2022Comments
-
veljkoz almost 2 years
I'm trying to insert into XML column (SQL SERVER 2008 R2), but the server's complaining:
System.Data.SqlClient.SqlException (0x80131904):
XML parsing: line 1, character 39, unable to switch the encodingI found out that the XML column has to be UTF-16 in order for the insert to succeed.
The code I'm using is:
XmlSerializer serializer = new XmlSerializer(typeof(MyMessage)); StringWriter str = new StringWriter(); serializer.Serialize(str, message); string messageToLog = str.ToString();
How can I serialize object to be in UTF-8 string?
EDIT: Ok, sorry for the mixup - the string needs to be in UTF-8. You were right - it's UTF-16 by default, and if I try to insert in UTF-8 it passes. So the question is how to serialize into UTF-8.
Example
This causes errors while trying to insert into SQL Server:
<?xml version="1.0" encoding="utf-16"?> <MyMessage>Teno</MyMessage>
This doesn't:
<?xml version="1.0" encoding="utf-8"?> <MyMessage>Teno</MyMessage>
Update
I figured out when the SQL Server 2008 for its
Xml
column type needs utf-8, and when utf-16 inencoding
property of the xml specification you're trying to insert:When you want to add
utf-8
, then add parameters to SQL command like this:sqlcmd.Parameters.Add("ParamName", SqlDbType.VarChar).Value = xmlValueToAdd;
If you try to add the xmlValueToAdd with
encoding=utf-16
in the previous row it would produce errors in insert. Also, theVarChar
means that national characters aren't recognized (they turn out as question marks).To add utf-16 to db, either use
SqlDbType.NVarChar
orSqlDbType.Xml
in previous example, or just don't specify type at all:sqlcmd.Parameters.Add(new SqlParameter("ParamName", xmlValueToAdd));
-
Damien_The_Unbeliever over 13 yearsCan you not keep everything as XML, rather than converting it into a string in your application, only to have SQL Server try to convert it back into XML?
-
veljkoz over 13 yearsI receive object - I don't have XML yet, and that's what I need
-
ziesemer over 12 yearsFYI to readers - near duplicates: stackoverflow.com/questions/1564718/… and stackoverflow.com/questions/384974/…
-
ziesemer over 12 years@Damien_The_Unbeliever - Yes, you can! Please see the answer I just provided.
-
-
veljkoz over 13 yearsThis is it. It's the most flexible one
-
veljkoz over 13 yearsYou we're right - it seems that Sql was configured to accept only UTF-8 in xml columns. +1
-
Isak Savo over 13 yearsHmm, correct me if I'm wrong here, but all this code is doing is setting
encoding="utf-16"
in the top of the XML data. Thecontent
string is UTF-16 regardless of what encoding you use for yourXmlTextWriter
. -
Pedro over 13 yearsYes precisely. It's not a question if the string is UTF-8 or UTF-16, as you said previously it's always UTF-16. The question is to set the encoding="utf-16" or "utf-8".
-
NightShovel over 10 years@veljkoz - SQL Server cannot accept UTF-8 encoded XML values. The solution for me was to strip out the XML declaration, since it is not stored with the XML data anyway. See stackoverflow.com/a/9002485/895218.
-
Karthik D V over 10 yearsEvent I faced similar issue while inserting xml content to db. For ex:Insert Into TestData(Xml) Values ('<?xml version="1.0" encoding="UTF-8"?><Test/>'). This kind of statement used to fail and I was getting "unable to switch .." error. Later I simply prefied N to xml string like this : Insert Into TestData(Xml) Values (N'<?xml version="1.0" encoding="UTF-8"?><Test/>'). After this it started working !!!
-
The Red Pea over 8 yearsHello, ziesemer, is it possible to use this
SqlXml
approach as part of Entity Framework? stackoverflow.com/questions/32443571/… -
The Dag over 8 yearsIMHO, a better option is to use the serializer settings so that the XML declaration is omitted (or, if the settings allow, omit just the encoding attribute). It is senseless to declare the encoding in documents except when stored in a file, since when you're dealing with a text or some DOM model, the bytes have long ago been interpreted as text, which is all the encoding info is good for.
-
The Dag over 8 years"the DB doesn't need to know anything about character encodings". The logical implication is that something other than bits and bytes somehow get transported. Better to rephrase this or omit it - the point is that the "binary XML" format is more efficient.
-
Tony Wall over 8 yearsExcellent not only to deal with the problem of encoding switch but also the performance increase. In a way it is good we still have errors when not using it, to remind us that we should be doing it differently. Good you repeated this answer with further examples then :-)
-
Solomon Rutzky over 5 years+1 and please see my answer which is an addendum to this fine answer: stackoverflow.com/a/53620185/577765
-
Solomon Rutzky over 5 years@NightShovel (and others): Yes, the SQL Server
XML
datatype can accept UTF-8 encoded values (or even other 8-bit encodings), just as long as a) you send the value as eitherVarChar
orVarBinary
(notNVarchar
orXml
), and b) the string is truly encoded with the encoding you are claiming it is in the XML declaration. Please see my answer (on this page) for details and examples :-)