QString to unicode std::string
Solution 1
The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.
You need to choose:
- Whether you want the 8-bit wide std::string, the wide std::wstring (16 or 32 bits, depending on the platform), or some other type.
- What encoding is desired in your target string.
Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
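To illustrate the "one or two QChars" point, here is a minimal sketch that encodes a single code point into UTF-16 code units using only the standard library. The helper name encodeUtf16 is hypothetical and not part of Qt; it only mirrors the kind of layout a QString holds internally.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Qt API): encode one Unicode code point
// as one or two UTF-16 code units.
std::u16string encodeUtf16(char32_t cp)
{
    assert(cp <= 0x10FFFF);              // must be inside the Unicode range
    assert(cp < 0xD800 || cp > 0xDFFF);  // lone surrogates are not encodable

    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: a high/low surrogate pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, encodeUtf16(U'A') yields one unit, while encodeUtf16(0x1F600) yields the surrogate pair 0xD83D 0xDE00.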
Common cases:
- Locally encoded 8-bit std::string (as in: system locale):
  std::string(str.toLocal8Bit().constData())
- UTF-8 encoded 8-bit std::string:
  str.toStdString()
  This is equivalent to:
  std::string(str.toUtf8().constData())
- UTF-16 or UCS-4 encoded std::wstring, 16 or 32 bits wide, respectively. The selection of 16- vs. 32-bit encoding is done by Qt to match the platform's width of wchar_t.
  str.toStdWString()
- U16 or U32 strings of C++11 - from Qt 5.5 onwards:
  str.toStdU16String()
  str.toStdU32String()
- UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:
  std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
This encoding does not include byte order marks (BOMs).
It's easy to prepend BOMs to the QString itself before converting it:
QString src = ...;
src.prepend(QChar::ByteOrderMark);
#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
// Up to Qt 5.4: reinterpret the UTF-16 QChar buffer as char16_t.
auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                          src.size());
#else
auto dst = src.toStdU16String();
#endif
If you expect the strings to be large, you can skip one copy:
const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 1); // room for the BOM
dst.push_back(char16_t(QChar::ByteOrderMark));
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
           src.size()); // std::u16string manages its own terminator
In both cases, dst is now portable to systems with either endianness.
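To show why the BOM makes the data portable, here is a minimal sketch of what a consumer might do with such a buffer. The helper name normalizeByteOrder is hypothetical, it uses only the standard library, and it assumes the data has already been read into 16-bit units on the receiving machine.

```cpp
#include <cassert>
#include <string>

// Hypothetical consumer-side helper: bring a BOM-prefixed UTF-16 buffer
// into the reader's native byte order, then strip the BOM.
std::u16string normalizeByteOrder(std::u16string text)
{
    if (text.empty())
        return text;

    // U+FEFF read on a machine with the opposite endianness shows up
    // as 0xFFFE, which is never a valid character.
    if (text.front() == 0xFFFE) {
        for (char16_t &unit : text)
            unit = static_cast<char16_t>((unit << 8) | (unit >> 8));
    }

    // Drop the BOM if one is present; text without a BOM is left unchanged.
    if (text.front() == 0xFEFF)
        text.erase(0, 1);

    return text;
}
```

A buffer holding 0xFFFE, 0x4100 would come out as the single unit 0x0041 ("A"), the same result as for a buffer holding 0xFEFF, 0x0041.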
Solution 2
Use this:
QString Widen(const std::string &stdStr)
{
return QString::fromUtf8(stdStr.data(), stdStr.size());
}
std::string Narrow(const QString &qtStr)
{
QByteArray utf8 = qtStr.toUtf8();
return std::string(utf8.data(), utf8.size());
}
In all cases, the std::string should contain UTF-8 encoded data.
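One likely reason Widen and Narrow pass the size explicitly is embedded null characters: QString::fromUtf8 treats a size of -1 as "read up to the first null", and std::string's pointer-only constructor does the same. The sketch below shows the difference with the standard library alone; the helper names copyUntilNull and copyWithSize are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the length is discovered by scanning for '\0',
// so anything after an embedded null is lost.
std::string copyUntilNull(const char *data)
{
    assert(data != nullptr);
    return std::string(data);
}

// Hypothetical helper: exactly `size` bytes are copied,
// embedded nulls included.
std::string copyWithSize(const char *data, std::size_t size)
{
    assert(data != nullptr);
    return std::string(data, size);
}
```

With the five-byte buffer "ab\0cd", copyUntilNull keeps only "ab", while copyWithSize with a size of 5 keeps all five bytes. Passing the size does not store an extra terminator in the result.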
Comments
- Oleg Andriyanov (7 months)
  I know there is plenty of information about converting QString to char*, but I still need some clarification in this question.
  Qt provides QTextCodecs to convert QString (which internally stores characters in unicode) to QByteArray, allowing me to retrieve char* which represents the string in some non-unicode encoding. But what should I do when I want to get a unicode QByteArray?
  QTextCodec* codec = QTextCodec::codecForName("UTF-8");
  QString qstr = codec->toUnicode("Юникод");
  std::string stdstr(reinterpret_cast<const char*>(qstr.constData()), qstr.size() * 2 ); // * 2 since unicode character is twice longer than char
  qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()), stdstr.size() / 2); // same
  The above code prints "Юникод" as I've expected. But I'd like to know if that is the right way to get to the unicode char* of the QString. In particular, the reinterpret_casts and size arithmetic in this technique look pretty ugly.
- Kuba hasn't forgotten Monica (over 8 years)
  "you mean UTF8 and Unicode are equal" No. Your use of the word Unicode is wrong. Unicode is not an encoding, it's a standard, so talking of a "Unicode std::string" doesn't mean anything. A string by itself can't be unicode compliant. An std::string will have a particular "character" type (usually either 8 or 16 bits wide), and it will have a particular encoding (UCS-2 or UTF-16 for 16 bit characters, usually). The big difference between UCS-2 and UTF-16 is that UCS-2 is fixed-width: one code point per "character". In UTF-16, there may be multiple "characters" per code point.
- Kuba hasn't forgotten Monica (over 8 years)
  The phrase "unicode QByteArray" is meaningless. It is equivalent to saying "wakalixes QByteArray". A byte array can carry text data in some 8-bit encoding, such as Latin1 (ISO/IEC 8859-1), or UTF-8, etc. If you want an 8-bit encoded byte array as a representation of a string, you need to know what encoding is expected by the user of such an array. Only then can you decide how to encode the string.
- Kuba hasn't forgotten Monica (over 8 years)
  Please edit your question's title to indicate what encoding is desired in the std::string, and whether the string is 8- or 16-bits wide.
- Kuba hasn't forgotten Monica (over 8 years)
  OK, presuming that it is indeed std::string and not std::wstring, the string is 8 bit wide, but the encoding question still remains.
- Kuba hasn't forgotten Monica (over 8 years)
  There is no such thing as a "unicode byte array" - please stop using this term, it confuses everyone. Unicode is a standard, not an encoding. There's UTF-16 and UCS-2, and the latter is what QString is internally encoded as. UCS-2 is a subset of UTF-16 for code points 0-0xFFFF. Since a QString can't carry code points outside of that range, you don't need to do anything special to get UTF-16 out of a QString. Just use the string's constData().
- Nejat (over 8 years)
  @KubaOber Using constData() also gets you the BOM at the beginning which is a mess. Using the mentioned approach you can get the QByteArray related to string and also you can use different encoding options.
- Kuba hasn't forgotten Monica (over 8 years)
  Are you sure that QString stores the embedded BOM?
- Nejat (over 8 years)
  Yeah definitely. You can see stackoverflow.com/questions/3602548/…
- Kuba hasn't forgotten Monica (over 8 years)
  The first answer in your link seems to contradict you.
- Kuba hasn't forgotten Monica (over 8 years)
  In fact, I've just checked, and QString does not carry an embedded BOM. It'd be a waste of space. This code would dump out the BOM; it doesn't:
  QString str1(QStringLiteral("A"));
  const QChar * p = str1.constData();
  while (p->unicode()) qDebug() << *p++;
- Len (almost 5 years)
  Why is stdStr.size() necessary when calling fromUtf8? Does that result in storing the terminating null in the QString? Otherwise, it appears fromUtf8 defaults to reading up to the terminating null...