QString to unicode std::string
Solution 1
The following applies to Qt 5. Qt 4's behavior was different and, in practice, broken.
You need to choose:

- whether you want the 8-bit-wide std::string or the 16-bit-wide std::wstring, or some other type;
- what encoding is desired in your target string.

Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
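As a quick illustration, a minimal sketch (assuming Qt 5.3 or later for the char32_t overload of QString::fromUcs4; the emoji is just an arbitrary non-BMP example) showing that a code point outside the Basic Multilingual Plane occupies two QChars:

QString bmp = QStringLiteral("A");                // U+0041: fits in one QChar
QString emoji = QString::fromUcs4(U"\U0001F600"); // U+1F600: stored as a surrogate pair
qDebug() << bmp.size();    // 1
qDebug() << emoji.size();  // 2 - size() counts UTF-16 code units (QChars), not code points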
Common cases:

- Locally encoded 8-bit std::string (as in: the system locale):

  std::string(str.toLocal8Bit().constData())

- UTF-8 encoded 8-bit std::string:

  str.toStdString()

  This is equivalent to:

  std::string(str.toUtf8().constData())

- UTF-16 or UCS-4 encoded std::wstring, 16 or 32 bits wide, respectively. The selection of 16- vs. 32-bit encoding is done by Qt to match the platform's width of wchar_t:

  str.toStdWString()

- C++11 U16 or U32 strings - from Qt 5.5 onwards:

  str.toStdU16String()
  str.toStdU32String()

- UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:

  std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
These conversions do not include byte order marks (BOMs).
It's easy to prepend a BOM to the QString itself before converting it:

QString src = ...;
src.prepend(QChar::ByteOrderMark);
#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
auto dst = std::u16string{reinterpret_cast<const char16_t*>(src.constData()),
                          size_t(src.size())};
#else
auto dst = src.toStdU16String();
#endif
If you expect the strings to be large, you can skip one copy:

const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 2);  // BOM + terminating null
dst.append(char16_t(QChar::ByteOrderMark));
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
           src.size() + 1);
In both cases, dst is now portable to systems with either endianness.
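To gather the common cases from the list above in one place, a minimal sketch (assuming Qt 5.5 or later for the toStdU16String()/toStdU32String() calls):

#include <QString>
#include <string>

// One QString, all the conversions from the list above:
void convertAll(const QString &str)
{
    std::string    local = str.toLocal8Bit().constData(); // 8-bit, system locale encoding
    std::string    utf8  = str.toStdString();             // 8-bit, UTF-8
    std::wstring   wide  = str.toStdWString();            // UTF-16 or UCS-4, matching wchar_t's width
    std::u16string u16   = str.toStdU16String();          // UTF-16, Qt 5.5+
    std::u32string u32   = str.toStdU32String();          // UCS-4, Qt 5.5+
}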
Solution 2
Use this:
QString Widen(const std::string &stdStr)
{
    return QString::fromUtf8(stdStr.data(), stdStr.size());
}

std::string Narrow(const QString &qtStr)
{
    QByteArray utf8 = qtStr.toUtf8();
    return std::string(utf8.data(), utf8.size());
}
This way, the std::string always holds UTF-8 encoded data.
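For instance, a round trip through these helpers might look like this (a sketch; it assumes the source file is saved as UTF-8 so the literal below really is UTF-8 encoded):

const std::string original = "Grüße";   // UTF-8 bytes (source file saved as UTF-8)
const QString qt = Widen(original);     // UTF-8 -> QString (UTF-16 internally)
const std::string back = Narrow(qt);    // QString -> UTF-8 std::string
Q_ASSERT(back == original);             // the round trip preserves the byte sequence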
Comments
- Oleg Andriyanov (7 months)
  I know there is plenty of information about converting QString to char*, but I still need some clarification on this question. Qt provides QTextCodecs to convert QString (which internally stores characters in Unicode) to QByteArray, allowing me to retrieve a char* which represents the string in some non-Unicode encoding. But what should I do when I want to get a Unicode QByteArray?

  QTextCodec* codec = QTextCodec::codecForName("UTF-8");
  QString qstr = codec->toUnicode("Юникод");
  std::string stdstr(reinterpret_cast<const char*>(qstr.constData()),
                     qstr.size() * 2); // * 2 since a Unicode character is twice as long as a char
  qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()),
                      stdstr.size() / 2); // same

  The above code prints "Юникод" as I expected. But I'd like to know if that is the right way to get at the Unicode char* of the QString. In particular, the reinterpret_casts and size arithmetic in this technique look pretty ugly.
- Kuba hasn't forgotten Monica (over 8 years)
  "you mean UTF8 and Unicode are equal" No. Your use of the word Unicode is wrong. Unicode is not an encoding, it's a standard, so talking of a "Unicode std::string" doesn't mean anything. A string by itself can't be Unicode compliant. An std::string will have a particular "character" type (usually either 8 or 16 bits wide), and it will have a particular encoding (UCS-2 or UTF-16 for 16-bit characters, usually). The big difference between UCS-2 and UTF-16 is that UCS-2 is fixed-width: one code point per "character". In UTF-16, there may be multiple "characters" per code point.
- Kuba hasn't forgotten Monica (over 8 years)
  The phrase "unicode QByteArray" is meaningless. It is equivalent to saying "wakalixes QByteArray". A byte array can carry text data in some 8-bit encoding, such as Latin-1 (ISO/IEC 8859-1), or UTF-8, etc. If you want an 8-bit encoded byte array as a representation of a string, you need to know what encoding is expected by the user of such an array. Only then can you decide how to encode the string.
- Kuba hasn't forgotten Monica (over 8 years)
  Please edit your question's title to indicate what encoding is desired in the std::string, and whether the string is 8 or 16 bits wide.
- Kuba hasn't forgotten Monica (over 8 years)
  OK, presuming that it is indeed std::string and not std::wstring, the string is 8 bits wide, but the encoding question still remains.
- Kuba hasn't forgotten Monica (over 8 years)
  There is no such thing as a "unicode byte array" - please stop using this term, it confuses everyone. Unicode is a standard, not an encoding. There's UTF-16 and UCS-2, and the latter is what QString is internally encoded as. UCS-2 is a subset of UTF-16 for code points 0-0xFFFF. Since a QString can't carry code points outside of that range, you don't need to do anything special to get UTF-16 out of a QString. Just use the string's constData().
- Nejat (over 8 years)
  @KubaOber Using constData() also gets you the BOM at the beginning, which is a mess. Using the mentioned approach you can get the QByteArray related to the string, and you can also use different encoding options.
- Kuba hasn't forgotten Monica (over 8 years)
  Are you sure that QString stores the embedded BOM?
- Nejat (over 8 years)
  Yeah, definitely. You can see stackoverflow.com/questions/3602548/…
- Kuba hasn't forgotten Monica (over 8 years)
  The first answer in your link seems to contradict you.
- Kuba hasn't forgotten Monica (over 8 years)
  In fact, I've just checked, and QString does not carry an embedded BOM. It'd be a waste of space. This code would dump out the BOM; it doesn't:

  QString str1(QStringLiteral("A"));
  const QChar *p = str1.constData();
  while (p->unicode()) qDebug() << *p++;
- Len (almost 5 years)
  Why is stdStr.size() necessary when calling fromUtf8? Does that result in storing the terminating null in the QString? Otherwise, it appears fromUtf8 defaults to reading up to the terminating null...
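Passing stdStr.size() does not store the terminating null (std::string::size() excludes it); the explicit size matters when the data is not null-terminated or contains embedded null bytes, where fromUtf8 would otherwise stop early. A minimal sketch of the difference, with hypothetical data:

std::string s("ab\0cd", 5);                                    // five bytes, one of them a null
QString withSize = QString::fromUtf8(s.data(), int(s.size())); // converts all five bytes
QString noSize   = QString::fromUtf8(s.data());                // stops at the embedded null
qDebug() << withSize.size();  // 5
qDebug() << noSize.size();    // 2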