How do I convert wchar_t* to std::string?
Solution 1
You could just use wstring and keep everything in Unicode.
Solution 2
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
Solution 3
You can convert a wide char string to an ASCII string using the following function:
#include <locale>
#include <sstream>
#include <string>

std::string ToNarrow( const wchar_t *s, char dfault = '?',
                      const std::locale& loc = std::locale() )
{
    std::ostringstream stm;
    while( *s != L'\0' ) {
        stm << std::use_facet< std::ctype<wchar_t> >( loc ).narrow( *s++, dfault );
    }
    return stm.str();
}
Be aware that this will just replace any wide character for which an equivalent ASCII character doesn't exist with the dfault parameter; it doesn't convert from UTF-16 to UTF-8. If you want to convert to UTF-8, use a library such as ICU.
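If pulling in ICU is more than you need, the UTF-8 encoding step itself is small enough to write by hand. The following is a minimal sketch, not a vetted implementation: WideToUtf8 is a hypothetical name, and it assumes wchar_t holds either UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix-like systems).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of any library): encodes a wide string as UTF-8.
// Assumption: wchar_t carries UTF-16 code units or UTF-32 code points.
std::string WideToUtf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= ws.size())
                throw std::invalid_argument("unpaired high surrogate");
            const std::uint32_t low = static_cast<std::uint32_t>(ws[i + 1]);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i; // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("value outside the Unicode range");

        // Emit 1 to 4 bytes depending on the size of the code point.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike ToNarrow, this does not depend on the locale and does not substitute a default character: for example, WideToUtf8(L"\u00df") yields the two bytes 0xC3 0x9F.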
Solution 4
This is an old question, but if you're not really seeking a conversion and are instead using the TCHAR machinery from Microsoft to build for both ASCII and Unicode, recall that std::string is really
typedef std::basic_string<char> string;
So we could define our own typedef, say
#include <string>
#include <tchar.h> // defines TCHAR (Windows-specific)

namespace magic {
    typedef std::basic_string<TCHAR> string;
}
Then you could use magic::string with TCHAR, LPCTSTR, and so forth.
Solution 5
It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.
Here's example code that works and may be used as a hint for constructing custom converters. It is based on example code from cppreference.com.
#include <array>
#include <clocale>
#include <cstdlib>
#include <iostream>
#include <stdexcept> // std::invalid_argument, std::logic_error
#include <string>

std::string convert(const std::wstring& wstr)
{
    const std::size_t BUFF_SIZE = 7;
    if (MB_CUR_MAX >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
    std::string result;
    std::wctomb(nullptr, 0); // reset the conversion state
    for (const wchar_t wc : wstr)
    {
        std::array<char, BUFF_SIZE> buffer;
        const int ret = std::wctomb(buffer.data(), wc);
        if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
        buffer[ret] = '\0'; // make 'buffer' contain a C-style string
        result += buffer.data();
    }
    return result;
}

int main()
{
    auto loc = std::setlocale(LC_ALL, "en_US.utf8"); // UTF-8
    if (loc == nullptr) throw std::logic_error("failed to set locale");
    std::wstring wstr = L"aąß水𝄋-扫描-€𐍈\u00df\u6c34\U0001d10b";
    std::cout << convert(wstr) << "\n";
}
This prints, as expected:
aąß水𝄋-扫描-€𐍈ß水𝄋
Explanation
- 7 seems to be the minimal secure value of the buffer size, BUFF_SIZE. This includes 4 as the maximum number of UTF-8 bytes encoding a single character, 2 for the possible "shift sequence", and 1 for the trailing '\0'.
- MB_CUR_MAX is a run-time variable, so static_assert is not usable here.
- Each wide character is translated into its char representation using std::wctomb.
- This conversion makes sense only if the current locale allows multi-byte representations of a character.
- For this to work, the application needs to set the proper locale. en_US.utf8 seems to be sufficiently universal (available on most machines). In Linux, available locales can be queried in the console via the locale -a command.
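The same idea can also be expressed without a per-character buffer by letting std::wcsrtombs convert the whole string in one call. This is only a sketch under the same assumption that a suitable locale has already been set with std::setlocale; convert_whole is a hypothetical name, and the conversion stops at the first embedded L'\0'.

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to a multi-byte string
// using the current C locale (as set by std::setlocale).
std::string convert_whole(const std::wstring& wstr)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wstr.c_str();

    // First pass: with a null destination, only the required byte count
    // (excluding the terminating '\0') is computed.
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::invalid_argument("inconvertible wide characters in the current locale");

    std::string result(len, '\0');
    state = std::mbstate_t();
    src = wstr.c_str();

    // Second pass: write exactly 'len' bytes into the result.
    std::wcsrtombs(&result[0], &src, len, &state);
    return result;
}
```

Because the size is queried first, no guess about the maximum number of bytes per character is needed.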
Critique of the most upvoted answer
The most upvoted answer,
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
works well only when the wide characters represent ASCII characters - but these are not what wide characters were designed for. In this solution, the converted string contains one char per each source wide char, ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
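If you do use the range-constructor shortcut, one way to avoid silently mangling data is to check first that the input is pure ASCII. A minimal sketch (IsAscii and NarrowAsciiOnly are hypothetical names, not taken from any of the answers):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: true if every wide character is in the ASCII range.
bool IsAscii(const std::wstring& ws)
{
    for (const wchar_t wc : ws) {
        // The cast maps negative values (signed wchar_t) to large numbers.
        if (static_cast<unsigned long>(wc) > 0x7F) return false;
    }
    return true;
}

// Hypothetical helper: applies the one-char-per-wchar_t conversion,
// but refuses input that it would corrupt.
std::string NarrowAsciiOnly(const std::wstring& ws)
{
    if (!IsAscii(ws))
        throw std::invalid_argument("input contains non-ASCII characters");
    return std::string(ws.begin(), ws.end());
}
```

With this guard, input such as "ĄŚĆII" is rejected with an exception instead of being truncated.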
codefrog
Updated on January 07, 2022
Comments
- codefrog (over 2 years): I changed my class to use std::string (based on the answer I got here), but a function I have returns wchar_t*. How do I convert it to std::string?
  I tried this:
  std::string test = args.OptionArg();
  but it says: error C2440: 'initializing' : cannot convert from 'wchar_t *' to 'std::basic_string<_Elem,_Traits,_Ax>'
- codefrog (over 13 years): And I'll still get a const char* if I use .c_str()? I have other functions that expect const char*.
- Steve Townsend (over 13 years): I'm going to make a guess that you are building your project in Unicode but really don't want that. If this is correct, you can change your project's properties to not build for Unicode and then you can use string. Check this in Project Properties, Configuration Properties, General, Character Set. You need this to say Use Multibyte Character Set to get rid of Unicode everywhere.
- codefrog (over 13 years): Originally I planned to use Unicode for some parts but then I decided I'll worry about that later. At this point I'm only bothered to get the program to work. I'm using SimpleINI and SimpleOpt to load options and it uses Unicode. I'm also using the SDK of another software which also uses Unicode. Disabling Unicode altogether might make even those parts of the code stop working.
- Steve Townsend (over 13 years): SimpleIni docs indicate it uses the same conventions as Windows and so will work whichever way you build. For Unicode it uses a W suffix, for multi-byte charset it uses an A suffix, on function and class names. You should use the undecorated names (no A or W) and it will build in the right code depending on your project settings.
- Praetorian (over 13 years): Since you're programming on Windows you probably should be using Unicode. The Windows API and NTFS natively support UTF-16, so building ASCII applications incurs an additional overhead where each function is doing string conversions for you.
- Steve Townsend (over 13 years): @Praetorian - regardless of the correctness of that advice in the general case, the path of least resistance is to use MBCS, since code is using char* elsewhere.
- Praetorian (over 13 years): @Steve: Yes, of course, I wasn't disputing that. If the OP doesn't have access to the source code that uses char* then he should convert the entire project to MBCS.
- codefrog (over 13 years): I'm gonna try using wstring and see how it goes. Thanks for the answers.
- Stephen (about 10 years): Many applications use UTF-8 internally. Windows is a right pain because wchar_t isn't big enough and it doesn't really support UTF-8 properly. This makes life difficult when you have (like me) a large codebase application which uses UTF-8 internally. Mostly this works fine, but it's the interaction with some of the OS-level functions that becomes annoying.
- riv (over 8 years): How is it an accepted answer if it doesn't even answer the question?
- Ian (over 7 years): Provides the actual answer to the question!
- Julian (about 7 years): I like this solution for its simplicity. However, a little explanation couldn't hurt. It leaves open the question of how the characters are actually converted. Is there an information loss or are the wide characters converted to Unicode?
- zett42 (almost 7 years): I don't know why this answer got so many upvotes. What it does is equivalent to char c = static_cast<char>( wideChar ) for each character, so it obviously loses information if the wide-string characters are not in the ASCII range.
- truthadjustr (about 4 years): My hero! Thank you for directly providing the answer for the 99.9% of us.
- j b (about 3 years): @zett42 isn't that going to be true of any method to convert wchar_t to std::string, since by definition it's a lossy conversion...
- zett42 (about 3 years): @jb Depends on the encoding of the std::string. E.g. when using UTF-8 there is no loss of information.