How to convert char* to wchar_t*?
Solution 1
Use a std::wstring instead of a C99 variable length array. The current standard guarantees a contiguous buffer for std::basic_string. E.g.,
std::wstring wc( cSize, L'#' );
mbstowcs( &wc[0], c, cSize );
C++ does not support C99 variable length arrays, and so if you compiled your code as pure C++, it would not even compile.
With that change your function return type should also be std::wstring.
Remember to set relevant locale in main.
E.g., setlocale( LC_ALL, "" ).
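Putting those pieces together, a minimal sketch of the whole function might look like the following. The name to_wstring_from_mb is hypothetical, and the sketch assumes the caller has already set the locale. It also trims the filler characters left over when the input contains multi-byte sequences, which the two-line snippet above does not do.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original answer): converts a multibyte
// string to std::wstring using the current C locale.
std::wstring to_wstring_from_mb(const char* c)
{
    // strlen(c) is an upper bound on the number of wide characters,
    // because every wide character consumes at least one byte of input.
    std::wstring wc(std::strlen(c), L'#');
    const std::size_t written = std::mbstowcs(&wc[0], c, wc.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    wc.resize(written);  // drop unused filler characters
    return wc;
}
```

Because the result is returned by value, the caller owns the string and nothing has to be freed by hand.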
Solution 2
In your example, wc is a local variable which will be deallocated when the function call ends. This puts you into undefined behavior territory.
The simple fix is this:
const wchar_t *GetWC(const char *c)
{
    const size_t cSize = strlen(c)+1;
    wchar_t* wc = new wchar_t[cSize];
    mbstowcs (wc, c, cSize);
    return wc;
}
Note that the calling code will then have to deallocate this memory, otherwise you will have a memory leak.
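One way to keep this shape without relying on the caller to remember delete[] is to return an owning smart pointer, as one of the comments below suggests. This is only a sketch: GetWCOwned is a hypothetical name, and it assumes C++11 or later.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical variant of GetWC: the returned unique_ptr owns the buffer
// and calls delete[] automatically when it goes out of scope.
std::unique_ptr<wchar_t[]> GetWCOwned(const char* c)
{
    const std::size_t cSize = std::strlen(c) + 1;
    std::unique_ptr<wchar_t[]> wc(new wchar_t[cSize]());  // zero-initialized
    if (std::mbstowcs(wc.get(), c, cSize) == static_cast<std::size_t>(-1)) {
        wc[0] = L'\0';  // conversion failed: hand back an empty string
    }
    return wc;
}
```

The caller reads the characters through get(), for example `auto text = GetWCOwned("abc");` followed by `text.get()` wherever a const wchar_t* is expected.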
Solution 3
const char* text_char = "example of mbstowcs";
size_t length = strlen(text_char );
Example of usage "mbstowcs"
std::wstring text_wchar(length, L'#');
//#pragma warning (disable : 4996)
// Or add to the preprocessor: _CRT_SECURE_NO_WARNINGS
mbstowcs(&text_wchar[0], text_char , length);
Example of usage "mbstowcs_s"
Microsoft suggests using "mbstowcs_s" instead of "mbstowcs".
wchar_t text_wchar[30];
mbstowcs_s(&length, text_wchar, text_char, length);
Solution 4
You're returning the address of a local variable allocated on the stack. When your function returns, the storage for all local variables (such as wc) is deallocated and is subject to being immediately overwritten by something else.
To fix this, you can pass the size of the buffer to GetWC, but then you've got pretty much the same interface as mbstowcs itself. Or, you could allocate a new buffer inside GetWC and return a pointer to that, leaving it up to the caller to deallocate the buffer.
Solution 5
The question has several problems, but so do some of the answers. The idea of returning a pointer to allocated memory "and leaving it up to the caller to de-allocate" is asking for trouble. As a rule the best pattern is always to allocate and de-allocate within the same function. For example, something like:
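// get_wcb_size is a placeholder, not a standard function: it is assumed to
// return the number of wide characters needed, excluding the terminator.
const size_t size = get_wcb_size(str) + 1;  // +1 for the terminating L'\0'
wchar_t* buffer = new wchar_t[size];
mbstowcs(buffer, str, size);
...
delete[] buffer;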
In general, this requires two functions, one the caller calls to find out how much memory to allocate and a second to initialize or fill the allocated memory. Unfortunately, the basic idea of using a function to return a "new" object is problematic -- not inherently, but because of the C++ inheritance of C memory handling. Using C++ and STL's strings/wstrings/strstreams is a better solution, but I felt the memory allocation thing needed to be better addressed.
AutoBotAM
Updated on January 24, 2022
Comments
-
AutoBotAM over 2 years
I've tried implementing a function like this, but unfortunately it doesn't work:
const wchar_t *GetWC(const char *c)
{
    const size_t cSize = strlen(c)+1;
    wchar_t wc[cSize];
    mbstowcs (wc, c, cSize);
    return wc;
}
My main goal here is to be able to integrate normal char strings in a Unicode application. Any advice you guys can offer is greatly appreciated.
-
Kerrek SB over 12 years
Won't you require C++11 to guarantee that the string buffer is stored contiguously?
-
Cheers and hth. - Alf over 12 years
The C++ way is to not do raw new (in general). E.g. std::wstring is the natural result type here. At least when you don't have anything better. Also, with the code regarded as C++ he is not returning anything. The code won't compile as C++.
-
Cheers and hth. - Alf over 12 years
As stated in the answer, the current standard guarantees that, yes. The proposal was adopted at the Lillehammer meeting in April 2005.
-
Cheers and hth. - Alf over 12 years
It's not nice to teach novices bad practices like using raw new. You should at the very least mention what that entails, and the alternatives.
-
AutoBotAM over 12 years
I added your code snippet and the function seems to work now! Although I didn't have to return a wstring as I just used the c_str() member function and returned that. It seems I also didn't need to call setlocale as the defaults seem to suffice for now.
-
Greg Hewgill over 12 years
@AutoBotAM: By returning wstring.c_str(), you will have the same problem all over again. When a function exits, all local variables are destroyed, including those of type wstring. The return value of c_str() is only valid for the lifetime of its corresponding wstring object. Although your code might look like it runs correctly, it is accessing memory that has been freed and it will mysteriously fail some day for reasons that are not obvious to you at the time.
-
AutoBotAM over 12 years
It did look like it worked, but I suppose memory issues like this can be deceiving, as you say. I'll probably store the returned string in a global buffer or array of some sort so that the memory is always consistent. I just don't like using the new operator and leaving it up to the user to delete it; that invites memory leaks, especially if someone else is using my function.
-
Andrew Shepherd over 9 years
@Alexis Wilke - Care to explain more?
-
Alexis Wilke over 9 years
strlen() on an mbstring does not return the size of the wstring. You would need to do cSize = mbstowcs(NULL, c, 0) + 1; to get the correct size.
-
Andrew Shepherd over 9 years
@Alexis Wilke - Why do you think that c is an mbstring?
-
Alexis Wilke over 9 years
Why do you use mbstowcs() otherwise?! If the locale changes on you, multiple bytes may represent a single UTF-16 character and other sequences may represent two entries in UTF-16 (some Chinese characters and the like are encoded using 4 bytes per character).
-
a paid nerd about 9 years
If you use std::unique_ptr<wchar_t[]> wa(new wchar_t[cSize]) you won't have to manually delete it later.
-
Shlublu almost 8 years
Now mbstowcs_s(&outSize, wc, cSize, c, cSize - 1); should be used instead of mbstowcs (wc, c, cSize); but it works well!
-
Jan Hohenheim over 7 years
@AutoBotAM better yet, just return the wstring by value and let RAII do its thing.
-
lalitm over 6 years
Call to mbstowcs_s above is missing an arg?
-
masiton almost 4 years
So the solution proposed here is to introduce a memory leak into the code without any explanation of how to fix it.
-
masiton almost 4 years
The question clearly states that the target type is wchar_t, not wstring. Why is this off-topic answer marked as a solution?
-
Mgamerz about 2 years
get_wcb_size doesn't seem to be a thing. If you google it, this answer is the only result.