How to convert char* to wchar_t*?


Solution 1

Use a std::wstring instead of a C99 variable length array. The current standard (C++11) guarantees a contiguous buffer for std::basic_string. E.g.,

std::wstring wc( cSize, L'#' );
mbstowcs( &wc[0], c, cSize );

C++ does not support C99 variable length arrays, and so if you compiled your code as pure C++, it would not even compile.

With that change your function return type should also be std::wstring.

Remember to set the relevant locale in main, e.g., setlocale( LC_ALL, "" ).
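Putting these pieces together, a minimal sketch of a function that returns std::wstring might look like the following. The name ToWide is hypothetical. Sizing the buffer with strlen assumes that every wide character consumes at least one input byte, so the byte length is only an upper bound and the string is trimmed after conversion.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a multibyte string to std::wstring using the
// current C locale (set it with setlocale in main for non-ASCII input).
std::wstring ToWide(const char* c)
{
    // The byte length is an upper bound on the number of wide characters.
    std::wstring wc(std::strlen(c), L'#');
    if (wc.empty())
        return wc;

    const std::size_t written = std::mbstowcs(&wc[0], c, wc.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    wc.resize(written);  // drop the unused padding characters
    return wc;
}
```

Because the result is returned by value, the caller owns the string and can call c_str() on its own copy wherever a const wchar_t* is needed.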

Solution 2

In your example, wc is a local variable which will be deallocated when the function call ends. This puts you into undefined behavior territory.

The simple fix is this:

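const wchar_t *GetWC(const char *c)
{
    // strlen(c)+1 is an upper bound: each wide character consumes at least one byte.
    const size_t cSize = strlen(c)+1;
    // The heap buffer outlives the function; the caller owns it and must delete[] it.
    wchar_t* wc = new wchar_t[cSize];
    mbstowcs (wc, c, cSize);

    return wc;
}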

Note that the calling code will then have to deallocate this memory, otherwise you will have a memory leak.
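One way to avoid the manual delete[], as suggested in the comments, is to hand ownership to a std::unique_ptr<wchar_t[]>. The following is only a sketch: the name GetWCOwned is hypothetical, and the conversion still depends on the current C locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical variant of GetWC that returns an owning smart pointer.
std::unique_ptr<wchar_t[]> GetWCOwned(const char* c)
{
    const std::size_t cSize = std::strlen(c) + 1;
    // The trailing () zero-initializes the array, so the buffer stays
    // null-terminated even if the conversion fails.
    std::unique_ptr<wchar_t[]> wc(new wchar_t[cSize]());
    std::mbstowcs(wc.get(), c, cSize);
    return wc;  // ownership moves to the caller; delete[] runs automatically
}
```

The caller reads the text through get() and the memory is released when the unique_ptr goes out of scope.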

Solution 3

const char* text_char = "example of mbstowcs";
size_t length = strlen(text_char );

Example using mbstowcs:

std::wstring text_wchar(length, L'#');

//#pragma warning (disable : 4996)
// Or add to the preprocessor: _CRT_SECURE_NO_WARNINGS
mbstowcs(&text_wchar[0], text_char , length);

Example using mbstowcs_s:

Microsoft suggests using mbstowcs_s instead of mbstowcs.

Links:

Mbstowcs example

mbstowcs_s, _mbstowcs_s_l

wchar_t text_wchar[30];

// MSVC's C++ template overload deduces the destination size (30) from the array.
mbstowcs_s(&length, text_wchar, text_char, length);

Solution 4

You're returning the address of a local variable allocated on the stack. When your function returns, the storage for all local variables (such as wc) is deallocated and is subject to being immediately overwritten by something else.

To fix this, you can pass the size of the buffer to GetWC, but then you've got pretty much the same interface as mbstowcs itself. Or, you could allocate a new buffer inside GetWC and return a pointer to that, leaving it up to the caller to deallocate the buffer.

Solution 5

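The question has several problems, but so do some of the answers. The idea of returning a pointer to allocated memory "and leaving it up to the caller to de-allocate" is asking for trouble. As a rule, the best pattern is to allocate and de-allocate within the same function. For example, something like the following, where get_wcb_size stands for a helper you would write yourself (it is not a library function) that returns the number of wide characters needed, excluding the terminator:

const size_t n = get_wcb_size(str) + 1;  // +1 for the terminating null
wchar_t* buffer = new wchar_t[n];
mbstowcs(buffer, str, n);
...
delete[] buffer;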

In general, this requires two functions, one the caller calls to find out how much memory to allocate and a second to initialize or fill the allocated memory. Unfortunately, the basic idea of using a function to return a "new" object is problematic -- not inherently, but because of the C++ inheritance of C memory handling. Using C++ and STL's strings/wstrings/strstreams is a better solution, but I felt the memory allocation thing needed to be better addressed.

Author by

AutoBotAM

Updated on January 24, 2022

Comments

  • AutoBotAM
    AutoBotAM over 2 years

    I've tried implementing a function like this, but unfortunately it doesn't work:

    const wchar_t *GetWC(const char *c)
    {
        const size_t cSize = strlen(c)+1;
        wchar_t wc[cSize];
        mbstowcs (wc, c, cSize);
    
        return wc;
    }
    

    My main goal here is to be able to integrate normal char strings in a Unicode application. Any advice you guys can offer is greatly appreciated.

  • Kerrek SB
    Kerrek SB over 12 years
    Won't you require C++11 to guarantee that the string buffer is stored contiguously?
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 12 years
    The C++ way is to not do raw new (in general). E.g. std::wstring is the natural result type here. At least when you don't have anything better. Also, with the code regarded as C++ he is not returning anything. The code won't compile as C++.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 12 years
    As stated in the answer, the current standard guarantees that, yes. The proposal was adopted at the Lillehammer meeting in April 2005.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 12 years
    it's not nice to teach novices bad practices like using raw new. you should at the very least mention what that entails. and alternatives.
  • AutoBotAM
    AutoBotAM over 12 years
    I added your code snippet and the function seems to work now! Although I didn't have to return a wstring as I just used the c_str() member function and returned that. It seems I also didn't need to call setlocale as the defaults seem to suffice for now.
  • Greg Hewgill
    Greg Hewgill over 12 years
    @AutoBotAM: By returning wstring.c_str(), you will have the same problem all over again. When a function exits, all local variables are destroyed, including those of type wstring. The return value of c_str() is only valid for the lifetime of its corresponding wstring object. Although your code might look like it runs correctly, it is accessing memory that has been freed and it will mysteriously fail some day for reasons that are not obvious to you at the time.
  • AutoBotAM
    AutoBotAM over 12 years
    It did look like it worked, but I suppose memory issues like this can be deceiving as you say. I'll probably store the returned string in a global buffer or array of some sort so that the memory is always consistent. I just don't like using the new operator and leaving it up to the user to delete the result; that invites memory leaks, especially if someone else is using my function.
  • Andrew Shepherd
    Andrew Shepherd over 9 years
    @Alexis Wilke - Care to explain more?
  • Alexis Wilke
    Alexis Wilke over 9 years
    strlen() on an mbstring does not return the size of the wstring. You would need to do cSize = mbstowcs(NULL, c, 0) + 1; to get the correct size.
  • Andrew Shepherd
    Andrew Shepherd over 9 years
    @Alexis Wilke - Why do you think that c is an mbstring?
  • Alexis Wilke
    Alexis Wilke over 9 years
    Why do you use mbstowcs() otherwise?! If the locale changes on you, multiple bytes may represent a single UTF-16 character and other sequences may represent two entries in UTF-16 (some Chinese and such is encoded using 4 bytes per character.)
  • a paid nerd
    a paid nerd about 9 years
    If you use std::unique_ptr<wchar_t[]> wa(new wchar_t[cSize]) you won't have to manually delete it later.
  • Shlublu
    Shlublu almost 8 years
    Now mbstowcs_s(&outSize, wc, cSize, c, cSize - 1); should be used instead of mbstowcs (wc, c, cSize); but it works well!
  • Jan Hohenheim
    Jan Hohenheim over 7 years
    @AutoBotAM better yet, just return the wstring by value and let RAII do its thing
  • lalitm
    lalitm over 6 years
    call to mbstowcs_s above is missing an arg?
  • masiton
    masiton almost 4 years
    So the solution proposed here is to introduce memory leak to the code without any explanation on how to fix it.
  • masiton
    masiton almost 4 years
    The question clearly states that target type is wchar_t, not wstring. Why is this off-topic answer marked as a solution?
  • Mgamerz
    Mgamerz about 2 years
    get_wcb_size doesn't seem to be a thing. If you google it, this answer is the only result.