Different behaviour of ctypes c_char_p?

c_char_p is a subclass of _SimpleCData, with _type_ == 'z'. The __init__ method calls the type's setfunc, which for simple type 'z' is z_set.

In Python 2, the z_set function (2.7.7) is written to handle both str and unicode strings. Prior to Python 3, str is an 8-bit string. CPython 2.x str internally uses a C null-terminated string (i.e. an array of bytes terminated by \0), for which z_set can call PyString_AS_STRING (i.e. get a pointer to the internal buffer of the str object). A unicode string needs to first be encoded to a byte string. z_set handles this encoding automatically and keeps a reference to the encoded string in the _objects attribute.

>>> from ctypes import c_char_p
>>> c = u'spam'
>>> a = c_char_p(c)
>>> a._objects
'spam'
>>> type(a._objects)
<type 'str'>

On Windows, the default ctypes string encoding is 'mbcs', with error handling set to 'ignore'. On all other platforms the default encoding is 'ascii', with 'strict' error handling. To modify the default, call ctypes.set_conversion_mode. For example, set_conversion_mode('utf-8', 'strict').

In Python 3, the z_set function (3.4.1) does not automatically convert str (now Unicode) to bytes. The paradigm shifted in Python 3 to strictly divide character strings from binary data. The ctypes default conversions were removed, as was the function set_conversion_mode. You have to pass c_char_p a bytes object (e.g. b'spam' or 'spam'.encode('utf-8')). In CPython 3.x, z_set calls the C-API function PyBytes_AsString to get a pointer to the internal buffer of the bytes object.
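As a minimal sketch of the Python 3 behaviour described above, the snippet below encodes a str explicitly before passing it to c_char_p, and shows that passing the str itself is rejected. The variable names (text, encoded, ptr, str_accepted) are illustrative only, and UTF-8 is assumed as the encoding; use whatever encoding the C library actually expects.

```python
import ctypes

text = "spam"                    # a Python 3 str (Unicode text)
encoded = text.encode("utf-8")   # explicit conversion to bytes (UTF-8 assumed)

# c_char_p accepts a bytes object (or an integer address, or None).
ptr = ctypes.c_char_p(encoded)
print(ptr.value)

# Passing the str directly raises TypeError in Python 3.
try:
    ctypes.c_char_p(text)
    str_accepted = True
except TypeError as exc:
    str_accepted = False
    print("rejected:", exc)
```

The exact wording of the TypeError message may differ between Python 3 releases, so the sketch only relies on the exception type.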

Note that if the C function modifies the string, you need to use create_string_buffer to create a writable c_char array instead, because the bytes object behind a c_char_p is immutable and must not be written to. If the parameter is declared const in the C prototype, it is safe to pass c_char_p.
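A rough illustration of the writable-buffer case follows. No real C library is loaded here; ctypes.memmove stands in for a C function that writes into its argument, and the 16-byte size is an arbitrary choice for the example.

```python
import ctypes

# A mutable, NUL-terminated c_char array of 16 bytes, initialised with b"spam".
buf = ctypes.create_string_buffer(b"spam", 16)

# Stand-in for a C function that writes into the buffer it receives:
# overwrite the first four bytes in place.
ctypes.memmove(buf, b"eggs", 4)

result = buf.value          # bytes up to the first NUL terminator
print(result)
print(ctypes.sizeof(buf))   # total buffer size, including unused bytes
```

When calling a real function, the buffer can be passed wherever a char * argument is expected, and buf.value read back afterwards.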

Author: Sagar Masuti

Updated on July 24, 2022

Comments

  • Sagar Masuti
    Sagar Masuti almost 2 years

    I am confused by how this behaves differently across Python versions, and I don't understand why.

    Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ctypes
    >>> c="hello"
    >>> a=ctypes.c_char_p(c)
    >>> print(a.value) 
    hello
    
    Python 3.3.5 (default, Mar 11 2014, 15:08:59) 
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ctypes
    >>> c="hello"
    >>> a=ctypes.c_char_p(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bytes or integer address expected instead of str instance
    

    One works while the other gives me an error. Which one is correct?

    If both are correct, how can I achieve the same behaviour as 2.7 in 3.3.5? I want to pass a char pointer from Python to C.

  • Sagar Masuti
    Sagar Masuti almost 10 years
    Thanks for your detailed answer.
  • phyatt
    phyatt almost 5 years
    In summary, call ctypes.c_char_p(my_string.encode('utf-8')) to get it to work in Python 3.x.