ctypes return a string from c function

21,091

Solution 1

In hello.c you return a local array. You have to return a pointer to an array, which has to be dynamically allocated using malloc.

#include <stdlib.h>
#include <string.h>

char* hello(char* name)
{ 
    char hello[] = "Hello ";
    char excla[] = "!\n";
    char *greeting = malloc ( sizeof(char) * ( strlen(name) + strlen(hello) + strlen(excla) + 1 ) );
    if( greeting == NULL) exit(1);
    strcpy( greeting , hello);
    strcat(greeting, name);
    strcat(greeting, excla);
    return greeting;
}

Solution 2

Your problem is that greeting was allocated on the stack, but the stack is destroyed when the function returns. You could allocate the memory dynamically:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

const char* hello(char* name) {
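    char* greeting = malloc(100);
    if (greeting == NULL) return NULL;
    /* destination buffer and its size come first; output is truncated to fit */
    snprintf(greeting, 100, "Hello, %s!\n", name);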
    printf("%s\n", greeting);
    return greeting;
}

But that's only part of the battle because now you have a memory leak. You could plug that with another ctypes call to free().

...or a much better approach is to read up on the official C binding to python (python 2.x at http://docs.python.org/2/c-api/ and python 3.x at http://docs.python.org/3/c-api/). Have your C function create a python string object and hand that back. It will be garbage collected by python automatically. Since you are writing the C side, you don't have to play the ctypes game.

...edit..

I didn't compile and test, but I think this .py would work:

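import ctypes
import ctypes.util

# load the library; named lib so the wrapper below does not shadow it
lib = ctypes.cdll.LoadLibrary('./hello.so')
# find libc (works on linux; find_library('c') returns None on windows)
libc = ctypes.CDLL(ctypes.util.find_library('c'))
# declare the functions we use
lib.hello.argtypes = (ctypes.c_char_p,)
# c_void_p keeps the raw address; c_char_p would convert the result to
# bytes and lose the pointer we need to free
lib.hello.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None

# wrap hello to make sure the free is done
def hello(name):
    ptr = lib.hello(name)
    if not ptr:
        raise MemoryError("hello() returned NULL")
    try:
        return ctypes.string_at(ptr)  # copy the C string into Python bytes
    finally:
        libc.free(ptr)

# do the deed
print(hello(b"Frank"))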
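One detail matters for wrappers like this: when restype is c_char_p, ctypes converts the return value to a Python bytes object, and the original address is no longer available to pass to free(). The sketch below shows the pattern with restype set to c_void_p. It assumes a POSIX system, where ctypes.CDLL(None) exposes the C runtime, and it uses libc's strdup only as a stand-in for a function that returns malloc'd memory; dup_and_release is a hypothetical helper name.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to the C runtime that is
# already loaded into the Python process.
libc = ctypes.CDLL(None)

# strdup stands in for any C function that returns malloc'd memory.
libc.strdup.argtypes = (ctypes.c_char_p,)
libc.strdup.restype = ctypes.c_void_p   # keep the raw address, not bytes
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None


def dup_and_release(text: bytes) -> bytes:
    """Call the C function, copy its result, then free the C memory."""
    ptr = libc.strdup(text)             # heap allocation made on the C side
    if not ptr:
        raise MemoryError("strdup returned NULL")
    try:
        return ctypes.string_at(ptr)    # copy up to the NUL into Python bytes
    finally:
        libc.free(ptr)                  # release exactly once


copied = dup_and_release(b"Hello, Frank!\n")
```

The try/finally keeps the free() paired with the allocation even if the copy raises.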

Solution 3

I ran into this same problem today and found that you must override the default return type (int) by setting restype on the function. See the "Return types" section of the ctypes documentation.

import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = b"Frank"  # c_char_p takes bytes in Python 3
c_name = ctypes.c_char_p(name)
hello.hello.restype = ctypes.c_char_p # override the default return type (int)
foo = hello.hello(c_name)
print(c_name.value)
# foo is already bytes here; wrapping it again is redundant but harmless
print(ctypes.c_char_p(foo).value)
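To try the restype override without compiling anything, here is a minimal sketch that calls libc's strchr, which also returns a char*. It assumes a POSIX system, where ctypes.CDLL(None) exposes the C runtime; strchr is only a stand-in for hello().

```python
import ctypes

# Assumption: POSIX. CDLL(None) exposes the already-loaded C runtime.
libc = ctypes.CDLL(None)

# char *strchr(const char *s, int c)
strchr = libc.strchr
strchr.argtypes = (ctypes.c_char_p, ctypes.c_int)
# Without this line ctypes would treat the returned pointer as a C int,
# which cannot hold a 64-bit address.
strchr.restype = ctypes.c_char_p

tail = strchr(b"abcdef", ord("d"))   # pointer to the first 'd', read as bytes
```

With restype set, ctypes reads the returned pointer as a NUL-terminated string and hands back a bytes object.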

Solution 4

Here's what happens, and why it's breaking. When hello() is called, the C stack pointer is moved up, making room for any memory needed by your function. Along with some function call overhead, all of your function locals are managed there. So that static char greeting[100], means that 100 bytes of the increased stack are for that string. You then use some functions that manipulate that memory. At the end you place a pointer on the stack to the greeting memory. And then you return from the call, at which point the stack pointer is retracted back to its original before-call position. So those 100 bytes that were on the stack for the duration of your call are essentially up for grabs again as the stack is further manipulated, including the address field which pointed to that value and that you returned. At that point, who knows what happens to it, but it's likely set to zero or some other value. And when you try to access it as if it were still viable memory, you get a segfault.

To get around this, you need to manage that memory differently. You can have your function allocate the memory on the heap, but you'll need to make sure it gets free()'d later by your binding. Or you can write your function so that the binding language passes it a chunk of memory to use.
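The second option, where the binding owns the memory, might look like the sketch below. It assumes a POSIX system, where ctypes.CDLL(None) exposes the C runtime, and it uses libc's strncpy purely as a stand-in for your own C function that fills a caller-supplied buffer; copy_into_buffer is a hypothetical helper name.

```python
import ctypes

# Assumption: POSIX. CDLL(None) exposes the already-loaded C runtime.
libc = ctypes.CDLL(None)

# char *strncpy(char *dest, const char *src, size_t n)
libc.strncpy.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t)
libc.strncpy.restype = ctypes.c_void_p


def copy_into_buffer(text: bytes, size: int = 100) -> bytes:
    """Let C write into memory that Python allocated and will reclaim."""
    buf = ctypes.create_string_buffer(size)  # zero-filled, writable
    libc.strncpy(buf, text, size - 1)        # last byte is left as the NUL
    return buf.value                         # bytes up to the first NUL


result = copy_into_buffer(b"Hello, Frank!")
```

Because Python allocated the buffer, it is reclaimed by the garbage collector and no free() call is needed. Passing the size along lets the C side truncate instead of overflowing.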

Solution 5

I also ran into the same problem but used a different approach. I was supposed to find a string matching a certain value in a list of strings.

Basically, I initialized a char array with the size of the longest string in my list, then passed that as an argument to my function to hold the corresponding value.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void find_gline(char **ganal_lines, /*line array*/
                size_t size,        /*array size*/
                char *idnb,         /* id number for check */
                char *resline) {
  /*Iterates over lines and finds the first one that contains idnb,
    then copies it into resline, which must hold at least
    strlen(line) + 1 bytes*/
  for (size_t i = 0; i < size; i++) {
    char *line = ganal_lines[i];
    if (strstr(line, idnb) != NULL) {
      size_t llen = strlen(line);
      for (size_t k = 0; k < llen; k++) {
        resline[k] = line[k];
      }
      resline[llen] = '\0'; /* terminate so the caller sees a valid C string */
      return;
    }
  }
  return;
}

This function was wrapped by the corresponding python function:



import ctypes

def find_gline_wrap(lines: list, arg: str, cdll):
    """Return the first line containing arg, or "" if none matches."""
    encoded = [line.encode("utf-8") for line in lines]
    # length of the longest encoded line
    mlen = max((len(b) for b in encoded), default=0)
    linelen = len(encoded)
    line_array = ctypes.c_char_p * linelen

    # set arg types
    cdll.find_gline.argtypes = [
        line_array,
        ctypes.c_size_t,
        ctypes.c_char_p,
        ctypes.c_char_p,
    ]
    cdll.find_gline.restype = None

    ganal_lines = line_array(*encoded)
    # writable, zero-filled buffer with room for the terminating NUL;
    # c_char_p(b"") would point at immutable memory of length zero
    resline = ctypes.create_string_buffer(mlen + 1)
    cdll.find_gline(ganal_lines, linelen, arg.encode("utf-8"), resline)
    return resline.value.decode("utf-8")
Author: Thane Brimhall

I'm the Chief Product Officer at Seek. I love Python 3, Django, REST, and PostgreSQL.

Updated on March 22, 2020

Comments

  • Thane Brimhall
    Thane Brimhall about 4 years

    I'm a Python veteran, but haven't dabbled much in C. After half a day of not finding anything on the internet that works for me, I thought I would ask here and get the help I need.

    What I want to do is write a simple C function that accepts a string and returns a different string. I plan to bind this function in several languages (Java, Obj-C, Python, etc.) so I think it has to be pure C?

    Here's what I have so far. Notice I get a segfault when trying to retrieve the value in Python.

    hello.c

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    const char* hello(char* name) {
        static char greeting[100] = "Hello, ";
        strcat(greeting, name);
        strcat(greeting, "!\n");
        printf("%s\n", greeting);
        return greeting;
    }
    

    main.py

    import ctypes
    hello = ctypes.cdll.LoadLibrary('./hello.so')
    name = "Frank"
    c_name = ctypes.c_char_p(name)
    foo = hello.hello(c_name)
    print c_name.value # this comes back fine
    print ctypes.c_char_p(foo).value # segfault
    

    I've read that the segfault is caused by C releasing the memory that was initially allocated for the returned string. Maybe I'm just barking up the wrong tree?

    What's the proper way to accomplish what I want?

    • David Heffernan
      David Heffernan about 11 years
      You need to set foo.restype appropriately. Do you really want to use static? Not threadsafe. Wouldn't you be better allocating memory in Python and letting the C code populate it with content? Or allocate in the C code, and export a deallocator too.
    • Fred Foo
      Fred Foo about 11 years
      You should probably return a copy of the string; use strdup or malloc for that. But really, if you want to do this kind of things in C, then invest in a C book. C is quite different from higher-level languages such as Python.
    • Admin
      Admin about 11 years
      Aside from the problem you describe, your buffer is static, so there's only one for all calls, so the next call would change what the first return value points at. Keeping it local and not static means its lifetime ends when the function returns, which makes it unsuitable. That's not even touching on the buffer overflow vulnerability!
    • Thane Brimhall
      Thane Brimhall about 11 years
      Heh, obviously a C noob here. :) If I remove static gcc gives me a warning. What's the proper way to allocate the memory for return? I'm just looking for something safe and straightforward.
    • Admin
      Admin about 11 years
      There is little safe or straightforward in C ;-) At least not if you work with a Python mindset. Read a good C book. Reading existing questions and answers here on Stackoverflow works in a pinch but I wouldn't bet on it. (Btw, gcc gives a warning for the very reason I hinted at: It's incorrect, you're returning the address of something that doesn't exist any more.)
  • Thane Brimhall
    Thane Brimhall about 11 years
    Very nice! Thank you. A followup question on this answer: Since we're allocating the memory here, where/when is it deallocated? If I call this function 10k times, will I have an awful leak?
  • Geesh_SO
    Geesh_SO about 11 years
    Beat me to it. This should do the job (unless there are any other unforeseen problems). The reason it didn't work before was that, by creating the string like you did, it was allocated on the stack and was thus lost once the function exited. The solution instead uses malloc to allocate on the heap, and returns to Python the location where the string can be found.
  • Admin
    Admin about 11 years
    @ThaneBrimhall Of course for every malloc you need to make a free
  • Thane Brimhall
    Thane Brimhall about 11 years
    Excellent explanation of how it all works. How would I deallocate the memory once I use the result in my binding?
  • Thane Brimhall
    Thane Brimhall about 11 years
    I suppose I'd have to free it in the Python binding? Or where else would I do that?
  • Admin
    Admin about 11 years
    I don't know python but according to the rules set by your question, you would have to make a c function and pass the char* pointer to it.
  • Thane Brimhall
    Thane Brimhall about 11 years
    Perfect, thank you! See this question for one way to do it.
  • Warren Weckesser
    Warren Weckesser about 11 years
    The code should also check for malloc failing, and deal with it appropriately.
  • Thane Brimhall
    Thane Brimhall about 11 years
    I can't do the "much better approach" you recommended (return a Python object) because I need to bind this function in multiple languages.
  • Admin
    Admin about 11 years
    More importantly, it should check the length. malloc rarely fails and if it does you get a crash. On the other hand, it's very easy to cause a buffer overflow with this code (or OP's code, to be fair). This isn't the nineties.
  • Admin
    Admin about 11 years
    @delnan Of course it should do both things, but I wasn't writing the perfect function, just an example of how to do it in the first place.
  • Admin
    Admin about 11 years
    Still, this is such a glaring problem, and these bugs have such a horrible history, that I would rather see a huge warning (or just fix it, strlen should help). I actually disagree with the malloc check proposed above, and it triggered me to comment at all.
  • Travis Griggs
    Travis Griggs about 11 years
    free(pointer). That's the opposite of malloc and friends. You'll have to provide a binding for that, or hope that you have one already, most languages that bind to C have some mechanism for doing that.
  • Admin
    Admin about 11 years
    Looks good to me, modulo stylistic issues that go beyond nitpicking. But I don't do this on a daily basis, so don't take it as a guarantee. (And again, I for one don't think the null check is worth the line of code it takes. But this is subjective.)
  • tdelaney
    tdelaney about 11 years
    Okay, that can be a problem! Another option is SWIG, which can bind several languages (swig.org/compat.html#SupportedLanguages). I use ctypes from time to time, but it can be unwieldy when the interface is complex.
  • tdelaney
    tdelaney about 11 years
    I added the Python code to the example - it's not tested but looks right to me (lol).
  • Pavel Minaev
    Pavel Minaev about 4 years
    static variables are not allocated on the stack, and remain after the function returns, so this is wrong.
  • h0r53
    h0r53 over 3 years
    Any ideas how to find free on Windows? libc doesn't exist and util.find_library('c') returns None.
  • h0r53
    h0r53 over 3 years
    As a workaround I just defined my own C routine that accepts a char * and calls free. Then I call free through my own shared library, without needing to import libc or anything else.
  • Valentin Safonnikov
    Valentin Safonnikov about 2 years
    This code produces memory leaks. Use python function create_string_buffer and fill the buffer in c code.