Writing `eval()` in C


Solution 1

Trying to parse C is a real pain in the behind, but we already have something that knows how to parse C: the C compiler. Here we compile the eval code into a dynamic library and load it.

You may run into problems where your dynamic code can't find other functions or variables in your own program. The easy solution is to compile your whole program except for main() as a library and link the dynamic code library against it. You can avoid the -fpic penalty by setting your library load address only a few K above your main load address. On Linux, unresolved symbols in a library can be resolved by the executable if it is not stripped and its symbols are exported (with GCC, typically by linking the executable with -rdynamic); glibc depends on this functionality. However, compiler optimizations sometimes get in the way, so the total library method may still be necessary.

Sample code below is for Linux. It can be adapted to other Unix systems, including Mac OS X, with minor work. Attempting this on Windows is possible but harder: you have no guarantee of a C compiler unless you're willing to ship one, and because of the obnoxious Windows rule about multiple C runtimes you must build with the same runtime you ship, and therefore with the same compiler you ship. Also, you must use the total library technique there, or symbols in your main program just won't resolve in the library (the PE file format can't express what is needed).

This sample code provides no way for the eval() code to save state; if you need this you should do so either by variables in the main program or (preferred) passing in a state structure by address.
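As a rough sketch of the state-structure approach: the names `struct eval_state`, `feval_state` and `run_twice` are hypothetical and not part of the sample code, and `fctn_standin` merely stands in for the pointer that `dlsym()` would return. The evaluated source string would need the same structure definition, for example by including a shared header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state shared between the host program and evaluated code. */
struct eval_state {
    int calls;   /* how many times evaluated code has run */
    long total;  /* running sum that survives between calls */
};

/* Signature the evaluated function would need to have. */
typedef void (*feval_state)(struct eval_state *st, int arg);

/* Stand-in for a function obtained from dlsym(); in real use this body
 * would live in the string handed to the compiler. */
static void fctn_standin(struct eval_state *st, int arg)
{
    st->calls++;
    st->total += arg;
}

/* Run the evaluated code twice against the same state object, so the
 * second call sees what the first one stored. */
static long run_twice(feval_state fc, struct eval_state *st, int a, int b)
{
    assert(fc != NULL && st != NULL);
    fc(st, a);
    fc(st, b);
    return st->total;
}
```

Because the state lives in the host program and is only passed by address, it survives unloading the dynamic library after each call.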

If you are trying to do this in an embedded environment, don't. This is a bad idea in the embedded world.

In answer to rici's comment: I have never seen a case where the argument types and return type of an eval() block were not statically determined from the surrounding code. Otherwise, how would you be able to call it? Example code below could be cut up, extracting the shared part so the per-type part is only a couple of lines; that exercise is left for the reader.

If you don't have a specific reason to want dynamic C, try embedded Lua instead, with a well-defined interface.

/* gcc -o dload dload.c -ldl */

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* getpid(), unlink() */

typedef void (*fevalvd)(int arg);

/* We need one of these per function signature */
/* Disclaimer: does not support currying; attempting to return functions -> undefined behavior */
/* The function to be called must be named fctn or this does not work. */
void evalvd(const char *function, int arg)
{
        char buf1[50];
        char buf2[50];
        char buf3[100];
        void *ctr;
        fevalvd fc;
        snprintf(buf1, 50, "/tmp/dl%d.c", getpid());
        snprintf(buf2, 50, "/tmp/libdl%d.so", getpid());
        FILE *f = fopen(buf1, "w");
        if (!f) { fprintf(stderr, "can't open temp file\n"); return; }
        fprintf(f, "%s", function);
        fclose(f);
        snprintf(buf3, 100, "gcc -shared -fpic -o %s %s", buf2, buf1);
        if (system(buf3)) { unlink(buf1); return ; /* oops */ }

        ctr = dlopen(buf2, RTLD_NOW | RTLD_LOCAL);
        if (!ctr) { fprintf(stderr, "can't open\n"); unlink(buf1); unlink(buf2); return ; }
        fc = (fevalvd)dlsym(ctr, "fctn");
        if (fc) {
                fc(arg);
        } else {
                fprintf(stderr, "Can't find fctn in dynamic code\n");
        }
        dlclose(ctr);
        unlink(buf2);
        unlink(buf1);
}

int main(int argc, char **argv)
{
        evalvd("#include <stdio.h>\nvoid fctn(int a) { printf(\"%d\\n\", a); }\n", 10);
}
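One possible way to split the sample above into a shared part and small per-type wrappers is sketched here. It is a variation, not the original code: `struct dyn_code`, `dyn_open`, `dyn_close` and `evalid` are hypothetical names, and unlike the sample the wrappers return 0 on success and -1 on failure. It assumes Linux with gcc on the PATH and a writable /tmp.

```c
/* gcc -o dload2 dload2.c -ldl   (file name is hypothetical) */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Everything the shared part has to remember in order to clean up. */
struct dyn_code {
    char src[64];   /* path of the temporary C file */
    char lib[64];   /* path of the temporary shared library */
    void *handle;   /* dlopen() handle, or NULL */
    void *sym;      /* address of "fctn", or NULL */
};

/* Shared part: unload the library and delete both temporary files. */
static void dyn_close(struct dyn_code *d)
{
    if (d->handle) dlclose(d->handle);
    unlink(d->lib);
    unlink(d->src);
}

/* Shared part: write the source, compile it, load it, look up fctn.
 * Returns 0 on success, -1 on failure with everything cleaned up. */
static int dyn_open(struct dyn_code *d, const char *source)
{
    char cmd[192];
    FILE *f;

    d->handle = NULL;
    d->sym = NULL;
    snprintf(d->src, sizeof d->src, "/tmp/dl%d.c", (int)getpid());
    snprintf(d->lib, sizeof d->lib, "/tmp/libdl%d.so", (int)getpid());
    f = fopen(d->src, "w");
    if (!f) return -1;
    fputs(source, f);
    fclose(f);
    snprintf(cmd, sizeof cmd, "gcc -shared -fpic -o %s %s", d->lib, d->src);
    if (system(cmd) != 0) { unlink(d->src); return -1; }
    d->handle = dlopen(d->lib, RTLD_NOW | RTLD_LOCAL);
    if (d->handle) d->sym = dlsym(d->handle, "fctn");
    if (!d->sym) { dyn_close(d); return -1; }
    return 0;
}

/* Per-type part for void fctn(int). */
typedef void (*fevalvd)(int);
int evalvd(const char *source, int arg)
{
    struct dyn_code d;
    if (dyn_open(&d, source) != 0) return -1;
    ((fevalvd)d.sym)(arg);
    dyn_close(&d);
    return 0;
}

/* Per-type part for int fctn(int); the result is stored in *result. */
typedef int (*fevalid)(int);
int evalid(const char *source, int arg, int *result)
{
    struct dyn_code d;
    if (dyn_open(&d, source) != 0) return -1;
    *result = ((fevalid)d.sym)(arg);
    dyn_close(&d);
    return 0;
}
```

Each further signature then costs one typedef and a wrapper of about five lines, while file handling, compilation and cleanup stay in one place.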

Solution 2

It's possible, but a pain to do. You need to write a parser that takes the text as input and generates a syntax tree; then you need to simplify constructs (e.g., converting loops into goto statements and breaking expressions down into static single assignment form with one operation per statement). Then you need to match all of the patterns in your syntax tree with sequences of instructions on the target machine that perform the same tasks. Finally, you need to select the registers to use for each of those instructions, spilling them onto the stack if necessary.

In short, writing an implementation of eval in C is possible, but it is a huge amount of work that requires expertise in several fields of computer science. The complexity of writing a compiler is the precise reason why most programming languages are either interpreted or use a virtual machine with a custom bytecode. Tools like Clang and LLVM make this a lot easier, but those are written in C++, not C.
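To give a feel for only the first stage (text to syntax tree), here is a minimal sketch for integer arithmetic expressions. It is not a C parser and it emits no machine code: it walks the tree and interprets it, which is the shortcut interpreted languages take. All names (`struct node`, `parse_expr`, `eval_expr`, and so on) are made up for illustration, and malformed input is not reported.

```c
#include <ctype.h>
#include <stdlib.h>

/* Syntax-tree node: a literal (op == 0) or a binary operation. */
struct node {
    char op;                 /* 0 for a literal, else '+', '-', '*', '/' */
    long value;              /* used when op == 0 */
    struct node *lhs, *rhs;  /* used when op != 0 */
};

static struct node *parse_expr(const char **s);

static void skip_ws(const char **s)
{
    while (isspace((unsigned char)**s)) (*s)++;
}

static struct node *new_node(char op, long value,
                             struct node *lhs, struct node *rhs)
{
    struct node *n = malloc(sizeof *n);
    if (!n) abort();
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* factor := number | '(' expr ')' */
static struct node *parse_factor(const char **s)
{
    skip_ws(s);
    if (**s == '(') {
        (*s)++;
        struct node *n = parse_expr(s);
        skip_ws(s);
        if (**s == ')') (*s)++;
        return n;
    }
    char *end;
    long v = strtol(*s, &end, 10);
    *s = end;
    return new_node(0, v, NULL, NULL);
}

/* term := factor (('*' | '/') factor)* */
static struct node *parse_term(const char **s)
{
    struct node *n = parse_factor(s);
    for (;;) {
        skip_ws(s);
        char op = **s;
        if (op != '*' && op != '/') return n;
        (*s)++;
        n = new_node(op, 0, n, parse_factor(s));
    }
}

/* expr := term (('+' | '-') term)* */
static struct node *parse_expr(const char **s)
{
    struct node *n = parse_term(s);
    for (;;) {
        skip_ws(s);
        char op = **s;
        if (op != '+' && op != '-') return n;
        (*s)++;
        n = new_node(op, 0, n, parse_term(s));
    }
}

/* Walk the tree to compute its value, freeing nodes on the way.
 * Division by zero yields 0 here instead of trapping. */
static long eval_tree(struct node *n)
{
    long r;
    if (n->op == 0) {
        r = n->value;
    } else {
        long a = eval_tree(n->lhs), b = eval_tree(n->rhs);
        switch (n->op) {
        case '+': r = a + b; break;
        case '-': r = a - b; break;
        case '*': r = a * b; break;
        default:  r = b ? a / b : 0; break;
        }
    }
    free(n);
    return r;
}

long eval_expr(const char *text)
{
    return eval_tree(parse_expr(&text));
}
```

Even this toy needs one function per grammar rule; statements, types, declarations and code generation for full C are where the real effort lies.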

Solution 3

A couple of weeks back I wanted to do something similar, and this was the first question I stumbled upon, so I'm answering here now that I have some grasp of it :) I am surprised nobody mentioned tcc (specifically libtcc), which lets you compile code from a string and invoke the function thus defined. For example:

int (*sqr)(int) = NULL;
TCCState *S = tcc_new();

tcc_set_output_type(S, TCC_OUTPUT_MEMORY);
tcc_compile_string(S, "int squarer(int x) { return x*x; }");
tcc_relocate(S, TCC_RELOCATE_AUTO);
sqr = tcc_get_symbol(S, "squarer");

printf("%d", sqr(2));
tcc_delete(S);

(Error handling omitted for brevity.) Beyond this basic example, using a variable of the host program inside the dynamic function takes a little more work. If I had a variable int N; and wanted to use it, I would need two things. First, in the code string:

 ... "extern int N;"

Then tell tcc about it, before calling tcc_relocate:

tcc_add_symbol(S, "N", &N);

Similarly, there are APIs to define macros, link entire libraries, and so on. HTH.

Author by

csTroubled

Updated on June 22, 2022

Comments

  • csTroubled
    csTroubled about 2 years

    I've been trying to make an eval function in C for a while.

    At the moment, my idea is to make a hash String -> function pointer with all the standard library C functions, and all the functions that I make, that way I could handle function invocations (on already defined functions).

    However, defining functions with strings (i.e., calling eval("int fun(){return 1;}")) is still a problem. I don't know how I could handle this at runtime; does anyone have any idea?

    Variable definitions don't seem too much of a problem, as I could just use another hash var_name -> pointer and use that pointer whenever the variable is required.

    By the way, I don't care about performance, I want to get this to work.

  • Jonathan Leffler
    Jonathan Leffler almost 8 years
    There's still the issue of calling the function. You might need to look at the Foreign Function Interface library (libffi or on Github libffi). Or there might be another way to do that work.
  • DeftlyHacked
    DeftlyHacked almost 8 years
    @JonathanLeffler I considered mentioning libffi, but I figured he could probably read between the lines that I was saying that what he's trying to do is just completely impractical. He would be far better off using Lua or Python like everyone else.
  • YoTengoUnLCD
    YoTengoUnLCD almost 8 years
    As noted in the comments, this does seem like a roundabout way, and requires many more permissions (creating files, running gcc, etc) on the system than a real eval (dynamically parsing the code, only requiring (potentially a lot of) memory).
  • YoTengoUnLCD
    YoTengoUnLCD almost 8 years
    I just tested your example, your eval example calls the function fctn with the value 10, but nowhere in the string there's an invocation, this is surely not the intended behaviour.
  • vpalmu
    vpalmu almost 8 years
    The 10 is the second argument to the evalvd function; it is indeed the intended behavior.
  • YoTengoUnLCD
    YoTengoUnLCD almost 8 years
    Let me rephrase: it is the behavior intended by you, but not that of any regular eval function. Why are you invoking a function that's just being defined in that code?
  • vpalmu
    vpalmu almost 8 years
    @YoTengoUnLCD: Because in C you cannot have an expression in top-level code; therefore the top-level code must be wrapped in a function by the code-builder implied to exist by the question; and it takes arguments because closures are probably not possible without compiling the surrounding code with a custom compiler. Also note that OP knows this and even his example wraps it in a function.
  • user9869932
    user9869932 about 3 years
    In my case (macOS), #include <stdlib.h> and #include <unistd.h> are required to compile. It works perfect for me, ty