String input to flex lexer


Solution 1

The following routines are available for setting up input buffers for scanning in-memory strings instead of files (as yy_create_buffer does):

  • YY_BUFFER_STATE yy_scan_string(const char *str): scans a NUL-terminated string
  • YY_BUFFER_STATE yy_scan_bytes(const char *bytes, int len): scans len bytes (including possibly NULs) starting at location bytes

Note that both of these functions create and return a corresponding YY_BUFFER_STATE handle (which you must delete with yy_delete_buffer() when done with it), and that yylex() scans a copy of the string or bytes. This behavior may be desirable, since yylex() modifies the contents of the buffer it is scanning.

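If you want to avoid the copy, you can supply the buffer yourself (the returned handle must still be released with yy_delete_buffer(), and the last two bytes of the buffer must be NUL):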

  • YY_BUFFER_STATE yy_scan_buffer(char *base, yy_size_t size)

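Sample main:

int main() {
    /* Scanned in place: must be writable and end in two NULs. */
    char input[] = "a test string\0";
    YY_BUFFER_STATE buffer = yy_scan_buffer(input, sizeof(input));
    yylex();
    yy_delete_buffer(buffer);
    return 0;
}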

Solution 2

See this section of Flex's manual for information on how to scan in-memory buffers, such as strings.

Solution 3

flex can parse char * using any one of three functions: yy_scan_string(), yy_scan_buffer(), and yy_scan_bytes() (see the documentation). Here's an example of the first:

typedef struct yy_buffer_state * YY_BUFFER_STATE;
extern int yyparse();
extern YY_BUFFER_STATE yy_scan_string(const char * str);
extern void yy_delete_buffer(YY_BUFFER_STATE buffer);

int main(){
    char string[] = "String to be parsed.";
    YY_BUFFER_STATE buffer = yy_scan_string(string);
    yyparse();
    yy_delete_buffer(buffer);
    return 0;
}

The equivalent statements for yy_scan_buffer() (which requires a doubly null-terminated string):

char string[] = "String to be parsed.\0";
YY_BUFFER_STATE buffer = yy_scan_buffer(string, sizeof(string));
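If the two trailing NULs are missing, yy_scan_buffer() returns a NULL pointer instead of a buffer handle. The sketch below mirrors that precondition so it can be checked before the call; has_double_nul is a hypothetical helper name, not part of flex.

```c
#include <stddef.h>

/* Hypothetical helper (not a flex function): returns 1 when the buffer
   base[0..size) is at least two bytes long and its last two bytes are
   NUL, which is the layout yy_scan_buffer() expects; returns 0 otherwise. */
int has_double_nul(const char *base, size_t size)
{
    return size >= 2
        && base[size - 2] == '\0'
        && base[size - 1] == '\0';
}
```

With the array above, has_double_nul(string, sizeof(string)) holds because the explicit "\0" and the literal's implicit terminator supply the two NULs.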

My answer reiterates some of the information provided by @dfa and @jlholland, but neither of their answers' code seemed to be working for me.

Solution 4

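Here is what I needed to do:

#include <stddef.h>

struct yy_buffer_state;
typedef struct yy_buffer_state *YY_BUFFER_STATE;
extern int yyparse();
extern YY_BUFFER_STATE yy_scan_buffer(char *, size_t);
extern void yy_delete_buffer(YY_BUFFER_STATE);

int main(int argc, char** argv) {

  // note: yy_scan_buffer is looking for a double-NUL-terminated buffer;
  // the literal's implicit terminator supplies the second NUL
  char tstr[] = "line i want to parse\n\0";
  YY_BUFFER_STATE buffer = yy_scan_buffer(tstr, sizeof(tstr));
  yyparse();
  yy_delete_buffer(buffer);
  return 0;
}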

You cannot declare the typedef extern, so it has to be repeated, which makes sense when you think about it.

Solution 5

The accepted answer is incorrect. It will cause memory leaks.

Internally, yy_scan_string calls yy_scan_bytes which, in turn, calls yy_scan_buffer.

yy_scan_bytes allocates memory for a COPY of the input buffer.

yy_scan_buffer works directly upon the supplied buffer.

With all three forms, you MUST call yy_delete_buffer to free the flex buffer-state information (YY_BUFFER_STATE).

However, with yy_scan_buffer, you avoid the internal allocation/copy/free of the internal buffer.

The prototype for yy_scan_buffer does NOT take a const char* and you MUST NOT expect the contents to remain unchanged.

If you allocated memory to hold your string, you are responsible for freeing it AFTER you call yy_delete_buffer.
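As a rough illustration of that ownership rule, the sketch below builds a writable, double-NUL-terminated copy of some input that could then be handed to yy_scan_buffer(). The name make_scan_buffer and its parameters are assumptions made for this example; flex provides no such function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of flex): copies len bytes from src into
   a freshly malloc'ed buffer and appends the two NUL bytes that
   yy_scan_buffer() requires. The total size, including both NULs, is
   stored in *size_out. Returns NULL if the allocation fails. */
char *make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
    char *buf = malloc(len + 2);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, len);
    buf[len] = '\0';
    buf[len + 1] = '\0';
    *size_out = len + 2;
    return buf;
}
```

The caller owns the returned memory: pass the buffer and size to yy_scan_buffer(), run the scanner, call yy_delete_buffer() on the handle, and only then free() the buffer.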

Also, don't forget to have yywrap return 1 (non-zero) when you're parsing JUST this string.

Below is a COMPLETE example.

%{
#include <stdio.h>
#include <stdlib.h>
%}

%%

<<EOF>> return 0;

.   return 1;

%%

int yywrap()
{
    return (1);
}

int main(int argc, const char* const argv[])
{
    FILE* fileHandle = fopen(argv[1], "rb");
    if (fileHandle == NULL) {
        perror("fopen");
        return (EXIT_FAILURE);
    }

    fseek(fileHandle, 0, SEEK_END);
    long fileSize = ftell(fileHandle);
    fseek(fileHandle, 0, SEEK_SET);

    // When using yy_scan_bytes, do not add 2 here ...
    char *string = malloc(fileSize + 2);

    fread(string, sizeof(char), fileSize, fileHandle);

    fclose(fileHandle);

    // Add the two NUL terminators, required by flex.
    // Omit this for yy_scan_bytes(), which allocates, copies and
    // appends these for us.
    string[fileSize] = '\0';
    string[fileSize + 1] = '\0';

    // Our input file may contain NULs ('\0') so we MUST use
    // yy_scan_buffer() or yy_scan_bytes(). For a normal C (NUL-
    // terminated) string, we are better off using yy_scan_string() and
    // letting flex manage making a copy of it so the original may be a
    // const char (i.e., literal) string.
    YY_BUFFER_STATE buffer = yy_scan_buffer(string, fileSize + 2);

    // This is a flex source file, for yacc/bison call yyparse()
    // here instead ...
    int token;
    do {
        token = yylex(); // MAY modify the contents of the 'string'.
    } while (token != 0);

    // After flex is done, tell it to release the memory it allocated.    
    yy_delete_buffer(buffer);

    // And now we can release our (now dirty) buffer.
    free(string);

    return (EXIT_SUCCESS);
}
Author by

marcopolobronch

Updated on July 08, 2022

Comments

  • marcopolobronch almost 2 years

    I want to create a read-eval-print loop using a flex/bison parser. Trouble is, the flex-generated lexer wants input of type FILE* and I would like it to be char*. Is there any way to do this?

    One suggestion has been to create a pipe, feed it the string and open the file descriptor and send to the lexer. This is fairly simple but it feels convoluted and not very platform independent. Is there a better way?
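
For the read-eval-print loop described in the question, one possible shape is sketched below. The names repl, eval_line_fn, demo_eval and the WITH_FLEX_PARSER macro are all hypothetical and chosen for this example only. The loop itself is plain C; the guarded section shows how a line could be handed to a flex/bison parser through yy_scan_string(), assuming the program is linked against the generated scanner and parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: evaluates one line, returns 0 on success. */
typedef int (*eval_line_fn)(const char *line);

#ifdef WITH_FLEX_PARSER  /* assumed macro: define only when linking flex/bison output */
typedef struct yy_buffer_state *YY_BUFFER_STATE;
extern int yyparse(void);
extern YY_BUFFER_STATE yy_scan_string(const char *str);
extern void yy_delete_buffer(YY_BUFFER_STATE buffer);

/* Scans a copy of the line, parses it, then releases the flex buffer. */
static int eval_with_flex(const char *line)
{
    YY_BUFFER_STATE buffer = yy_scan_string(line);
    int status = yyparse();          /* 0 means the parse succeeded */
    yy_delete_buffer(buffer);
    return status;
}
#endif

/* Stand-in evaluator for trying the loop without a parser:
   rejects lines that start with '!', accepts everything else. */
int demo_eval(const char *line)
{
    return line[0] == '!';
}

/* Prompts on out, reads lines from in until EOF, and passes each
   non-empty line to eval. Lines longer than the local buffer are
   split into several pieces. Returns the number of lines evaluated. */
int repl(FILE *in, FILE *out, eval_line_fn eval)
{
    char line[1024];
    int evaluated = 0;

    for (;;) {
        fputs("> ", out);
        fflush(out);
        if (fgets(line, sizeof line, in) == NULL)
            break;
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        if (line[0] == '\0')
            continue;                       /* skip blank lines */
        if (eval(line) != 0)
            fputs("error\n", out);
        evaluated++;
    }
    return evaluated;
}
```

A real program would call repl(stdin, stdout, eval_with_flex). Because yy_scan_string() copies each line, the line buffer can be reused on the next iteration, and no pipe or temporary file is needed.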