How to pass the yytext from the lex file to yacc?

Revised question

The value on the Yacc stack is controlled by YYSTYPE or %union. Use YYSTYPE when the type information is simple; use %union when it is complex.

One of my grammars contains:

struct Token
{
    int      toktype;
    char    *start;
    char    *end;
};
typedef struct Token Token;

#define YYSTYPE Token

For a variety of reasons (not necessarily good ones), my grammar uses a hand-crafted lexical analyzer instead of Lex.

In the grammar rules, you refer to the value of an item like NAME in your example as $1, where the actual number is the position of that symbol among the terminals and non-terminals on the right-hand side of the rule.
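As a rough illustration of how the value set by the scanner reaches the action, here is a minimal C sketch. It assumes a %union with a single sValue member (the member name is taken from the lex fragment in the question); stub_scan_name, handle_name and name_length are hypothetical names, and the scanner is replaced by a stub so that the snippet stands alone.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Roughly what Yacc derives from:  %union { char *sValue; }  */
typedef union
{
    char *sValue;
} YYSTYPE;

YYSTYPE yylval;             /* Set by the scanner, read by the parser */
size_t  name_length = 0;    /* Hypothetical global holding the length */

/* Stand-in for the lex action: copy the matched text into yylval. */
static int stub_scan_name(const char *matched)
{
    char *copy = malloc(strlen(matched) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, matched);
    yylval.sValue = copy;
    return 0;
}

/*
 * Body of the grammar action, written as a function.  With
 * %token <sValue> NAME declared, the rule could call it as:
 *     stmt_Name: NAME { handle_name($1); } ;
 */
static void handle_name(char *name)
{
    assert(name != NULL);
    printf("%s\n", name);
    name_length = strlen(name);
    free(name);             /* The action owns the copy made by the scanner */
}
```

Whether the action or some later code frees the copy is a design choice; the sketch frees it in the action only to keep the ownership visible.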

For example (same grammar):

disconnect
    :   K_DISCONNECT K_CURRENT
        { conn->ctype = CONN_CURRENT; }
    |   K_DISCONNECT K_ALL
        { conn->ctype = CONN_ALL; }
    |   K_DISCONNECT K_DEFAULT
        { conn->ctype = CONN_DEFAULT; }
    |   K_DISCONNECT string
        { conn->ctype = CONN_STRING;
          set_connection(conn, $2.start, $2.end);
        }
    ;

And:

load
    :   K_LOAD K_FROM opt_file_pipe string load_opt_list K_INSERT
        {
            set_string("load file", load->file, sizeof(load->file),
                       $4.start, $4.end);
            load->stmt = $6.start;
        }
    ;
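The actions above hand a start/end pointer pair to helpers such as set_connection() and set_string(), which are not shown. A minimal sketch of what such a helper might do with the pair follows; copy_token_text is a hypothetical name, not the function from my grammar, and it assumes the token occupies the half-open range from start up to (but not including) end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text in [start, end) into buf,
 * truncating if necessary and always null-terminating.
 * Returns the number of characters copied.
 */
static size_t copy_token_text(char *buf, size_t buflen,
                              const char *start, const char *end)
{
    size_t len;

    assert(buf != NULL && buflen > 0);
    assert(start != NULL && end != NULL && start <= end);

    len = (size_t)(end - start);
    if (len >= buflen)
        len = buflen - 1;       /* Leave room for the terminator */
    memcpy(buf, start, len);
    buf[len] = '\0';
    return len;
}
```

Because the token is described by pointers into the original input, nothing is copied until an action actually needs the text.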

I don't know whether seeing the outline of the hand-crafted yylex() helps; in the grammar, it is a function in the same file as yyparse().

static const char *c_token;     /* Where to start next token search */

static int yylex(void)
{
    char        buffer[MAX_LEXTOKENLENGTH];
    const char *start;

    if (c_token == 0)
        abort();

    if (bare_filename_ok)
        start = scan_for_filename(c_token, &c_token);
    else
        start = sqltoken(c_token, &c_token);

    yylval.start = CONST_CAST(char *, start);
    yylval.end = CONST_CAST(char *, c_token);
    if (*start == '\0')
    {
        yylval.toktype = 0;
        return yylval.toktype;
    }
    set_token(buffer, sizeof(buffer), start, c_token);
#ifdef YYDEBUG
    if (YYDEBUGVAR > 1)
        printf("yylex(): token = %s\n", buffer);
#endif /* YYDEBUG */

    /* printf("yylex(): token = %s\n", buffer); */
    if (isalpha((unsigned char)buffer[0]) || buffer[0] == '_')
    {
        Keyword  kw;
        Keyword *p;
        kw.keyword = buffer;
        p = (Keyword *)bsearch(&kw, keylist, DIM(keylist), sizeof(Keyword),
                                kw_compare);    /*=C++=*/
        if (p == 0)
            yylval.toktype = S_IDENTIFIER;
        else
            yylval.toktype = p->token;
    }
    else if (buffer[0] == '\'')
    {
        yylval.toktype = S_SQSTRING;
    }
    else if (buffer[0] == '"')
    {
        yylval.toktype = S_DQSTRING;
    }
    else if (isdigit((unsigned char)buffer[0]))
    {
        yylval.toktype = S_NUMBER;
    }
    else if (buffer[0] == '.' && isdigit((unsigned char)buffer[1]))
    {
        yylval.toktype = S_NUMBER;
    }

    /* ...various single-character symbols recognized... */

    else if (buffer[0] == ':')
    {
        assert(buffer[1] == '\0');
        yylval.toktype = C_COLON;
    }
    else
    {
        yylval.toktype = S_ERROR;
    }
    return yylval.toktype;
}
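The keyword lookup in yylex() relies on Keyword, keylist, kw_compare and DIM, none of which are shown. The sketch below is one plausible shape for them, not the definitions from my grammar: the token numbers are invented (Yacc normally generates them), the keyword table is cut down to five entries, classify_word is a hypothetical name, and the comparison is case-sensitive for simplicity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DIM(x) (sizeof(x) / sizeof((x)[0]))

/* Token numbers are made up here; Yacc normally generates them. */
enum { S_IDENTIFIER = 258, K_ALL, K_CURRENT, K_DEFAULT, K_DISCONNECT, K_LOAD };

typedef struct Keyword
{
    const char *keyword;
    int         token;
} Keyword;

/* Must be sorted in the order kw_compare() defines, or bsearch() fails. */
static const Keyword keylist[] =
{
    { "all",        K_ALL        },
    { "current",    K_CURRENT    },
    { "default",    K_DEFAULT    },
    { "disconnect", K_DISCONNECT },
    { "load",       K_LOAD       },
};

static int kw_compare(const void *v1, const void *v2)
{
    const Keyword *k1 = (const Keyword *)v1;
    const Keyword *k2 = (const Keyword *)v2;
    return strcmp(k1->keyword, k2->keyword);
}

/* Return the keyword's token number, or S_IDENTIFIER if not a keyword. */
static int classify_word(const char *word)
{
    Keyword        kw;
    const Keyword *p;

    kw.keyword = word;
    kw.token = 0;
    p = (const Keyword *)bsearch(&kw, keylist, DIM(keylist),
                                 sizeof(Keyword), kw_compare);
    return (p == NULL) ? S_IDENTIFIER : p->token;
}
```

The sorted-table requirement is the main thing to watch: bsearch() gives no diagnostic if the table is out of order, it simply fails to find some entries.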

Original question

The yytext variable is normally a global variable; your Yacc code can access it with one of two possible declarations:

extern char *yytext;    /* Correct for Flex */
extern char yytext[];   /* Correct for traditional Lex */

Which of those is correct depends on how your version of Lex defines it.

If you want to add a length (perhaps yytextlen), then you can define such a variable and have every return from yylex() ensure that yytextlen is set. Alternatively, you can arrange for your grammar to call wwlex(), and your wwlex() simply does:

int wwlex(void)
{
    int rc = yylex();
    yytextlen = strlen(yytext);
    return rc;
}

Or you can arrange for Lex to generate its scanner under a different name, have Yacc continue to call yylex(), and supply the code above as yylex() so that it calls the renamed Lex function. Either way works.
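A minimal sketch of the second arrangement, assuming the generated scanner has been renamed to lex_yylex (a hypothetical name) and that yytext is declared Flex-style. The scanner is a stub here so the snippet stands alone; in a real build it would be the Lex-generated function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stub standing in for the renamed, Lex-generated scanner. */
static char  stub_text[] = "example";
char        *yytext = stub_text;    /* Flex-style declaration */
size_t       yytextlen = 0;

static int lex_yylex(void)          /* Hypothetical renamed scanner */
{
    return 1;                       /* Pretend one token was matched */
}

/* Yacc keeps calling yylex(); this wrapper records the length. */
int yylex(void)
{
    int rc = lex_yylex();
    yytextlen = strlen(yytext);
    return rc;
}
```

How the rename is achieved depends on the Lex implementation in use, so check its documentation rather than relying on this outline.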

Author: CompilingCyborg

Updated on June 11, 2022

Comments

  • CompilingCyborg almost 2 years

    I am facing a simple problem. In my lex file I have something similar to:

    char *ptr_String;

    "name = "  { BEGIN sName; }

    <sName>.+   {
                  ptr_String = (char *)calloc(strlen(yytext) + 1, sizeof(char));
                  strcpy(ptr_String, yytext);
                  yylval.sValue = ptr_String;
                  return NAME;
                }
    

    Now in my Yacc file I have something similar to:

    stmt_Name:
        NAME
        {
            /* Now here I need to get the matched string of <sName>.+ and measure its length. */
            /* The aim is simply to output the name to the screen and store the length in a global variable. */
        }
        ;
    

    Any suggestions, please? Thanks so much for all your time and help.

  • CompilingCyborg over 13 years
    Thanks for your reply! Could you please take a look at my edited version?
  • VGE over 13 years
    You can avoid calling strlen(yytext), because both lex and flex define yyleng, which holds the length of the string in yytext.
  • Jonathan Leffler over 13 years
    @VGE: oh, thank you. I'd forgotten that detail. In that case, of course, the chicanery with the function names isn't necessary.