Making YACC output an AST (token tree)

20,236

Solution 1

Have you looked at the manual (search for "parse tree" to find the spot)? It suggests putting node creation in an action with your left and right descendants being $1 and $3, or whatever they may be. In this case, yacc would be moving up the tree on your behalf rather than your doing it manually.

Solution 2

Expanding on Hao's point and from the manual, you want to do something like the following:

Assuming you have your abstract syntax tree with function node which creates an object in the tree:

expr : expr '+' expr
  {  
     $$ = node( '+', $1, $3 );  
  }

This code translates to "When parsing expression with a plus, take the left and right descendants $1/$3 and use them as arguments to node. Save the output to $$ (the return value) of the expression.

$$ (from the manual):

To return a value, the action normally sets the pseudovariable ``$$'' to some value.

Solution 3

The other answers propose to modify the grammar, this isn't doable when playing with a C++ grammer (several hundred of rules..)

Fortunately, we can do it automatically, by redefining the debug macros. In this code, we are redefining YY_SYMBOL_PRINT actived with YYDEBUG :

%{

typedef struct tree_t {
    struct tree_t **links;
    int nb_links;
    char* type; // the grammar rule
};

#define YYDEBUG 1
//int yydebug = 1;

tree_t *_C_treeRoot;
%}
%union tree_t

%start program

%token IDENTIFIER
%token CONSTANT

%left '+' '-'
%left '*' '/'
%right '^'

%%
progam: exprs { _C_treeRoot = &$1.t; }
    |
    | hack
    ;

exprs:
    expr ';'
    | exprs expr ';'
    ;


number:
    IDENTIFIER
    | '-' IDENTIFIER
    | CONSTANT
    | '-' CONSTANT
    ;

expr:
    number
    | '(' expr ')'
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | expr '^' expr
    ;

hack:
    {
    // called at each reduction in YYDEBUG mode
    #undef YY_SYMBOL_PRINT
    #define YY_SYMBOL_PRINT(A,B,C,D) \
        do { \
            int n = yyr2[yyn]; \
            int i; \
            yyval.t.nb_links = n; \
            yyval.t.links = malloc(sizeof *yyval.t.links * yyval.t.nb_links);\
            yyval.t.str = NULL; \
            yyval.t.type = yytname[yyr1[yyn]]; \
            for (i = 0; i < n; i++) { \
              yyval.t.links[i] = malloc(sizeof (YYSTYPE)); \
              memcpy(yyval.t.links[i], &yyvsp[(i + 1) - n], sizeof(YYSTYPE)); \
            } \
        } while (0)

    }
    ;
%%

#include "lexer.c"


int yyerror(char *s) {
    printf("ERROR : %s [ligne %d]\n",s, num_ligne);
    return 0;
}


int doParse(char *buffer)
{
    mon_yybuffer = buffer;
    tmp_buffer_ptr = buffer;
    tree_t *_C_treeRoot = NULL;
    num_ligne = 1;
    mon_yyptr = 0;

    int ret = !yyparse();

    /////////****
             here access and print the tree from    _C_treeRoot 
    ***///////////
}


char *tokenStrings[300] = {NULL};
char *charTokenStrings[512];

void initYaccTokenStrings()
{
    int k;
    for (k = 0; k < 256; k++)
    {
        charTokenStrings[2*k] = (char)k;
        charTokenStrings[2*k+1] = 0;
        tokenStrings[k] = &charTokenStrings[2*k];
    }
    tokenStrings[CONSTANT] = "CONSTANT";
    tokenStrings[IDENTIFIER] = "IDENTIFIER";


    extern char space_string[256];

    for (k = 0; k < 256; k++)
    {
        space_string[k] = ' ';
    }
}

the leafs are created just before the RETURN in the FLEX lexer

Share:
20,236
Sprotty
Author by

Sprotty

Updated on November 20, 2020

Comments

  • Sprotty
    Sprotty over 3 years

    Is it possible to make YACC (or I'm my case MPPG) output an Abstract Syntax Tree (AST).

    All the stuff I'm reading suggests its simple to make YACC do this, but I'm struggling to see how you know when to move up a node in the tree as your building it.