Infix Calculator Expression Parser

13,986

Solution 1

This will be a long read, but anyway, I will share with you the algorithm I use to parse an infix expression and store it as a binary tree. Not Stack, but binary tree. Parsing that will give the postfix order easily. I don't say this is the best algorithm out there, but this works for my scripting language.

The algorithm:

We have a method which operates on a "current node" of a binary tree and a "current expression". The nodes contain a "data" field and a "type" field.

Stage 1: Simple things, such as "4" go directly into the node, and we specify the type to be as "DATA", ie. use this information as it is.

Stage 2: Now, Let's consider the following expression:

a) 2 + 3

this will be transformed into the following binary tree:

  +
 / \
2   3

So, the operators go into the nodes and the operands go into the leafs. Transofrming the expression a) into the tree is pretty simple: find the operator, put in the "current" node of the tree, specify the type of the node to be operator "PLUS", and what is left of it goes into the tree to the left part of the node, what is right of it goes into the right tree. Nice and simple, using the information from Stage 1 the two leafs will be "DATA" leafs with value 2 and 3.

Stage 3: But for a more complex expression:

b) 2 * 3 + 4

The tree will be:

    +
   / \ 4
  *
 / \ 
2   3

So we need to modify the algorithm above to the following: Find the first operator which has the highest precedence (considering c++ guidelines... precedence of + (plus) and - (minus) is 6, while precedence of * (multiply), / (divide) and % (modulo) is 5) in the expression, divide the expression into two parts (before operand with highest precedence and after operand with highest precedence) and call recursively the method for the two parts, while placing the operator with the highest precedence into the current node. So, we do create a tree wit hdata like:

    +
   / \ 
  /  call method with "4"
call method with "2*3"

and at this stage we fall back to "Stage 2" for the call ("2*3") and "Stage 1" for the call "4".

Stage 4: What if there are paranthesis in the expression? Such as

c) 2 * (3 + 4)

This will give us the tree:

      *
     / \
    2   +
       / \
      3   4

We modify the algorithm to be like:

  • while the current expression is enclosed in a paranthesis remove the paranthesis from it and restart the algorithm. Be careful. (2 + 3 * 4 + 5) is considered to be enclosed in a parnethesis while (2+3)*(4+5) is NOT. So, it's not just the starting and ending characters of the expression, but you effectively need to count the parantheses. (this is a recursive method, don't be afraid of the first step...)
  • now find the first operator with the highest precedence outside the parantheses of the expression. Again, take the left and right sides of the expression and call the method again and again till you end up at "Stage 1" ie. with a single data element.

    Now this is an algorithm for an expression which consists of plain numbers and operators. For more complex information you might need to refine it to suit your needs. If you consider it worth, take a look at https://sourceforge.net/p/nap-script/mercurial/ci/default/tree/compiler/interpreter.cpp . This contains a full implementation (in C) of the algorithm above with regard to more complex notions (variables, method calls, postfix/prefix operators, etc...) The method is build_expr_tree, starts at line 1327.

Solution 2

The method of recursive descent is the simplest way to implement a correct expression parser by hand. Here the programming language stack does the same thing as the explicit stack you're trying to use. There are many RD examples to be found with google, and any good compiler book will have some.

The linked Wikipedia page shows a parser, but not how to add evaluation. So below is a complete rudimentary expression evaluator in C. It could be easily wrapped in a C++ class with the globals becoming instance variables. It's missing features you'd need in a production system. For example, when it finds an error, it just exits. C++ exceptions will easily allow you to unwind the recursion and continue. It also needs protections against numerical overflow, divide-by-zero, etc., which you obviously know how to do.

The idea of recursive descent is to transform the grammar of the desired language into a form called LL(1). When that's done, there are fixed rules - guarenteed to work every time - for transforming the grammar rules into procedures. I've done this below by hand. There are tools to do it automatically.

So this evaluator is very easy to extend. Just add the necessary grammar rule, then implement the needed enhancements to scanner, parser, and evaluation code. For example, a built-in function rule would be unsigned_factor -> FUNCTION_NAME ( expr ), where the scanner recognizes all function names as the same token and the unsigned_factor C function is augmented to parse and compute values.

I had to include a small scanner to get a working program. Note more than half the code is the scanner. Basic RD parsers are simple.

They get more complex if you add error recovery: the intelligent ability to skip just past an error and continue parsing, while emitting only one precisely worded error message. But then again, this adds lots of complexity to any parser.

// Bare bones scanner and parser for the following LL(1) grammar:
// expr -> term { [+-] term }     ; An expression is terms separated by add ops.
// term -> factor { [*/] factor } ; A term is factors separated by mul ops.
// factor -> unsigned_factor      ; A signed factor is a factor, 
//         | - unsigned_factor    ;   possibly with leading minus sign
// unsigned_factor -> ( expr )    ; An unsigned factor is a parenthesized expression 
//         | NUMBER               ;   or a number
//
// The parser returns the floating point value of the expression.

#include <stdio.h>
#include <stdlib.h>

// The token buffer. We never check for overflow! Do so in production code.
char buf[1024];
int n = 0;

// The current character.
int ch;

// The look-ahead token.  This is the 1 in LL(1).
enum { ADD_OP, MUL_OP, LEFT_PAREN, RIGHT_PAREN, NUMBER, END_INPUT } look_ahead;

// Forward declarations.
void init(void);
void advance(void);
double expr(void);
void error(char *msg);

// Parse expressions, one per line. 
int main(void)
{
  init();
  while (1) {
    double val = expr();
    printf("val: %f\n", val);
    if (look_ahead != END_INPUT) error("junk after expression");
    advance();  // past end of input mark
  }
  return 0;
}    

// Just die on any error.
void error(char *msg)
{
  fprintf(stderr, "Error: %s. I quit.\n", msg);
  exit(1);
}

// Buffer the current character and read a new one.
void read()
{
  buf[n++] = ch;
  buf[n] = '\0';  // Terminate the string.
  ch = getchar();
}

// Ignore the current character.
void ignore()
{
  ch = getchar();
}

// Reset the token buffer.
void reset()
{
  n = 0;
  buf[0] = '\0';
}

// The scanner.  A tiny deterministic finite automaton.
int scan()
{
  reset();
START:
  switch (ch) {
    case ' ': case '\t': case '\r':
      ignore();
      goto START;

    case '-': case '+':
      read();
      return ADD_OP;

    case '*': case '/':
      read();
      return MUL_OP;

    case '(':
      read();
      return LEFT_PAREN;

    case ')':
      read();
      return RIGHT_PAREN;

    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      read();
      goto IN_LEADING_DIGITS;

    case '\n':
      ch = ' ';    // delayed ignore()
      return END_INPUT;

    default:
      error("bad character");
  }

IN_LEADING_DIGITS:
  switch (ch) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      read();
      goto IN_LEADING_DIGITS;

    case '.':
      read();
      goto IN_TRAILING_DIGITS;

    default:
      return NUMBER;
  }

IN_TRAILING_DIGITS:
  switch (ch) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      read();
      goto IN_TRAILING_DIGITS;

    default:
      return NUMBER;
  }          
}

// To advance is just to replace the look-ahead.
void advance()
{
  look_ahead = scan();
}

// Clear the token buffer and read the first look-ahead.
void init()
{
  reset();
  ignore(); // junk current character
  advance();
}

double unsigned_factor()
{
  double rtn = 0;
  switch (look_ahead) {
    case NUMBER:
      sscanf(buf, "%lf", &rtn);
      advance();
      break;

    case LEFT_PAREN:
      advance();
      rtn = expr();
      if (look_ahead != RIGHT_PAREN) error("missing ')'");
      advance();
      break;

    default:
      error("unexpected token");
  }
  return rtn;
}

double factor()
{
  double rtn = 0;
  // If there is a leading minus...
  if (look_ahead == ADD_OP && buf[0] == '-') {
    advance();
    rtn = -unsigned_factor();
  }
  else
    rtn = unsigned_factor();
  return rtn;
}

double term()
{
  double rtn = factor();
  while (look_ahead == MUL_OP) {
    switch(buf[0]) {
      case '*':
        advance(); 
        rtn *= factor(); 
        break; 

      case '/':
        advance();
        rtn /= factor();
        break;
    }
  }
  return rtn;
}

double expr()
{
  double rtn = term();
  while (look_ahead == ADD_OP) {
    switch(buf[0]) {
      case '+': 
        advance();
        rtn += term(); 
        break; 

      case '-':
        advance();
        rtn -= term();
        break;
    }
  }
  return rtn;
}

And running the program:

1 + 2 * 3
val: 7.000000
(1 + 2) * 3
val: 9.000000
Share:
13,986
Painguy
Author by

Painguy

Updated on August 25, 2022

Comments

  • Painguy
    Painguy over 1 year

    How do I parse and evaluate expressions in an infix calculator grammar? I thought of two ways.

    The 1st involves using two stacks. One is for numbers and the other is for operators, and I would assess the operator precedence and association in order to figure out how to evaluate an expression.

    The second method involves converting the infix expression to postfix which I have no idea how I'd go about doing. It was just an idea. Currently I set up my program with the intention to use the 1st method.

    #include <iostream>
    #include <string>
    #include <sstream>
    using namespace std;
    
    bool die(const string &msg);
    
    //stack class
    class Stack{
    public:
        Stack();
        void push(const double &val);
        void push(const string &oper);
        double popnum();
        string popop();
        double getopele();
        double getnumele();
    private:
        static const unsigned MAX=30;
        string opstack[MAX];
        double numstack[MAX];
        unsigned opele;
        unsigned numele;
    };
    
    //operator type
    struct OP{
        string name;
        void * func;
        unsigned arity;
        unsigned prec;
        bool lass;
        string descrip;
    };
    //operator table
    OP op[]={{"+", add, 2, 4, true, "2+3 is 5"},
            {"-", subtract, 2, 4, true, "2-3 is -1"},
            {"*", multiply, 2, 6, true, "2*3 is 6"},
            {"/", divide, 2, 6, true, "2/3 is 0.666666..., div by 0 illegal"}};
    unsigned OPELE =sizeof(op)/sizeof(op[0]);
    
    //operators
    bool add(double &r, double &x, double &y);
    bool subtract(double &r, double &x, double &y);
    bool multiply(double &r, double &x, double &y);
    bool divide(double &r, double &x, double &y);
    
    //Manip
    unsigned findindex(string token, OP op[], unsigned OPELE);
    bool parse(double &t, const string &token);
    bool evaluate(double &result, string line);
    bool weird(double x);
    
    int main(){
        for(string line; getline(cin, line);){
            if(line=="QUIT") break;
            if(line.empty()) continue;
            if(line=="DOC")
                for(unsigned i=0; i<OPELE; i++)
                    cout<<op[i].name<<" | "<<op[i].descrip<<'\n';
            double result;
            if(evaluate(result, line)){
                cout<<result<<'\n';
            }else{
                cout<<"Could not understand input\n\n";
            }
        }
    }
    
    Stack::Stack(){
        opele=0;
        numele=0;
    }
    
    void Stack::push(const double &val){
        if(MAX) die("Stack Overflow");
        numstack[numele++]=val;
    }
    
    void Stack::push(const string &oper){
        if(MAX) die("Stack Overflow");
        opstack[opele++]=oper;
    }
    
    double Stack::popnum(){
        if(!numele) die("Stack Underflow");
        return numstack[--numele];
    }
    
    string Stack::popop(){
        if(!opele) die("Stack Underflow");
        return opstack[--opele];
    }
    
    double Stack::getopele(){
        return opele;
    }
    
    double Stack::getnumele(){
        return numele;
    }
    
    bool add(double &r, double &x, double &y){
        double t = x + y;
        if( weird(t) )  return false;
        r = t;
        return true;
    }
    
    bool subtract(double &r, double &x, double &y){
        double t = x - y;
        if( weird(t) )  return false;
        result = t;
        return true;
    }
    
    bool multiply( double & r, double& x, double &y ){
        double t = x * y;
        if( weird(t) )  return false;
        result = t;
        return true;
    }
    
    bool divide( double & result, double &x, double &y ){
        double t = x / y;
        if( weird(t) )  return false;
        result = t;
        return true;
    }
    
    unsigned findindex(string token, OP op[], unsigned OPELE){
        for(unsigned i=0l i<OPELE; i++)
            if(op[i].name==token)
                return i;
        return UINT_MAX;
    
    }
    
    bool parse(double &t, const string &token){
        istringstream sin( token );
        double t;
        if( !(sin >>t) )  return false;
        char junk;
        if( sin >>junk )  return false;
        value = t;
        return true;
    }
    
    bool evaluate(double &result, string line){
        istringstream sin(line);
        Stack s;
        for(string token; sin>>token;){
            double t;
            if(parse(t, token)){
                s.push(t);
            }else if(
        }
    }
    
    bool weird( double x ){
        return  x != x || x != 0 && x == 2*x;
    }