Creating a parser for a simple pseudocode language?

16,121

Solution 1

Although I think that it's great that you want to build a parser for a language like this, doing so is much harder than it looks. Parsing is a very well-studied problem and there are many excellent algorithms that you can use, but they are extremely difficult to implement by hand. While you can use tricks like conversions to RPN for smaller examples like parsing expressions, building up a full programming language requires a much more complex set of tricks.

To parse a language of this complexity, you are probably best off using a parser generator rather than trying to write your own by hand. ANTLR and Java CUP are two well-known tools for doing precisely what you're interested in accomplishing, and I would strongly suggest using one of the two of them.

Hope this helps!

Solution 2

For simple languages (this is a judgement call, and if you are inexperienced you may not be able to make that call correctly), one can often write a recursive descent parser by hand that does well enough. The good news is that coding a recursive descent parser is pretty straightforward.

If you aren't sure, use overkill in the form of the strongest parser generator you can get.

Solution 3

in simple cases writing manually a parser makes sense.

However, using StringTokenizer is a indicator of doing it wrong, because a StringTokenizer IS already a SIMPLE parser.

a parser usually reads a char and changes its state depending on the value of that char.

Just a simple parser, a "b" make following char "uppercase", e to lowercase. "." stops

 String input = "aDDbcDDeaaef.";

 int pos = 0;

 int state = 0;  
 while (pos < input.length()) {
    char z = input.charAt (pos);
    if (z == '.') break;
    switch (z) {
    case 'b': state = 1; break;
    case 'e': state = 0; break;
    default:
      if (state == 0) {
        System.out.print(Char.toLowerCase(z));
      } else {
        System.out.print(Char.toUpperCase(z));
      }
    }
    pos ++;
 }
Share:
16,121
Vinayak Garg
Author by

Vinayak Garg

I love programming in C++. Also know Java, Python, PHP, Javascript. Currently working on Android. In my free time I am working on Unity 5 game, requiring me to learn Blender, C# and Photoshop. "Luck is what happens when preparation meets opportunity." ~ Lucius Annaeus Seneca Hablo un poco de español

Updated on June 13, 2022

Comments

  • Vinayak Garg
    Vinayak Garg almost 2 years

    I wanted to make a simple parser, for a "pseudo code" like language(kept rigid), in Java. A sample pseudo code would be -

    //This is a comment
    $x1 = readint
    $x2 = readint
    
    $dx = $x2 - $x1
    #f = $dx / 2
    
    if ($dx > 0)
    {
      loop while(#f > 1)
      {
         print(#f)
         #f = #f / 2
      }
    }
    

    Note that above code is rigid in that, there can not be more than one statement on a line, integers start with $, floats start with # etc.

    To parse such code, first I can use StringTokenizer, and then regular expression, to match integer-variables, float-variables, or Keywords.

    Is this approach good? For statements in loop, how can i store expressions, so that i don't have to tokenize in each iteration?

    I could think of converting expressions (like #f = #f / 2) to polish notation, and then to store in stack. And in each iteration, while popping operands I could replace value for each variable. But is this efficient enough?

    Thanks in advance, for any suggestion.

  • Ira Baxter
    Ira Baxter about 12 years
    Having implemented some of these "extremely difficult to implement by hand" algorithms (e.g., LL, L(AL)R and GLR, I'd say the effort to implement them and use them for single project IMHO is often less than the effort to build a full parser using weaker methods. Agreed that you should get an off-the-shelf tool where you can, simply because it less work, not because it is extremely hard.
  • Vinayak Garg
    Vinayak Garg about 12 years
    Thank you! I will go and look at both of them, but can you pick one of them? :P Considering my sample code, and that I would release my code under GPL, smaller footprint etc. Hehe, I would definitely make a parser, but probably some other day.
  • templatetypedef
    templatetypedef about 12 years
    @VinayakGarg- I don't have much experience with either, but ANTLR has a broad community base. It might be worth exploring.
  • Vinayak Garg
    Vinayak Garg about 12 years
    Your link was very informative, Thanks. And then the metacompiler, and then the website. It all seems great, I would love to do it, in next semester probably. But since currently I am short on time and experience, i will go for overkill!
  • dodecaplex
    dodecaplex about 9 years
    Well you are using the term SIMPLE parser as "a parser for a regular language" and your example shows a very simple regular language parser. In the original question, the language was considered as simple because there was no lexical ambiguity. However, the language itself is strictly context free. Hence there is no way to parse it using a simple finite state automaton as the one you implemented here.
  • loukios
    loukios almost 7 years
    the incrementation of 'pos' is missing before the end of the while loop :)