What advantages do LL parsers have over LR parsers?


Solution 1

GLR is great if you want a parse tree/forest and don't mind black boxes. It lets you type in whatever CFG you want, at the cost of checking for ambiguities at parse time via exhaustive testing instead of resolving LR/LALR conflicts statically. Some say that's a good trade-off. Ira Baxter's DMS tool, or Elkhound (which has a free C++ grammar), are useful for this class of problem.

ANTLR is useful for a large class of language applications too, but it uses a top-down approach, generating recursive-descent parsers called LL(*) that allow semantic predicates. I will state without proof here that predicates let you parse context-sensitive languages beyond CFGs. Programmers like to insert actions into grammars, like good error handling, and like to single-step debug; LL is good at all three. LL is also what we do by hand, so it's easier to understand.

Don't believe the Wikipedia nonsense about LR being better at handling errors. That said, if you backtrack a lot with ANTLR, errors are indeed worse with LL(*) (PEGs have this problem).

Regarding backtracking: GLR speculates (i.e. backtracks) too, just like PEGs, ANTLR, and any other non-deterministic strategy. At any non-deterministic LR state, GLR "forks" sub-parsers to try out every viable path. Anyway, LL has good context for error handling. Where LR knows only that it's matching an expression, LL knows it's an expression in an assignment or an IF-conditional; LR knows it could be in either but isn't sure - and that deferred decision is where it gets its power.
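That context advantage is easy to see in a hand-written LL parser, where each rule is an ordinary function and the caller can pass down where the expression occurs. A toy sketch with hypothetical names (not ANTLR output), assuming a pre-tokenized input:

```python
# Each LL rule is a function, so the call chain carries context:
# an error inside expression() can name the construct it sits in.

def expression(tokens, pos, context):
    # Toy grammar: an "expression" is a single integer literal.
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError(f"malformed expression in {context}")

def assignment(tokens, pos):
    # assignment : ID '=' expression ';'
    return expression(tokens, pos + 2, "an assignment")   # skip ID and '='

def if_statement(tokens, pos):
    # ifstat : 'if' '(' expression ')' statement
    return expression(tokens, pos + 2, "an if-condition") # skip 'if' and '('
```

Calling `assignment(["x", "=", "@", ";"], 0)` raises a SyntaxError that mentions the assignment context, whereas a bottom-up parser at the same token may still be uncertain which construct it is reducing toward.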

GLR is O(n^3) worst case; packrat/PEG is O(n) worst case. ANTLR's parsers are O(n^2) worst case due to the cyclic lookahead DFA, but O(n) in practice. It doesn't matter much, really; GLR is fast enough.

ANTLR stands for ANother Tool for Language Recognition, not anti-LR, but I like that reading too ;)

Frankly, like a lot of young coders in the 80s, I didn't understand LALR and didn't like black boxes (now I dig the beauty of the GLR engine, but I still prefer LL). I built a commercial LL(k)-based compiler and then decided to build a tool to generate what I had built by hand. ANTLR isn't for everyone, and edge cases like C++ might be better handled with GLR, but a lot of people find that ANTLR fits their comfort zone. Since Jan 2008, there have been 134,000 downloads of ANTLR (counting the binary jar, ANTLRWorks, and source zips, according to Google Analytics). See our paper on LL(*), which has lots of empirical data.

Solution 2

If you have to hand-code a parser, recursive descent (LL) is something you can realistically do; people cannot practically build L(AL)R parsers by hand.
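As a sketch of how realistic hand-coding is, here is a minimal recursive-descent evaluator for arithmetic (an illustrative toy under obvious assumptions: pre-tokenized input, no error handling, no lexer):

```python
# Hand-written recursive-descent (LL) parser/evaluator for:
#   expr   : term (('+'|'-') term)* ;
#   term   : factor (('*'|'/') factor)* ;
#   factor : NUMBER | '(' expr ')' ;
# One function per rule -- the structure mirrors the grammar directly.

def evaluate(tokens):
    pos = 0  # index into the token list

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() in ('+', '-'):
            if advance() == '+':
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() in ('*', '/'):
            if advance() == '*':
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        tok = advance()
        if tok == '(':
            value = expr()
            advance()          # consume ')'
            return value
        return float(tok)      # NUMBER

    return expr()
```

For example, `evaluate(["2", "+", "3", "*", "4"])` returns 14.0, with `*` binding tighter than `+` purely because `term` sits below `expr` in the call structure.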

Given that modern parser generators will handle all the parser construction for you, and that space is not much of an issue, I prefer LR parsers because you don't have to fight with the grammars as much to make them valid for your particular parser generator (no "remove all the left recursion" silliness).

In fact, I prefer GLR parsers, which will pretty much parse anything with a context free grammar. No left-recursion worries. No shift/reduce conflict worries. No lookahead limits.

If you want to see the range of languages that one GLR parsing engine can handle (including the famously hard-to-parse-using-LL/LALR language, C++), you can look here.

Solution 3

From my personal experience (I have used both in various situations), the most practical difference is that with an LL(k) parser you can define the grammar in an easier way (since it's top-down), without worrying about the many possible reduce-reduce or shift-reduce conflicts that often occur with LR parsers. The only thing you have to care about is left recursion, which must be transformed into right recursion.
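The transformation mentioned above is the standard textbook rewrite; for example (generic grammar notation, not tied to any particular tool):

```
expr : expr '+' term     // left-recursive: fine for LR,
     | term ;            // loops forever in a naive LL parser

// becomes, after eliminating the left recursion:

expr      : term expr_rest ;
expr_rest : '+' term expr_rest
          |  ;           // empty alternative
```

In EBNF the same language is simply `expr : term ('+' term)* ;`, which a recursive-descent parser implements as a loop.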

Another thing is that the top-down approach usually implies higher complexity (in space or time), because it has to store the whole tree while parsing, and the tree can grow a lot until ambiguities are resolved.

Solution 4

The only advantage I've ever been familiar with is that you can easily code LL parsers by hand. LR parsers are MUCH harder to code by hand (you usually use a parser generator).

Solution 5

LL parsing's worst-case complexity is O(n^4), whereas LR parsing's worst case is better: O(n^3).

(But no one would ever write an O(n^4) grammar.)

https://en.wikipedia.org/wiki/Top-down_parsing

Author: Adam Paynter

I was born in Prince Edward Island, Canada and have been experimenting with programming since grade five. In 2007, I graduated with first class honors from the Bachelor of Computer Science program at the University of New Brunswick, Canada. I currently live in Plano, Texas with my wife and two daughters. I work as a software developer for Capital One. I still pursue a handful of personal projects during lunch hours and weekends.

Updated on August 06, 2022

Comments

  • Adam Paynter
    Adam Paynter almost 2 years

    What advantages do LL parsers have over LR parsers to warrant their relative popularity in today's parser generator tools?

    According to Wikipedia, LR parsing appears to have advantages over LL:

    LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.

    Note: This is not homework. I was just surprised when I found out that Antlr is an LL parser generator (despite having "LR" in its name!).

  • Ira Baxter
    Ira Baxter over 13 years
    Cough C++ does NOT need arbitrary backtracking. A GLR parser does this just fine. See my answer.
  • Jack
    Jack over 13 years
You don't have to fight with the grammars because you can define precedences directly via specific directives in the parser definition, not because the LR approach imposes fewer requirements on the grammar itself.
  • Adam Paynter
    Adam Paynter over 13 years
    Just so I understand... You're saying the most practical difference is that LL parsers avoid the conflicts typically found in LR parsers. Would this only apply to LR parsers that limit their lookahead to a small number (for example, LR(1) or LALR(1))? I thought that LR(k) parsers didn't have near as many conflicts...
  • zwol
    zwol over 13 years
    I mean, C++ isn't LL(k) or LR(k) for any finite k, because of the declaration/expression ambiguity.
  • Adam Paynter
    Adam Paynter over 13 years
    @Ira: Very interesting post! I hadn't even heard of GLR parsers before.
  • Ira Baxter
    Ira Baxter over 13 years
    Well, parsing C++ in a pure sense (not the GCC-resolves-types-during-parsing hack) requires the parser also pick up ambiguous parses. LL/LR parses simply don't do that. GLR parsers do. You resolve the ambiguity later with symbol table information, allowing you to separate parsing from name and type resolution, which gives you much cleaner parsers.
  • Adam Paynter
    Adam Paynter over 13 years
    @Jack: I agree with your point about how precedence directives are not a LR-specific advantage. However, would you agree with Ira's point regarding the elimination of left recursion?
  • Adam Paynter
    Adam Paynter over 13 years
    @Zack and @Ira: What an interesting conversation!
  • Jack
    Jack over 13 years
Yes, I agree, since LL parsers don't allow it. But left recursion can be resolved in a very straightforward way, while some reduce-reduce or shift-reduce conflicts require a lot of thinking and refactoring of the grammar itself (which becomes hard as the grammar grows).
  • Adam Paynter
    Adam Paynter over 13 years
    @Jack: Fair enough. Correct me if I'm wrong, but don't the shift/reduce and reduce/reduce conflicts effectively disappear if the LR parser allows for arbitrary numbers of lookahead symbols (that is, an LR(k) parser)? Granted, most LR parser generators limit themselves to 1 lookahead symbol, so I don't have much experience along those lines.
  • Jack
    Jack over 13 years
Actually, LALR exists just because handling lookahead in LR parsers is really expensive. Think about the fact that every additional lookahead token increases complexity by an order of magnitude, IIRC. That's why in practical situations you won't use an LR(3) parser.
  • Ira Baxter
    Ira Baxter over 13 years
    LALR exists because you get smaller tables than LR for many practical grammars. You can certainly define the concept of LALR(3) grammars although nobody does in practice. The reason for GLR is that you can stop thinking about K.
  • Adam Paynter
    Adam Paynter over 13 years
    @Jack: In general, that's true. However, I'm surprised that more LR parser generators don't support LR(k), considering solutions to the exponential space requirements appear to have been available since 1991.
  • Ira Baxter
    Ira Baxter over 13 years
@Adam: We all learned something today. Our GLR parser uses essentially an LALR(1) parser generator as a foundation, and the 1-token lookahead sets actually help avoid GLR parse-splits in practice, keeping its overhead down. (They are a bit slower than LALR because there's extra bookkeeping, which you can eliminate a lot of if you try hard.) The LALR(k) paper might let us peek ahead occasionally a bit further and avoid splits; into my think-about-this bin. Thanks for the pointer.
  • Adam Paynter
    Adam Paynter over 13 years
    @Ira: Very good point. Is there freely available documentation on GLR parsers that I could read?
  • Ira Baxter
    Ira Baxter over 13 years
    @Adam: The references at the Wikipedia sites are must-reads but they are hard to find "free"; I have hardcopies acquired many years ago. Adrian Johnstone seems to be doing a lot of work on advanced GLR parsing, for example: portal.acm.org/citation.cfm?id=1149674. This is free if you are an ACM member. His website cs.rhul.ac.uk/research/languages/publications/… likely has freely accessible documents.
  • Adam Paynter
    Adam Paynter over 13 years
    @Ira: Thanks for the tip! I'm also glad that the link may be of use to you! I would have guessed that it would have been better known in the compiler community after almost 20 years (I only noticed it because the Eclipse foundation uses it in their Java parser).
  • zwol
    zwol over 13 years
    I am not much on the theory, but I've worked on both the GCC and the EDG front ends for C++, and I was under the impression that it was not practical to avoid what you describe as the "resolves types during parsing hack", period. Of course, both of those are hand coded recursive descent parsers with backtracking, so maybe it's just a consequence of the implementation technique. Has anyone coded a full GLR parser for C++?
  • Ira Baxter
    Ira Baxter over 13 years
    @Zack: Read my answer. Check out the link. YES. We avoid that hack completely. We use it to carry out modifications of large-scale C++ systems.
  • Adam Paynter
    Adam Paynter over 13 years
    @Ira: You may wish to post your sources on the GLR parsing algorithm resources question.
  • Adam Paynter
    Adam Paynter over 13 years
    I can certainly appreciate the ability to debug line-by-line! I can also appreciate the idea of "context" (that is, knowing that the expression being parsed is contained within an if statement). Thanks for the answer, it certainly illustrates some strengths of LL parsers!
  • Ira Baxter
    Ira Baxter over 13 years
@Terence: What's the bit about a "black box"? All parser generators and their support machinery are pretty much mysteries to the users, and rightly so; most users don't want to know the theory or the gears. I buy your "absent proof" claim of ANTLR going beyond CFGs; but that's true of any grammar-driven engine that also allows semantic predicates. As a practical matter, they all do (at least all the ones I've met or built :) including DMS's. Single-step debugging is a matter of tooling, not of the parser-generator technology. Quality of error recovery has the same property.
  • Ira Baxter
    Ira Baxter over 13 years
    All: I agree that ANTLR and most parser generators in fact are good for building parsers for "modest" grammars. The distinction starts to occur when the grammars get "tough". If you are in control of the grammar and can change it to make tough cases go away, then any parser generator will do. If you are not, then the strength of the parser generator really matters. In either case, the engineering in support of the tool really helps. In spite of my bias towards GLR (they are O(n) in practice, too!), I'll be the first to admit that Terence has done a pretty decent job making ANTLR effective.
  • Adam Paynter
    Adam Paynter over 13 years
    @Ira: I completely agree with your points. I just want to say that the reason why I accepted Terence's answer was because I thought it best answered the original question ("What advantages do LL parsers have over LR parsers?"). I also agree with you (and Terence) that GLR looks fantastic! I also agree that he has done an excellent job with ANTLR. :)
  • Terence Parr
    Terence Parr over 13 years
    @Ira: I just meant that GLR is a black box because it works but is opaque. The other extreme is a recursive descent parser (as generated by ANTLR) that is not a black box because it is what a programmer would build by hand and can single step debug. Actually, that is not totally true. LL(*) allows cyclic DFA to scan ahead and I generate state machines (black boxes) for those decisions but that is just for the alternative production prediction not the parsing.
  • Terence Parr
    Terence Parr over 13 years
Re: error handling. Imagine an inconsistent LR state where you must speculate down multiple possible paths that all lead to invalid parses. You have an error, but you don't know which path to derive the error message from. You also have ambiguous context: are you in an expression for an array index, say, or an expression in an assignment statement? It's hard to know how to recover. I'm not sure I see how error handling can be just a matter of the tool; LR-based tools seem a bit handicapped compared to LL-based tools in this regard.
  • Ira Baxter
    Ira Baxter over 13 years
The usual trick is to try to build a tree from the tokens following the point of error, and then look for one token insertion/deletion that will allow the parser to bridge the gap. (We just use the next token because we haven't been highly active in pushing the error recovery.) This works pretty well even for an inconsistent state. Most states aren't inconsistent, so there's only one sensible continuation. Why doesn't an LL parser have the same problem, if there are multiple possible next parses? The local language ambiguity doesn't go away just because you switched parsing technologies.
  • Terence Parr
    Terence Parr over 13 years
Well, LL does single-token insertion and deletion pretty easily; I'm sure the concept is similar in LR, at least mid-production. The primary issue is when you need to recover based upon which rule invoked the current rule (e.g., what's in the FOLLOW set?). In LL, you know precisely which one that is (at the cost of a weaker parsing strategy, of course). LR could have multiple contexts. You don't know whether to resynchronize by consuming until ] or ;, for example, when trying to clean up a malformed expression. If you were called from the array index, you want to consume until ], and so on.
  • Ira Baxter
    Ira Baxter over 13 years
With GLR it's relatively easy. For all live parses at the point where you hit a syntax error, you propose inserting any token whose shift on any parse leads to a state in which the following token is legal, and you propose deleting the current token. That proposes 10-20 cures, most of which die in the usual GLR way. Our parser seems to recover from missing/extra tokens pretty well. When it's lost, it is horribly lost, but it usually recovers after a while. I used to build metacompilers (synthesizing recursive-descent parsers); pretty much when they got lost, they never recovered.
  • Matt Fenwick
    Matt Fenwick over 10 years
    I really hope this doesn't come off as hostile -- it's an honest question -- when is left recursion useful/necessary in practice? I don't mean to imply that it's not useful/necessary, just that I have never seen any uses of left recursion that couldn't be replaced by a * or + quantifier. I'm sure there are valid uses, but I'm not familiar with any of them.
  • Ira Baxter
    Ira Baxter over 10 years
* or + are not standard BNF; they are standard in many extended BNFs. If your parser generator doesn't have them, you need left or right recursion; if you do have them, they correspond to left or right recursion. If you generate a parser from the grammar, EBNF is often essentially reduced to BNF (certainly for LR-class parsers), and you want left recursion because it makes the parser more efficient; with right recursion, such parsers need a stack as deep as the list length. ...
  • Ira Baxter
    Ira Baxter over 10 years
    Parsers that are implemented top-down can convert kleene operators from logically right recursion into looping operations. But people can still write rules that involve left recursion in complicated ways; I don't have an example off hand but I'm a lot happier knowing I just don't have worry about how I write my grammar rules. If you look at the complaints people have about top-down parsers, even those with kleene star notation, you'll almost always find them trying to find a way to get rid of some inconvenient left recursion. Why have the worry at all?
  • Ira Baxter
    Ira Baxter over 10 years
And when you have such a generator, producing them is really easy.
  • Terence Parr
    Terence Parr over 9 years
An update: an upcoming OOPSLA '14 paper on Adaptive LL(*), ALL(*), takes any grammar with one exception: no indirect left recursion. It handles rules like e : e '*' e | INT ;. See antlr.org/papers/allstar-techreport.pdf. ALL(*) is my final word on parsing; 25 years and I finally cracked it. It took the C11 spec out of the box, complete with left recursion, albeit parsing at a slow linear pace without tweaking the grammar. The Java grammar parses 12,920 files, 3.6M lines, 123M of the Java library, in < 6 secs. See the tool shootout in the paper; we needed either a log scale or a separate graph for the GLR tools.