What is the difference between LR(0) and SLR parsing?

algorithm parsing compiler-construction lr

69,084

Solution 1

Both LR(0) and SLR(1) parsers are bottom-up, directional, predictive parsers. This means that

The parsers attempt to apply productions in reverse to reduce the input sentence back to the start symbol (bottom-up)
The parsers scan the input from left-to-right (directional)
The parsers attempt to predict what reductions to apply without necessarily seeing all of the input (predictive)

Both LR(0) and SLR(1) are shift/reduce parsers, meaning that they process the tokens of the input stream by placing them on a stack, and at each point either shifting a token by pushing it onto the stack or reducing some sequence of terminals and nonterminals atop the stack back to some nonterminal symbol. It can be shown that any context-free grammar can be parsed bottom-up using a shift/reduce parser, but that parser might not be deterministic. That is, the parser may have to "guess" whether to apply a shift or a reduction, and may end up having to backtrack when it realizes that it made the wrong choice. No matter how powerful a deterministic shift/reduce parser you construct, it will never be able to parse all context-free grammars.
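To make the two actions concrete, here is a minimal sketch in Python. The toy grammar (E -> E + n | n) and the function name `shift_reduce_trace` are assumptions chosen for illustration. The loop reduces greedily whenever a right-hand side sits on top of the stack, which happens to work for this grammar; it is not a general LR parser and has no parse table.

```python
# Toy grammar (an assumption for illustration): E -> E + n | n
# Rules are listed longest right-hand side first so the longer match wins.
RULES = [("E", ("E", "+", "n")), ("E", ("n",))]

def shift_reduce_trace(tokens):
    """Return (actions, final_stack) for a naive shift/reduce loop."""
    stack, actions, pos = [], [], 0
    while True:
        # Reduce if some right-hand side matches the top of the stack.
        for lhs, rhs in RULES:
            if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            # Nothing to reduce: shift the next token, or stop at end of input.
            if pos == len(tokens):
                break
            stack.append(tokens[pos])
            actions.append(f"shift {tokens[pos]}")
            pos += 1
    return actions, stack

actions, stack = shift_reduce_trace(["n", "+", "n"])
for step in actions:
    print(step)
print("final stack:", stack)
```

The input is accepted when the stack ends up holding only the start symbol E. A real LR parser replaces the "does a right-hand side match?" test with a table lookup driven by an automaton state.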

When a deterministic shift/reduce parser is used to parse a grammar that it cannot handle, it results in shift/reduce conflicts or reduce/reduce conflicts, where the parser may enter a state in which it cannot tell what action to take. In a shift/reduce conflict, it cannot tell whether it should add another symbol to the stack or perform some reduction on the top symbols of the stack. In a reduce/reduce conflict, the parser knows that it needs to replace the top symbols of the stack with some nonterminal, but it can't tell what reduction to use.

I apologize if this is a lengthy exposition, but we need this to be able to address the difference between LR(0) and SLR(1) parsing. An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).
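As a rough illustration of what "not LR(0)" means, the sketch below builds the LR(0) item sets for a small grammar and reports the states in which more than one action is possible. The grammar (S -> E, E -> T + E | T, T -> n) and all function names are assumptions for this example. An item is a pair (production index, dot position) marking how much of a production has been seen so far.

```python
# Toy grammar (an assumption): production 0 is the augmented start rule.
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("n",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add an item B -> .gamma for every nonterminal B right after a dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def lr0_states():
    start = closure({(0, 0)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states

def lr0_conflicts(states):
    """List the conflicts found when no lookahead is available."""
    found = []
    for state in states:
        reduces = [p for p, d in state if d == len(GRAMMAR[p][1])]
        shifts = any(d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMINALS
                     for p, d in state)
        if len(reduces) > 1:
            found.append("reduce/reduce")
        if reduces and shifts:
            found.append("shift/reduce")
    return found

states = lr0_states()
print(len(states), lr0_conflicts(states))
```

For this grammar the state containing both E -> T . + E and E -> T . can either shift "+" or reduce, so the grammar is not LR(0).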

Recall that the two possible LR conflicts are shift/reduce and reduce/reduce. In both of these cases, there are at least two actions that the LR(0) automaton could be taking, and it can't tell which of them to use. Since at least one of the conflicting actions is a reduction, a reasonable line of attack would be to try to have the parser be more careful about when it performs a particular reduction. More specifically, let's suppose that the parser is allowed to look at the next token of input to determine whether it should shift or reduce. If we only allow the parser to reduce when it "makes sense" to do so (for some definition of "makes sense"), then we may be able to eliminate the conflict by having the automaton specifically choose to either shift or reduce in a particular step.

In SLR(1) ("Simple LR(1)"), the parser is allowed to look at one token of lookahead when deciding whether it should shift or reduce. In particular, when the parser wants to try reducing something of the form A → w (for nonterminal A and string w), it looks at the next token of input. If that token could legally appear after the nonterminal A in some derivation, the parser reduces. Otherwise, it does not. The intuition here is that in some cases it makes no sense to attempt a reduction, because given the tokens we've seen so far and the upcoming token, there is no possible way that the reduction could ever be correct.
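The set of tokens that "could legally appear after the nonterminal A" is usually called FOLLOW(A). The sketch below computes FOLLOW sets for a toy grammar and uses them for the reduce check. The grammar, the end marker "$", and the names `first_sets`, `follow_sets`, and `slr_allows_reduce` are assumptions for illustration, and the code assumes the grammar has no empty (epsilon) productions.

```python
# Toy grammar (an assumption): S is the augmented start symbol.
GRAMMAR = {
    "S": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("n",)],
}

def first_sets(grammar):
    """FIRST sets by fixed-point iteration (no epsilon productions assumed)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, start, end="$"):
    """FOLLOW sets by fixed-point iteration (no epsilon productions assumed)."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(end)
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # sym ends the production
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, "S")

def slr_allows_reduce(lhs, lookahead):
    """SLR(1) rule: reduce by lhs -> w only if lookahead is in FOLLOW(lhs)."""
    return lookahead in FOLLOW[lhs]

print(FOLLOW)
print(slr_allows_reduce("E", "+"), slr_allows_reduce("E", "$"))
```

Here "+" is not in FOLLOW(E), so when the parser has seen a T and the next token is "+", the reduction E -> T is ruled out and the parser shifts instead.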

The only difference between LR(0) and SLR(1) is this extra ability to help decide what action to take when there are conflicts. Because of this, any grammar that can be parsed by an LR(0) parser can be parsed by an SLR(1) parser. The converse does not hold: SLR(1) parsers can handle a strictly larger class of grammars than LR(0) parsers.

In practice, though, SLR(1) is still a fairly weak parsing method. More commonly, you will see LALR(1) ("Look-Ahead LR(1)") parsers being used. They too work by trying to resolve conflicts in an LR(0) parser, but the rules they use for resolving conflicts are far more precise than those used in SLR(1), and consequently a much larger number of grammars are LALR(1) than are SLR(1). To be a bit more specific, SLR(1) parsers try to resolve conflicts by looking at the structure of the grammar to learn more information about when to shift and when to reduce. LALR(1) parsers look at both the grammar and the LR(0) parser to get even more specific information about when to shift and when to reduce. Because LALR(1) can look at the structure of the LR(0) parser, it can more precisely identify when certain conflicts are spurious. The Unix utilities yacc and bison, by default, produce LALR(1) parsers.

Historically, LALR(1) parsers were typically constructed through a different method that relied on the far more powerful LR(1) parser, so you will often see LALR(1) described that way. To understand this, we need to talk about LR(1) parsers. In an LR(0) parser, the parser works by keeping track of where it might be in the middle of a production. Once it has found that it's reached the end of a production, it knows to try to reduce. However, the parser might not be able to tell whether it's at the end of one production or in the middle of another, which leads to a shift/reduce conflict, or which of two different productions it has reached the end of (a reduce/reduce conflict). In LR(0), this immediately leads to a conflict and the parser fails. In SLR(1) or LALR(1), the parser then makes the decision to shift or reduce based on the next token of lookahead.

In an LR(1) parser, the parser keeps track of additional information as it operates. In addition to keeping track of what production the parser believes is being used, it keeps track of what possible tokens might appear after that production is completed. Because the parser keeps track of this information at each step, and not just when it needs to make the decision, the LR(1) parser is substantially more powerful and precise than any of the LR(0), SLR(1), or LALR(1) parsers we've talked about so far. LR(1) is an extremely powerful parsing technique, and it can be shown using some tricky math that any language that could be parsed deterministically by any shift/reduce parser has some grammar that could be parsed with an LR(1) automaton. (Note that this does not mean that all grammars that can be parsed deterministically are LR(1); this only says that a language that could be parsed deterministically has some LR(1) grammar). However, this power comes at a price, and a generated LR(1) parser may require so much information to operate that it can't possibly be used in practice. An LR(1) parser for a real programming language, for example, might require tens to hundreds of megabytes of additional information to operate correctly. For this reason, LR(1) isn't typically used in practice, and weaker parsers like LALR(1) or SLR(1) are used instead.
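As a sketch of the extra bookkeeping, an LR(1) item can be modeled as a triple (production index, dot position, lookahead token). The example below uses a well-known textbook grammar (S -> L = R | R, L -> * R | id, R -> L) that is LR(1) but not SLR(1), because "=" is in FOLLOW(R). The function names are assumptions for illustration, and the code assumes no epsilon productions.

```python
GRAMMAR = [
    ("S'", ("S",)),          # augmented start rule
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of(symbol, seen=frozenset()):
    """FIRST set of a single symbol (no epsilon productions assumed)."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:
        return set()
    out = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            out |= first_of(rhs[0], seen | {symbol})
    return out

def closure(items):
    """LR(1) closure: new items carry the tokens that may follow them."""
    result, work = set(items), list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        # Lookahead is FIRST of what follows, or is inherited if nothing follows.
        lookaheads = first_of(rest[0]) if rest else {look}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for la in lookaheads:
                    if (i, 0, la) not in result:
                        result.add((i, 0, la))
                        work.append((i, 0, la))
    return frozenset(result)

def goto(state, symbol):
    moved = {(p, d + 1, la) for p, d, la in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

# The state reached after seeing an L at the start of the input.
state = goto(closure({(0, 0, "$")}), "L")
reduce_lookaheads = {la for p, d, la in state if d == len(GRAMMAR[p][1])}
shift_symbols = {GRAMMAR[p][1][d] for p, d, la in state if d < len(GRAMMAR[p][1])}
print(reduce_lookaheads, shift_symbols)
```

In this state the LR(1) items say to reduce R -> L only when the next token is "$" and to shift on "=", so there is no conflict. An SLR(1) parser would consult FOLLOW(R), which contains "=", and report a shift/reduce conflict in the same state.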

More recently, a new parsing algorithm called GLR(0) ("Generalized LR(0)") has gained popularity. Rather than trying to resolve the conflicts that appear in an LR(0) parser, the GLR(0) parser instead works by trying all possible options in parallel. Using some clever tricks, this can be made to run very efficiently for many grammars. Moreover, GLR(0) can parse any context-free grammar at all, even grammars that can't be parsed by an LR(k) parser for any k. Other parsers are capable of doing this as well (for example, the Earley parser or a CYK parser), though GLR(0) tends to be faster in practice.

If you're interested in learning more, over this summer I taught an introductory compilers course and spent just under two weeks talking about parsing techniques. If you'd like to get a more rigorous introduction to LR(0), SLR(1), and a host of other powerful parsing techniques, you might enjoy my lecture slides and homework assignments about parsing. All of the course materials are available here on my personal site.

Hope this helps!

Solution 2

This is what I have learned. An LR(0) parsing table can contain conflicts, i.e. one cell of the table you build for the parser can hold multiple actions (put another way, the same state and input lead to two different actions). The SLR parser was created to remove these conflicts. To construct it, find every production that would contribute a reduce entry, compute the FOLLOW set of the nonterminal on its left-hand side, and keep the reduce entry only in the columns for terminals in that FOLLOW set. This in turn means that you never apply a reduction that is impossible under the original grammar (because the next token is not in the FOLLOW set).

Solution 3

In the LR(0) parsing table, the reduce action for a production is placed across the entire row, under every terminal, whereas in the SLR parsing table it is placed only under the terminals in the FOLLOW set of the nonterminal on the left-hand side of that production.
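The difference can be sketched for a single table row. The example assumes a toy grammar (E -> T + E | T, T -> n) and looks at the state that contains the items E -> T . + E and E -> T . together; for that grammar FOLLOW(E) = {$}. The helper names `action_row` and `conflicts` are hypothetical.

```python
TERMINALS = ["n", "+", "$"]
FOLLOW_E = {"$"}  # FOLLOW(E) for the assumed toy grammar

def action_row(shift_on, reduce_rule, reduce_columns):
    """Build one ACTION row as a mapping: terminal -> list of actions."""
    row = {t: [] for t in TERMINALS}
    for t in shift_on:
        row[t].append("shift")
    for t in reduce_columns:
        row[t].append(f"reduce {reduce_rule}")
    return row

def conflicts(row):
    """Terminals whose cell holds more than one action."""
    return sorted(t for t, acts in row.items() if len(acts) > 1)

# LR(0): the reduce goes under every terminal in the row.
lr0_row = action_row({"+"}, "E -> T", TERMINALS)
# SLR(1): the reduce goes only under terminals in FOLLOW(E).
slr_row = action_row({"+"}, "E -> T", FOLLOW_E)

print("LR(0) conflicts:", conflicts(lr0_row))
print("SLR(1) conflicts:", conflicts(slr_row))
```

The LR(0) row has two actions in the "+" column, while the SLR(1) row has exactly one action per filled cell.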

The tool called parsing-EMU is very helpful in parsing and can generate first, follow, LR(0) itemset, LALR Evaluation etc. You can find it here.


Author by

Nitish Upreti

Building and Scaling Microsoft Azure.

Updated on December 03, 2021

Comments

Nitish Upreti over 2 years

I am working on my compiler concepts, but I am a little confused. Googling did not lead me to a definite answer.

Are SLR and LR(0) parsers one and the same? If not, what's the difference?
NealB over 12 years

This is an excellent answer. Exactly answers the question in a very clear and educational manner. One of the best answers I have come across on SO.
Ira Baxter over 12 years

@templatetypedef: I think you should expound a little on the difference between L(AL)R(1) and SLR(1), which is why SLR(1) exists as an interesting choice. But +1.
templatetypedef over 12 years

@Ira Baxter- I just updated the discussion to talk a bit more about LALR(1) and LR(1). Can you look over this and let me know if there's anything you think I should add?
Ira Baxter over 12 years

@templatetypedef: GLR can handle any context-free grammar. Other than that, this is good enough.
Felix Dombek over 12 years

A really good answer! I want to add that a CYK parser works slightly differently and though I can't judge your sentence "GLR tends to be faster in practice", there is a guaranteed upper limit of O(n³) time for the CYK, and no such upper limit for the GLR. The CYK achieves guaranteed performance by needing a pre-processed grammar, i.e. any context-free grammar can be rewritten to Chomsky normal form, which serves to guarantee that any reduction by a grammar rule will always result in a shorter sentential form.
Grijesh Chauhan over 11 years

@templatetypedef: Hi! I asked a question here: stackoverflow.com/questions/13143186/… . Could you please take a look?
djfm over 10 years

@templatetypedef Awesome explanations, and your lectures are - to the best of my knowledge - the only ones making sense on this subject on the whole internet :) This inspired me to transform my online toy LR parser into a toy LR(1) parser (well, I think it is). If anyone is interested: parse.fmdj.fr
anni about 10 years

Excellent answer. Very clear explanation of the topic. Will definitely go through your lectures.
newbie over 9 years

@templatetypedef Is it correct that LR(0) parsers use only LR(0) automaton, whereas SLR parsers use also the ACTION subroutine?
templatetypedef over 9 years

@newbie LR(0) parsers do have an ACTION table, but the action depends purely on the state, not the state plus the next token.
newbie over 9 years

@templatetypedef Thank you, this makes sense. Do LR(0) parsers build the ACTION table by calculating viable prefixes and looking for a handle, or by something else? I.e. if we have an A → β· item in a configuration set, should we say it's a "reducing" state?
templatetypedef over 9 years

@newbie The item you gave above is a reduce item. The ACTION entry for this state would be to reduce.