What are the benefits of applicative parsing over monadic parsing?

Solution 1

The main difference between monadic and applicative parsing is in how sequential composition is handled. In the case of an applicative parser, we use (<*>), whereas with a monad we use (>>=).

(<*>) :: Parser (a -> b) -> Parser a -> Parser b
(>>=) :: Parser a -> (a -> Parser b) -> Parser b

The monadic approach is more flexible, because it allows the grammar of the second part to depend on the result from the first one, but we rarely need this extra flexibility in practice.
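
To make the contrast concrete, here is a minimal sketch using Parsec's anyChar and char (the parser names pair and doubled are invented for illustration):

import Text.Parsec
import Text.Parsec.String (Parser)

-- Applicative: the shape of the parse is fixed before any input is seen.
pair :: Parser (Char, Char)
pair = (,) <$> anyChar <*> anyChar

-- Monadic: the second parser is chosen from the first result.
-- This matches exactly two equal characters, e.g. "aa" but not "ab".
doubled :: Parser Char
doubled = anyChar >>= char

No combination of fixed applicative parsers can express doubled, because which character the second parser accepts is only known at run time.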

You might think that having some extra flexibility can't hurt, but in reality it can. It prevents us from doing useful static analysis on a parser without running it. For example, let's say we want to know whether a parser can match the empty string or not, and what the possible first characters can be in a match. We want functions

empty :: Parser a -> Bool
first :: Parser a -> Set Char

With an applicative parser, we can easily answer this question. (I'm cheating a little here. Imagine we have data constructors corresponding to (<*>) and (>>=) in our candidate parser "languages").

empty (f <*> x) = empty f && empty x
first (f <*> x) | empty f   = first f `union` first x
                | otherwise = first f

However, with a monadic parser we don't know what the grammar of the second part is without knowing the input.

empty (x >>= f) = empty x && empty (f ???)
first (x >>= f) | empty x   = first x `union` first (f ???)
                | otherwise = first x
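
To make those imagined data constructors concrete, here is a sketch of such a reified parser language; the constructor names Sym, Pure, Ap and Bind are invented for illustration, not taken from any library:

{-# LANGUAGE GADTs #-}
import Data.Set (Set)
import qualified Data.Set as Set

data Parser a where
  Sym  :: Char -> Parser Char                      -- match one given character
  Pure :: a -> Parser a                            -- succeed on the empty string
  Ap   :: Parser (a -> b) -> Parser a -> Parser b  -- reified (<*>)
  Bind :: Parser a -> (a -> Parser b) -> Parser b  -- reified (>>=)

empty :: Parser a -> Bool
empty (Sym _)    = False
empty (Pure _)   = True
empty (Ap f x)   = empty f && empty x
empty (Bind x f) = error "stuck: f's grammar depends on a value we don't have"

first :: Parser a -> Set Char
first (Sym c)    = Set.singleton c
first (Pure _)   = Set.empty
first (Ap f x)
  | empty f      = first f `Set.union` first x
  | otherwise    = first f
first (Bind x f) = error "stuck: cannot look inside f without running x"

The Ap cases are plain structural recursion, but the Bind cases get stuck: the second grammar only exists once we have a parse result in hand.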

By allowing more, we're able to reason less. This is similar to the choice between dynamic and static type systems.

But what is the point of this? What might we use this extra static knowledge for? Well, we can for example use it to avoid backtracking in LL(1) parsing by comparing the next character to the first set of each alternative. We can also determine statically whether this would be ambiguous by checking if the first sets of two alternatives overlap.
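
Continuing the same sketch (ambiguous and predict are invented names), both checks come out as a few lines:

-- Statically: two alternatives are LL(1)-ambiguous if their first sets overlap.
ambiguous :: Parser a -> Parser a -> Bool
ambiguous p q = not (Set.null (first p `Set.intersection` first q))

-- At run time: commit to an alternative by peeking at one character,
-- with no backtracking needed.
predict :: Parser a -> Parser a -> Char -> Maybe (Parser a)
predict p q c
  | c `Set.member` first p = Just p
  | c `Set.member` first q = Just q
  | otherwise              = Nothing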

Another example is that it can be used for error recovery, as shown in the paper Deterministic, Error-Correcting Combinator Parsers by S. Doaitse Swierstra and Luc Duponcheel.

Usually, however, the choice between applicative and monadic parsing has already been made by the authors of the parsing library you're using. When a library such as Parsec exposes both interfaces, the choice of which one to use is purely a stylistic one. In some cases applicative code is easier to read than monadic code and sometimes it's the other way round.

Solution 2

If a parser is purely applicative, it is possible to analyse its structure and "optimise" it before running it. If a parser is monadic, it's basically a Turing-complete program, and performing almost any interesting analysis of it is equivalent to solving the halting problem (i.e., undecidable in general).

Oh, and yes, there's a stylistic difference too...

Solution 3

The main reason I can see to prefer applicative parsers over monadic parsers is the same as the main reason to prefer applicative code over monadic code in any context: being less powerful, applicatives are simpler to use.

This is an instance of a more general engineering dictum: use the simplest tool which gets the job done. Don't use a fork lift when a dolly will do. Don't use a table saw to cut out coupons. Don't write code in IO when it could be pure. Keep it simple.

But sometimes, you need the extra power of Monad. A sure sign of this is when you need to change the course of the computation based on what has been computed so far. In parsing terms, this means determining how to parse what comes next based on what has been parsed so far; in other words you can construct context-sensitive grammars this way.
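
A classic example, sketched with Parsec combinators (lengthPrefixed is an invented name): a length-prefixed field, where the number parsed first decides how much input to consume next.

import Text.Parsec
import Text.Parsec.String (Parser)

-- Parses e.g. "5:hello" to "hello".
lengthPrefixed :: Parser String
lengthPrefixed = do
  n <- read <$> many1 digit   -- the result of this parse...
  _ <- char ':'
  count n anyChar             -- ...decides how many characters to read next

For instance, parseTest lengthPrefixed "5:hello, world" prints "hello". No fixed applicative grammar can express this, since the shape of the rest of the parse depends on a run-time value.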

Solution 4

With Parsec, the benefit of using Applicative is just style. Monad has the advantage of being more powerful: you can implement context-sensitive parsers.

Doaitse Swierstra's UU parsing is more efficient if used only applicatively.

Solution 5

Monads are strictly a more featureful abstraction than Applicatives. You could write

-- Illustrative only: as a real instance this would overlap with every other
-- Applicative instance; the point is that the operations are recoverable.
instance (Monad m) => Applicative m where
  pure  = return
  (<*>) = ap

But there is no way to write

instance (Applicative a) => Monad a where
  return = pure
  (>>=) = ???

So yes, it is essentially a matter of style. You might imagine that using return and ap costs nothing over pure and <*>, but that is not always true: because Applicative is a strictly smaller interface than Monad, <*> can sometimes be more highly optimized than ap. (But with clever GHC rewrite rules, one can often achieve the same optimizations regardless.)
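
For reference, ap needs nothing beyond (>>=) and return (this mirrors its definition in Control.Monad), which is why the blanket instance above is possible in the first place:

-- The generic recipe: sequence both computations, then apply.
ap :: Monad m => m (a -> b) -> m a -> m b
ap mf mx = mf >>= \f -> mx >>= \x -> return (f x)

A hand-written <*> is free to exploit structure that this generic recipe cannot see.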

Is monadic parsing out?

Since Monads are a subset of Applicatives, I would conclude that applicative parsing is a subset of monadic parsing.


Comments

  • gawi
    gawi almost 2 years

    There seems to be a consensus that you should use Parsec as an applicative rather than a monad. What are the benefits of applicative parsing over monadic parsing?

    • style
    • performance
    • abstraction

    Is monadic parsing out?

  • luqui
    luqui over 12 years
    ISTR that formally, because Haskell allows infinite grammars, monad does not actually increase the number of recognizable languages.
  • Daniel Wagner
    Daniel Wagner over 12 years
    @luqui I'm curious about your comment. Here's a language. The alphabet is Haskell Strings, and the language is the set of words in which all the letters are equal. This is dead easy as a monadic parser: option [] (anyToken >>= many . exactToken) (where here anyToken and exactToken aren't actually a part of the Parsec library, but probably ought to be; ask me if you're unsure of what they do). How would the corresponding applicative parser look?
  • sdcvvc
    sdcvvc over 12 years
    @stephen, can you give a reference for context-sensitive parsers? I'm curious what the exact power of monadic and applicative parsers is.
  • Tom Crockett
    Tom Crockett over 12 years
    @sdcvvc: this paper discusses the relative power of arrow parsers vs. monadic parsers and points out that monadic parsers enable parsing context-sensitive grammars while arrows do not. I believe applicative parsers would be strictly less powerful than arrow parsers.
  • stephen tetley
    stephen tetley over 12 years
    @sdcvvc - quoting from Daan Leijen's Parsec manual: "Examples of context sensitive parsing are XML tags, variable definition before use and environments in TeX." See page 25.
  • Heinrich Apfelmus
    Heinrich Apfelmus about 12 years
    Wait! I had thought the same until today, when it occurred to me that the empty test can be applied to monadic parsers as well. The reason is that we can get the value you named ??? by applying the parser x to the empty string. More generally, you can just feed the empty string into the parser and see what happens. Likewise, the set of first characters can be obtained at least in a functional form first :: Parser a -> (Char -> Bool). Of course, converting the latter to Set Char would involve an inefficient enumeration of characters; that's where applicative functors have the edge.
  • sethuraman Janakiraman
    sethuraman Janakiraman over 11 years
    No, "use the simplest tool" may seem like a good rule of thumb, but actually is not. E.g., we use computer for writing letters, however computer to a sheet of paper is something like a table saw compared to a pair of scissors.
  • sethuraman Janakiraman
    sethuraman Janakiraman over 11 years
    I mean, there are always upsides and downsides for every choice, but mere simplicity is a bad basis for a choice. Especially when you're deciding whether to use Haskell. :D
  • Tom Crockett
    Tom Crockett over 11 years
    Yes, you're right. It would be better to say something like, "the right tool is the one which is maximally efficient while being minimally complex." What's missing from my description is the part about efficiency: you want a tool which is sufficiently powerful not just to do the job, but to make the job as easy as possible. But at the same time you don't want a tool which has lots of bells and whistles not applicable to the task at hand, since these most likely increase the complexity of operating it to no benefit.
  • Blaisorblade
    Blaisorblade over 11 years
    This post (and the comments) explains that applicative parsers allow parsing all recursively enumerable languages: byorgey.wordpress.com/2012/01/05/…
  • phadej
    phadej almost 11 years
    @HeinrichApfelmus You cannot get an answer to empty that way. Or you can, but it's like answering the halting problem (en.wikipedia.org/wiki/Halting_problem) with "let's run the program and see if it halts".
  • phadej
    phadej almost 11 years
    @hammar: what if we run let x = pure f <*> y <*> x in empty x? If empty y is True, then the computation doesn't terminate... just to point out that it's not that simple.
  • Guildenstern
    Guildenstern over 10 years
    Did you mean to say that Monads are a superset of Applicatives?
  • Dan Burton
    Dan Burton over 10 years
    @Guildenstern Monadic operations are a superset of Applicative operations. But said another way: types that have an instance of Monad are a subset of types that have an instance of Applicative. When speaking of "Monads" and "Applicatives," one is usually referring to the types, not the operations.
  • Pi Delport
    Pi Delport almost 10 years
    The difference between Applicative and Monad has nothing to do with Turing-completeness. In Haskell, the relative difficulty of optimizing Monad instances is only due to the historical mistake of exposing (>>=) alone in the type class, making it impossible for instances to provide more optimized implementations of operators like ap. The Applicative class avoids this mistake, and exposes <*> (the equivalent of ap).
  • dfeuer
    dfeuer almost 9 years
    @PietDelport, I think the trouble is that the underlying representations of monadic parsers generally aren't amenable to optimized Applicative instances, while applicative parsers generally don't support >>=.
  • semicolon
    semicolon over 7 years
    @PiDelport It has absolutely everything to do with Turing-completeness. >>= takes the result of the parser on the left and passes it to the one on the right, thus making the resulting parser impossible to statically analyze, since the parser itself is now Turing-complete. <*> and <$> do not inspect the argument, so while the output of the parser is Turing-complete (you can put anything to the left of the <$>), the parsing aspect itself is restricted and statically analyzable.
  • semicolon
    semicolon over 7 years
    "I imagine if you use return and ap, then there should be no performance loss over using pure and <*>." IIRC that is not the case. There are plenty of situations (parsers being only one of them) where <*> outperforms ap.
  • Dan Burton
    Dan Burton over 7 years
    @semicolon you're absolutely right; I've updated my answer accordingly.
  • dfeuer
    dfeuer over 7 years
    @DanielWagner, to respond a mere five years late: you can lazily enumerate all strings that consist of just one character and apply asum. If the parser's <|> is sufficiently lazy, I believe this will get you the grammar you seek, very inefficiently. Of course, not all parsers have lazy <|>. regex-applicative certainly does not; I don't know about more traditional parser combinator libraries.