What would be the best tool to create a natural DSL in Java?


Solution 1

Considering the complexity of lexing and parsing, I don't know if I'd want to code all that by hand. ANTLR isn't that hard to pick up, and I think it is worth looking into based on your problem. If you use a parser grammar to build an abstract syntax tree (AST) from the input, it's pretty easy to then process that AST with a tree grammar. The tree grammar could easily handle executing the process you described.
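To make the two-phase shape concrete, here is a minimal hand-rolled sketch in plain Java. It is not ANTLR and uses no ANTLR API; with ANTLR, phase 1 would be generated from a grammar instead of written by hand. The names `Command`, `TinyDsl`, `parse`, and `run` are hypothetical, the two sentence forms are borrowed from the question, and the resulting "tree" is just a flat list of command nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One AST node (hypothetical shape): a verb, the user it applies to, and an optional target.
final class Command {
    final String verb;   // "log" or "take"
    final String user;
    final String target; // destination for "take", null for "log"

    Command(String verb, String user, String target) {
        this.verb = verb;
        this.user = user;
        this.target = target;
    }
}

final class TinyDsl {
    // Phase 1 (parser): text -> AST. Only two sentence forms are recognized:
    //   Log <user> in
    //   Take <user> to <one or more words>
    static List<Command> parse(String script) {
        List<Command> ast = new ArrayList<>();
        for (String line : script.split("\\R")) {
            String[] w = line.trim().split("\\s+");
            if (w.length == 3 && w[0].equalsIgnoreCase("Log") && w[2].equalsIgnoreCase("in")) {
                ast.add(new Command("log", w[1], null));
            } else if (w.length >= 4 && w[0].equalsIgnoreCase("Take") && w[2].equalsIgnoreCase("to")) {
                String target = String.join(" ", Arrays.copyOfRange(w, 3, w.length));
                ast.add(new Command("take", w[1], target));
            } else if (!line.trim().isEmpty()) {
                throw new IllegalArgumentException("Cannot parse: " + line);
            }
        }
        return ast;
    }

    // Phase 2 (walker): AST -> effect. The "effect" here is just a map of user -> current location.
    static Map<String, String> run(List<Command> ast) {
        Map<String, String> location = new HashMap<>();
        for (Command c : ast) {
            switch (c.verb) {
                case "log":
                    location.put(c.user, "home"); // "home" is an arbitrary starting location
                    break;
                case "take":
                    if (!location.containsKey(c.user)) {
                        throw new IllegalStateException(c.user + " is not logged in");
                    }
                    location.put(c.user, c.target);
                    break;
                default:
                    throw new AssertionError("Unknown verb: " + c.verb);
            }
        }
        return location;
    }

    public static void main(String[] args) {
        List<Command> ast = parse("Log user1 in\nTake user1 to category t-shirts\nTake user1 to checkout");
        System.out.println(run(ast));
    }
}
```

The point of the split is that requirement changes usually touch only one phase: new sentence forms change the parser, new behavior changes the walker.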

You'll find ANTLR in many places, including Eclipse, Groovy, and Grails for a start. The Definitive ANTLR Reference makes it fairly straightforward to get up to speed on the basics quickly.

I had a project earlier this year that had to handle some user-generated query text. I started down the path of processing it manually, but it quickly became overwhelming. I took a couple of days to get up to speed on ANTLR and had an initial version of my grammar and processor running in a few days. Subsequent changes and adjustments to the requirements would have killed any custom version, but they required relatively little effort once I had the ANTLR grammars up and running.

Good luck!

Solution 2

You might want to consider Xtext, which internally uses ANTLR and does some nice things like auto-generating an editor for your DSL.

Solution 3

If you call that "natural language", you're deluding yourself. It's still a programming language, just one that tries to mimic natural language, and I suspect that it will fail once you get into implementation details. In order to make it unambiguous, you'll have to put restrictions on the syntax that will confuse the users who've been led to think that they're writing "English".

The advantage of a DSL is (or should be, at any rate) that it's simple and clear, yet powerful in regard to the problem domain. Mimicking a natural language is a secondary concern, and may in fact be counter-productive to those primary goals.

If someone is too stupid or lacks the ability for the formally rigorous thinking that's required for programming, then a programming language that mimics a natural one will NOT magically turn them into a programmer.

When COBOL was invented, some people seriously believed that within 10 years there would be zero demand for professional programmers, since COBOL was "like English", and anyone who needed software could write it himself. And we all know how that's been working out.

Solution 4

The first time I heard of DSLs was from JetBrains, the creator of IntelliJ IDEA.

They have this tool: MPS (Meta Programming System)

Solution 5

You might find this multi-part blog series I did on using ANTLR useful as a starting point. It uses ANTLR 2, so some things will be different for ANTLR 3:

http://tech.puredanger.com/2007/01/13/implementing-a-scripting-language-with-antlr-part-1-lexer/

Mark Volkmann's presentations/articles on ANTLR are quite helpful as well:

http://www.ociweb.com/mark/programming/ANTLR3.html

I will second the suggestion of The Definitive ANTLR Reference, which is also excellent.

Author: Jay El-Kaake

Updated on July 14, 2022

Comments

  • Jay El-Kaake
    Jay El-Kaake almost 2 years

    A couple of days ago, I read a blog entry (http://ayende.com/Blog/archive/2008/09/08/Implementing-generic-natural-language-DSL.aspx) where the author discusses the idea of a generic natural language DSL parser using .NET.

    The brilliant part of his idea, in my opinion, is that the text is parsed and matched against classes using the same name as the sentences.

    Taking as an example, the following lines:

    Create user user1 with email [email protected] and password test
    Log user1 in
    Take user1 to category t-shirts
    Make user1 add item Flower T-Shirt to cart
    Take user1 to checkout
    

    These would be converted using a collection of "known" objects that receive the results of parsing. Some example objects would be (using Java for my example):

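    // A "known" command class: its name matches the opening words "Create user".
    public class CreateUser {
        private final String user;
        private String email;
        private String password;
    
        // The constructor argument is the word that follows the class name.
        public CreateUser(String user) {
            this.user = user;
        }
    
        // Matches the phrase "with email <email>".
        public void withEmail(String email) {
            this.email = email;
        }
    
        // Matches the phrase "and password <password>".
        public void andPassword(String password) {
            this.password = password;
        }
    }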
    

    So, when processing the first sentence, the CreateUser class would be a match (obviously, because its name is a concatenation of "create user") and, since its constructor takes a parameter, the parser would take "user1" as the user parameter.

    After that, the parser would identify that the next part, "with email" also matches a method name, and since that method takes a parameter, it would parse "[email protected]" as being the email parameter.
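The matching described above can be sketched with plain Java reflection, with no lexer or parser framework. This is only an illustration under loud assumptions: `SentenceBinder` and `bind` are hypothetical names, every argument is a single word (so "Flower T-Shirt" would need extra handling), each known class has exactly one public constructor, all parameters are `String`, and the address `user1@example.com` is a placeholder rather than the one from the original post.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Same shape as the CreateUser example above; fields are package-visible for the demo.
class CreateUser {
    final String user;
    String email;
    String password;

    public CreateUser(String user) { this.user = user; }
    public void withEmail(String email) { this.email = email; }
    public void andPassword(String password) { this.password = password; }
}

class SentenceBinder {
    // Lower-cased simple class name -> class, e.g. "createuser" -> CreateUser.
    private final Map<String, Class<?>> known = new HashMap<>();

    SentenceBinder(Class<?>... classes) {
        for (Class<?> c : classes) {
            known.put(c.getSimpleName().toLowerCase(Locale.ROOT), c);
        }
    }

    Object bind(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        int pos = 0;
        try {
            // Step 1: grow a prefix ("create", "createuser", ...) until it names a known class.
            Class<?> cls = null;
            StringBuilder name = new StringBuilder();
            while (pos < words.length && cls == null) {
                name.append(words[pos++].toLowerCase(Locale.ROOT));
                cls = known.get(name.toString());
            }
            if (cls == null) {
                throw new IllegalArgumentException("No known class for: " + sentence);
            }

            // Step 2: the next words become the constructor arguments.
            Constructor<?> ctor = cls.getConstructors()[0];
            int n = ctor.getParameterCount();
            Object target = ctor.newInstance((Object[]) Arrays.copyOfRange(words, pos, pos + n));
            pos += n;

            // Step 3: match the following words against method names, then consume their arguments.
            while (pos < words.length) {
                Method m = null;
                name.setLength(0);
                while (pos < words.length && m == null) {
                    name.append(words[pos++].toLowerCase(Locale.ROOT));
                    m = findMethod(cls, name.toString());
                }
                if (m == null) {
                    throw new IllegalArgumentException("No method matches: " + name);
                }
                int k = m.getParameterCount();
                m.invoke(target, (Object[]) Arrays.copyOfRange(words, pos, pos + k));
                pos += k;
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bind: " + sentence, e);
        }
    }

    // Case-insensitive lookup among public methods, ignoring those inherited from Object.
    private static Method findMethod(Class<?> cls, String lowerName) {
        for (Method m : cls.getMethods()) {
            if (m.getDeclaringClass() != Object.class
                    && m.getName().toLowerCase(Locale.ROOT).equals(lowerName)) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SentenceBinder binder = new SentenceBinder(CreateUser.class);
        CreateUser cu = (CreateUser) binder.bind(
                "Create user user1 with email user1@example.com and password test");
        System.out.println(cu.user + " " + cu.email + " " + cu.password);
    }
}
```

A greedy word-by-word match like this is easy to write, but it becomes ambiguous as soon as an argument value happens to start with the same words as a method name, which is roughly the point where a real grammar starts to pay off.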

    I think you get the idea by now, right? One quite clear application of that, at least for me, would be to allow application testers to create "testing scripts" in natural language and then parse the sentences into classes that use JUnit to check the app's behavior.

    I'd like to hear ideas, tips, and opinions on tools or resources that could be used to write such a parser in Java. Better yet if we could avoid complex lexers or frameworks like ANTLR, which I think would be like using a hammer to kill a fly.

    More than that, if anyone is up to start an open source project for that, I would definitely be interested.