The Definitive ANTLR Reference 95
Joe Kauzlarich writes "Finally, someone has done us all the great service of publishing a
book about the second most well-known compiler compiler, Terence
Parr's Antlr, and it was written, moreover, by Parr himself and
published as part of the somewhat-usually-reliable Pragmatic Bookshelf
series. Take note, while it requires a JVM to run, Antlr is not just
for Java developers; it generates compilers in Python, Ruby, C, C++,
C# and Objective-C. Also note that this book is more than just an
elaborated man-page; it is also an excellent introduction to the
concepts of compiler and parser design." Keep reading for the rest of Joe's review.
First off, I have no preference between Yacc-style parsers, JavaCC and
Antlr; I've never used Yacc, have used JavaCC in college and have
since played with Antlr and am just as ignorant in the use of them
all. The fundamental difference is that Antlr is a top-down LL(*)
(simply-put, variable-lookahead) parser generator while Yacc is a
bottom-up LR parser generator. JavaCC is also top-down, but employs a
different parsing strategy. The book describes the meanings of these
terms in simple detail.
The Definitive ANTLR Reference | |
author | Terrance Parr |
pages | 361 |
publisher | Pragmatic Bookshelf |
rating | 9 |
reviewer | Joe Kauzlarich |
ISBN | 978-0-9787392-5-6 |
summary | introduction to parser/compiler design using ANTLR |
I happen to have learned in my experience that good documentation for any of these products is hard to come by and difficult to follow, simply because the subject matter is obtuse and few, until now, have ventured to write expository literature to explain the myriad concepts to the non-academician. Of the three mentioned above, Antlr appears to be the more 'modern' and can also generate lexers from within the same grammar definition file, so the notions are integrated. Antlr also has a useful IDE called AntlrWorks with visualization features, causing grammar construction to be far simpler for a beginner.
That said, I don't wish to use this review to push Antlr over its alternatives, but only to press the point that this book serves not only to introduce Antlr to the average programmer, but the concepts of parser design as well. The concepts become necessary to understand while writing and debugging grammars, as not everything written in Backus-Naur Form will produce a working parser, and this holds true for any parser generator. Learning what works and what doesn't, as well as what workarounds are available, is key to becoming proficient in Antlr, Yacc or JavaCC. Once proficiency is acheived, you'll have the valuable skill of producing domain-specific languages on demand.
Terence Parr, as mentioned before, is not only the author and maintainer of Antlr, but he wrote the book as well. Antlr is on its long-awaited third version and has been maintained by Parr throughout the project's lifetime. He is a university professor and himself developed the path-breaking LL(*) parsing strategy employed by Antlr.
Parr begins with a one chapter background in computer language design before diving into a simple example of a parser for basic integer expressions. Part II is the meat of the book, describing various aspects of writing grammars for Antlr. Generally speaking, he covers the basic semantics of grammar writing, the many optimization, supplementary and 'workaround' options provided by Antlr, grammar actions and attributes, syntax trees, error reporting and related practical topics.
The third part, Understanding Predicated LL(*) Grammars, is the valuable 'textbook' portion of the book. It gives readers a short and comprehensible introduction to exactly what predicated-LL(*) means as well as a look at how competing parser generators work in contrast.
Both of the second and third parts are scattered with theoretical tidbits to help language designers better understand why grammars must work as they do. Those who can't pick their nose without a rudimentary theoretical overview of the subject can enjoy a few casual browsings through the book before even sitting in front of a computer. It works *almost* that well as a textbook, though it still doesn't approach such classics as Aho, et al's, Compilers: Principles, Techniques, and Tools (if you want to get seriously involved in compiler design). Take it for what it is though, as a chance to learn a tool of possible value without having to dig through old mailing lists and last-minute README's on the one hand, as was much the case a year ago, and on the other hand, devoting pain-staking class and study time to a lot of theory you won't find of practical value.
So I'll recommend this book on the basis that there's nothing else like it available; and don't wait until a project comes along that requires knowledge of compiler design, because there's a heck of a learning curve (I'm still on the very low end and I wrote a compiler in college). If you think compiler or parser design is interesting or may conceivably write a domain-specific language for your workplace, the Definitive Antlr Reference is not only a good place to start, but one of the only places to start short of signing up for a university course.
You can purchase The Definitive ANTLR Reference from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
LR versus LL (Score:2)
Re:LR versus LL (Score:4, Funny)
IOW, AntLR is a "not LR" parser.
Hey, it made sense when I wrote it.
Re: (Score:3, Informative)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Antlr Parses Java Just fine (Score:2)
sure, we skimped on the java implementation.. but it was able to handle simple functions like factorial, and sorting algorithm, Objects.. I didnt manage to get inheritance to work, but that's my goof.
Storm
Re: (Score:3, Funny)
Re: (Score:1)
2. The set of languages parseable by an LR parser is a superset of the set parseable by an LL parser.
Ergo, yes, you can indeed parse Java with an LR parser.
Re: (Score:1)
Re: (Score:2)
A fine single malt that Kauzlarich
What are the most "standard" parser generators? (Score:5, Interesting)
I recently ran across this problem at my job: we maintain compilers for several(!) in-house languages, and I recently re-wrote the one for the most simple of them, changing it from a collection of three separate utilities (the most complicated of which was written in FORTRAN, which is generally horrible for manipulating text) into a Lex/Yacc/C (or rather, Flex/Bison/C) compiler.
I chose Lex and Yacc not because they were good, but because they're (in my opinion) very likely to be around 50 years from now. Are there any other compiler generators (such as possibly ANTLR) that might also meet this criteria, and would have been a better choice?
Re: (Score:2)
Quite right; sorry!
Re:What are the most "standard" parser generators? (Score:4, Funny)
Re: (Score:2)
Ve have VEYS uff makink you parse...
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Other source (Score:2)
I am currently reading Programming Language Pragmatics. It's pretty good I think, but then, I have nothing to compare with. I'll probably pick up the Antlr book too.
Re: (Score:2)
Most Well Known? (Score:2)
Re: (Score:3, Insightful)
Re: (Score:2)
I've not benchmarked the code it produces though, so I can't comment on its output.
Re: (Score:2)
I've no idea whether that's true, of course.
Re: (Score:1)
Re: (Score:1, Funny)
Re: (Score:2)
Back when we were kids, the compiler courses taught us about limitations to LL(k) grammars, that it turns out, aren't true (ok, ok - the theory was in fact correct, but the practical implications they passed on were in fact incorrect).
Enter ANTLR - it changed the game - and you should get to know it and why it it did. Generic LL(k) grammars at your fingertips.
I thought I understood this stuff, because of so called "definitiv
Re: (Score:2)
1 grammar - 1 target language (Score:2)
We currently have a grammar that needs to be preprocessed a bit before we feed it to ANTLR to produce the parser. It's only about 5 lines in the grammar that need to be changed.
Re: (Score:1)
Re: (Score:2)
grammar Cps;
options {
k = 1;
output = AST;
superClass = CpsParserBase;
language = Java;
and there is no way to set the target
outside of the grammar */
}
[...]
@head
Good stuff... (Score:5, Interesting)
They've drastically reduced the freely available documentation on their web page, so you are essentially forced to buy it.
Re: (Score:1)
..problem is, you can't really do anything non-trivial in ANTLR 3.0 without buying the book. They've drastically reduced the freely available documentation on their web page, so you are essentially forced to buy it.
I don't know where you get that idea from, other than a few idiot postings on the ANTLR mailing list perhaps, or maybe you were one of those and are just trolling. Anyway, it is bollocks - you think anyone working for free on a free open source project can be bothered to go DELETE documentation? Nothing has ever been removed from the web site unless it was wrong, example grammars are all over the place and there is a wiki that tells you how to do anything the book does and is added to pretty regularly.
Re: (Score:2, Informative)
Re: (Score:2, Informative)
Re:Good stuff... (Score:5, Informative)
Re: (Score:1)
Re: (Score:1)
Personally I thought the book was great. I got the PDF version. The free documentation while it could be better is still okay however I'd recommend just getting the book.
Re: (Score:2)
The fact that you directly benefit from the book is a plus if it means there's more of an incentive for you to work on ANTLR, so I've just been to Amazon to order a copy.
Re: (Score:1)
I couldn't quite get Antlr 2.0 to work for my Domain Specific Language application (a Decision Table based Rules Engine), and that mostly because digging through all the online documentation answered my questions at simpl
Re: (Score:2)
I just thought that you should know that it made development "interesting" (read:very trial-and-error) until someone from the lab bought the book. It might also make it more difficult for beginners to get into using the tool, although I suppose you could make it a required text if you were teaching a class.
P.S.: Love the interpreter - that by itself saves a bunch of time!
A new framework comparable to ANTLR: Gazelle (Score:5, Interesting)
The primary thing I am trying to deliver is reusability of parsers. The open-source community should be able to cooperate to write canonical parsers for all popular languages, but this goal is hampered by the fact that almost all parsing tools (ANTLR included) encourages you to write non-reusable grammars by virtue of the fact that you embed actions into the grammar.
Gazelle also takes a interpreter+JIT approach instead of emitting code in an imperative language. So for example, if you want a really fast HTTP parser from Ruby (which is precisely the raison d'etre for Mongrel), you can use the HTTP Gazelle parser from Ruby, but since the parsing is actually performed by the interpreter+JIT (written in C), you get extremely fast parsing without writing a line of C.
Gazelle is still very immature and not ready for people to try out, but I would encourage anyone who's interested to follow the Gazelle category on my blog [reverberate.org].
You can also check out:
- the current draft of the manual [reverberate.org], which will give you a better idea of the specifics of where I'm going with this.
- a graphical dump of the grammar for JSON [reverberate.org], which the current development code is capable of generating.
Re: (Score:2)
Very interesting project.
Thanks!
I did not see any examples of how, say, 2 different languages can bind to the canonical grammar engine gazelle produces. Is there going to be an API for each supported language to bind actions to code in the host language?
Yes. And rather than using something like SWIG, I want a lot of thought to go into each language's binding, so the interface is really idiomatic for each language.
For a look at my very preliminary version of the C api, check out this program that is built on Gazelle 0.1. Check out the "register_callback" calls in this source file:
recs-collate.c [github.com]
As this gets more sophisticated, I want to use CSS selectors and/or XPath as a model for specifying the predicates for when callbacks are called. For exam
I really like ANTLR. (Score:4, Interesting)
I also like Sorcerer. In PCCTS 1.3, Sorcerer is a kind of tree traversal grammar tool. You create ASTs, with ANTLR, and Sorcerer creates a program which will traverse them, and call action routines where you specify. It's really pretty neat.
I'm thinking about something else though. I'm thinking we should really think about programming with grammars more than we do. Say, for example, you have a user interface of some kind. It gets certain events, its state changes, and it reacts to the environment. A good fraction of the set of state changes can be captured with some kind of finite state machine. But a context free grammar is equivalent to a finite state machine with a pushdown list to hold context. So, it seems very likely to me that a good way to build user interfaces is to somehow compose grammars. The tokens are events, and the action routines are the non-FSM state changes.
So, why is this interesting in this discussion? Well, ANTLR from PCCTS 1.3 is a recursive descent parser, and YACC/Bison are bottom up parsers. This means that the pushdown list for ANTLR is the execution stack, and the pushdown list for YACC/Bison is in data space. It's hard to see how one would maintain multiple ANTLR-style automata concurrently, but that's what you want to do for this style of programming.
Generally YACC/Bison pushdown lists and other parsing data are kept in global variables, but there is a way to make Bison generate C++ useable grammars where the parsing data are saved in a heap allocated object. This means they have a fixed size, which may be a problem. But it would not take a lot of work to change the parsing algorithm for Bison to make the pushdown list a linked list, and that might make things easier.
So, in short, I think it's pretty interesting to look at parsing, even if you're not writing compilers.
Re: (Score:2)
Ditto. I am also a big fan of embedding mini-languages in bigger systems and of ANTLR for all the reasons you state plus a few more [transitionchoices.com].
Three cheers to Terence Parr for this remarkable technology.
I have not taken the step to upgrade to version 3. I hear that the grammar specifications are significantly different. I have a question. Is it worth all the rewriting of grammar and migration of scripts to upgrade? Has anyone here used ANTLRWorks? I am really pleased with version 2.7.5 so it is hard to get moti
Interpretting parser (Score:2)
Re: (Score:1)
Stupid grammar quesition (Score:2)
I always thought of it as Pains-taking.
IE:
I have taken great pains to get this right.
Just curious if i have been using it correctly.
Pains-taking (Score:2)
Cool! But who needs parsers? (Score:2)
But lately I've been wondering: Do we really need parser generators anymore? In Ruby, if you're writing a DSL, you usually implement it in terms of the Ruby language itself. I imagine that's true for other dynamic languages too. Lo
Re: (Score:2)
OK, I did, but it's not done running yet...
Seriously? (Score:2)
Re: (Score:3, Insightful)
Obscure, not obtuse (Score:2)
No, it's not obtuse, it's obscure.
Obtuse is the antonym of acute. In geometry, an obtuse angle is one greater than 90 degrees. Metaphorically, an obtuse person is one who is not sharp.
Obscure, on the other hand, from the Latin word for "dark", means difficult to perceive or understand.
Re: (Score:2)
Comment removed (Score:5, Interesting)
Re: (Score:3, Interesting)
I am posting this simply so that others can see a different view and judge for themselves.
I have used ANTLR for years (not version 3) and have had no trouble getting it to do what I want. I have not tried to get it to interpret COBOL code, however. I have even used ANTLR in .NET and found it to be easy, breezy.
Keep in mind that this is no drag-and-drop technology for light weights. You are really going to have to know your compiler and formal language theory and be willing to study some sample gramma [antlr2.org]
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
COBOL is pretty straightforward. I did something similar with awk. One script of about 150 lines handled most of the essentials. Output files were another awk script with field offsets and lengths for usage in the converted extract, a file with extraction commands for a main
second-most well-known? Really? (Score:1)
I was pretty sure the two best-known compiler compilers were yacc and bison, though I couldn't have told you which one is the most well-known and which one the second-most well-known these days. (I know which is older and which is newer... but that isn't necessarily the same thing.) I've never heard of Terence Parr's Antlr before. (I _have_ heard of PGE, but only because I read Perl-related news occasionally.)
Hmmm... (Score:2)