pmichaud.com - Perl6

We've now reached the end of this year's advent series. What will be the gift in our last box? The door opens to reveal...the Perl 6 grammar.

At first it might seem odd to cite a language's grammar as a significant component of a language. Obviously a language's syntax matters a lot to people writing programs in that language, but once a syntax has been chosen, we just use grammars to describe the syntax and build parsers, right?

Not in Perl 6. In Perl 6, language syntax is a dynamic thing -- something that can be modified to accommodate new keywords and syntax that weren't anticipated by the original design. Or, perhaps more accurately, Perl 6 explicitly anticipates and supports the ability for modules and applications to change the language's syntax for their specific needs. Defining custom operators is just one example of a place where we change the language syntax itself, but Perl 6 also allows the dynamic addition of macros, new statement types, new sigils, and the like.

Thus a Perl 6 grammar and parser must not only parse the standard Perl 6 syntax, but also be modifiable at runtime so that it can parse custom syntaxes as well. We also need to be able to encapsulate any language modifications, so that defining a new operator in one module doesn't inadvertently change how another module is interpreted.
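As a small illustration of that encapsulation, here is a sketch that assumes a current Rakudo. The `±` operator and the variable names are made up for the example; the point is only that a user-defined operator is an ordinary lexically scoped declaration:

```raku
my @range;
{
    # Declaring this sub extends the syntax that the parser accepts,
    # but only for the rest of the enclosing block.
    sub infix:<±>($centre, $spread) { ($centre - $spread, $centre + $spread) }

    @range = 5 ± 2;
}
say @range;    # [3 7]

# Past the closing brace the parser is back to the unmodified language,
# so writing `5 ± 2` here would be rejected at compile time.
```

The new operator is visible to the parser only inside the block that declares it, so the modification stays confined to the scope that asked for it.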

This is what is achieved by the Perl 6 standard grammar, and much of the effort that has gone into the Perl 6 specification for regexes and grammars (Synopsis 5) has been just to make these sorts of things possible. I personally believe this is one of the key features that will enable Perl 6 to remain a viable language far into the future. (On the other hand, when I first read the designs for Perl 6 in detail, I had serious doubts as to whether this could in fact be achieved. It's nice to see that we've overcome that particular hurdle.)

The expectation is that parsers for Perl 6 will themselves be written in Perl 6, and there are several examples already available. The "standard" or "reference" grammar and parser is STD.pm; Larry has been using this to refine the Perl 6 language specification and explore the impacts of various language constructs on the writing of Perl 6 programs.

Some parts of STD.pm are still evolving in response to implementation concerns; thus Rakudo Perl maintains its own version of the language grammar that works for its environment. Many of the ideas first explored by Rakudo find their way back into the standard grammar. This is by design -- our expectation is that the various grammar implementations will continue to converge over the course of the next year.

The key feature that jumps out from looking at the Perl 6 grammar is the use of protoregexes. A protoregex allows multiple regexes to be combined into a single "category". In a more traditional grammar, we might write something like:

    rule statement {
        | <if_statement>
        | <while_statement>
        | <for_statement>
        | <expr>
    }
    rule if_statement    { 'if' <expr> <statement> }
    rule while_statement { 'while' <expr> <statement> }
    rule for_statement   { 'for' '(' <expr> ';' <expr> ';' <expr> ')' <statement> }

With a protoregex, we'd write it as follows:

    proto token statement { <...> }
    rule statement:sym<if>    { 'if' <expr> <statement> }
    rule statement:sym<while> { 'while' <expr> <statement> }
    rule statement:sym<for>
        { 'for' '(' <expr> ';' <expr> ';' <expr> ')' <statement> }
    rule statement:sym<expr>  { <expr> }

We're still saying that a <statement> matches any of the listed statement constructs, but the protoregex version is much easier to extend. In the non-protoregex version above, adding a new statement construct (such as "repeat..until") would require rewriting the "rule statement" declaration in its entirety to include the new statement construct. But with a protoregex, we can simply declare an additional rule:

    rule statement:sym<repeat> { 'repeat' <statement> 'until' <expr> }

This newly declared rule is automatically added as one of the candidates to the <statement> protoregex. All of this works for derived languages as well:

    grammar MyNewGrammar is BaseGrammar {
        rule statement:sym<repeat> { 'repeat' <statement> 'until' <expr> }
    }

Thus MyNewGrammar parses everything the same as BaseGrammar, with the additional definition of the repeat..until statement construct.
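For readers who want to try this out, here is a self-contained sketch of the same idea. It is a toy, not an excerpt from STD.pm or Rakudo: the `TOP` rule, the one-word `expr` token, the trailing semicolons, and the sample inputs are all invented for illustration. It also assumes a current Rakudo, where the body of a proto is spelled `{*}` and `<sym>` matches the text given in the `:sym<...>` part of the rule's name.

```raku
# A toy base language: `if` statements and bare expression statements.
grammar BaseGrammar {
    rule TOP { ^ <statement> $ }

    # The proto declares the <statement> category; each candidate
    # below is registered under it by its :sym<...> name.
    proto rule statement {*}
    rule statement:sym<if>   { <sym> <expr> <statement> }
    rule statement:sym<expr> { <expr> ';' }

    token expr { \w+ }
}

# The derived language adds one more candidate to the inherited
# category without touching any of the existing rules.
grammar MyNewGrammar is BaseGrammar {
    rule statement:sym<repeat> { <sym> <statement> 'until' <expr> ';' }
}

say so BaseGrammar.parse('if ready go;');               # True
say so BaseGrammar.parse('repeat go; until done;');     # False
say so MyNewGrammar.parse('repeat go; until done;');    # True
say so MyNewGrammar.parse('if ready go;');              # True
```

The base grammar rejects the repeat..until input because none of its candidates accepts it, while the derived grammar accepts both inputs.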

The ability to dynamically replace the existing grammar with a new one that has different parse semantics is at the heart of Perl 6's operator overloading, macro handling, and other syntax-modifying features. Compared with source filters, this provides a much more nuanced way of declaring new constructs in a language.

Another significant component of the standard grammar is its devotion to providing useful error diagnostics when an error is encountered. Instead of simply saying "an error occurred here", it offers suggestions about what might have been intended instead, and places where it thinks the programmer may have been confused. It also does significant work to catch constructs that have changed between Perl 5 and Perl 6, to assist people with migration.

In late October of this year, Rakudo started a significant refactor in a new branch (called "ng") that makes use of protoregexes and the many other features of the STD.pm grammar. We still have a short way to go before this new branch can become the official released version of Rakudo, but we expect that to happen in the January 2010 release. Already this conversion has enabled us to resolve many of the long-standing problems in Rakudo, including dynamic generation of metaoperators, lazy list handling, lexical context handling, and the like.

With Rakudo's conversion to following STD.pm for its grammar, we're very much on track for the Rakudo Star release in April 2010. While we know (and plan) that Rakudo Star won't be a complete implementation of Perl 6, it will be sufficiently advanced and usable for a wide variety of applications. We've been quickly resolving the critical items listed in the Rakudo Star ROADMAP, and over the next couple of months will be focusing on improved error reporting (like STD.pm) and distribution / packaging issues.

...and this concludes the Perl 6 Advent series for December 2009. We hope that you've enjoyed reading the articles at least as much as we've enjoyed writing them, and we appreciate the many comments that people have made about the posts. We also hope to have conveyed our sense that many useful parts of Perl 6 are available now for experimentation, and that we're well on the way to making them available in 2010 for a wider variety of applications. Indeed, we have high hopes and expectations for the entire Perl family in 2010 -- it promises to be an exciting time for us all.

Happy holidays, and best wishes for the new year.