The past three days weren't all about licensing; live design sessions on #perl6 happen every day for hours, resolving the remaining ambiguities that slow down Perl 6 parsing (those that require infinite lookahead), as well as the parts that were easy in Parsec but difficult to declare as Rules.
A concrete example of infinite lookahead:
$hash<1;2;3;long;string;here>; # one statement?
$hash<1;2;3;long;string;here; # 6 statements?
Because of this, Perl 6 now always requires whitespace before < to denote an infix operator; a < with no whitespace before it always means the postcircumfix <> macro, which then desugars to a postcircumfix {} lookup.
Pugs currently uses very expensive backtracking to disambiguate these cases, so getting rid of them should dramatically reduce parsing time.
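To make the whitespace rule concrete, here is a minimal Python sketch. It is not how Pugs implements it: the function name classify_angle is made up for illustration, and it only handles a < that follows an operand. The point is that one character of lookbehind settles the question, with no lookahead at all.

```python
def classify_angle(source: str, pos: int) -> str:
    """Classify the '<' at source[pos] by looking one character behind it.

    Hypothetical helper for illustration only; assumes the '<' follows an
    operand, so a '<' at the very start of the input is rejected.
    """
    if pos <= 0 or pos >= len(source) or source[pos] != "<":
        raise ValueError("pos must point at a '<' that follows an operand")
    # Whitespace before '<' means the infix less-than operator;
    # anything else means the postcircumfix <> subscript.
    if source[pos - 1].isspace():
        return "infix"
    return "postcircumfix"
```

With this rule, the < in $hash<1;2;3> is classified as a subscript and the < in $a < $b as a comparison, without scanning ahead for a closing >.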
While switching to Packrat Parsing (gaal++ for the pointer) would allow infinite lookahead and backtracking in linear time, it doesn't make it any easier for humans to disambiguate between the two cases above -- so we are not doing that. Instead, I'm using Christopher Kuklewicz's excellent Text.Regex.Lazy package, a compiler that turns extended regexes with customized backtracking strategies (read: Perl 6 rules) into Parsec functions.
This should allow for piecemeal migration from Pugs's Parsec parser to a grammar fully defined in Perl 6 rules syntax, and allow maximum sharing with pmichaud and fglock's parsers.
So, while I only checked in about 15 changes to Pugs this week, there were more than 20 commits to the Synopses, from TimToady, pmichaud and yours truly. Among the changes were:
- my @a = 1,2,3; # this now works, as list-context assignment is listop
- my (Int $x, Str $y) = 1, 'foo'; # this works too; "my" now takes full Signature syntax
- Backtracking patterns are now declared under their old name, regex, instead of rule; a rule never backtracks. The variant of rule that translates /a b/ to /ab/ rather than to /a<ws>b/ is called token. Damian++.
- The *$foo and **$foo syntax now always means "inject content of $foo into the current argument list", and has nothing to do with list contexts anymore.
- There is a new form :!foo, the counterpart of :foo: a pair with foo as key and false as value.
- There is no native str type anymore; it's now always buf, so the confusions of Perl 5's buffer->string autopromotion will not appear. See my encoding::warnings for an explanation of the misdesign -- Dan Kogan and I both think that's the #1 Bad Idea of the Perl 5 string model, and I'm glad it's gone from Perl 6 now. (That also means Str is always immutable and Buf is always mutable -- think of the latter as Byte Arrays.)
- if foo { 4 } { 5 } now always means if (foo) { 4 }; { 5 }; instead of if foo({4}) { 5 }, regardless of foo's signature.
- Per Migo (Mikhael Goikhman)'s suggestion during OSDC.il, a null first alternative in rules is now ignored, so we can write this BNF-ish parser:
rule answer {
    | Yes { return 1 }
    | No { return 0 }
    | N/A { return undef }
}
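As a rough illustration of the null-first-alternative change, here is a small Python sketch. It is not the Perl 6 rules engine: split_alternatives is a hypothetical helper, and it naively splits on | without handling nesting, quoting or closures. It only shows that an empty alternative before the first | is dropped, so the leading | is purely cosmetic.

```python
def split_alternatives(body: str) -> list:
    """Split an alternation body on '|', ignoring a null first alternative.

    Hypothetical helper for illustration; assumes '|' never appears inside
    nested groups, quotes or closures.
    """
    alternatives = [alt.strip() for alt in body.split("|")]
    # A leading '|' yields an empty first entry; drop it so that
    # "| Yes | No" and "Yes | No" describe the same alternation.
    if len(alternatives) > 1 and alternatives[0] == "":
        alternatives = alternatives[1:]
    return alternatives
```

Under these assumptions, "| Yes | No | N/A" and "Yes | No | N/A" produce the same three alternatives.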
There are many more interesting changes that make Perl 6 parsing more predictive, and Rules more expressive at capturing common patterns in parsing Perl 6. Ah, the joy of self-hosting... :-)