Despite sleeping 11+ hours each day, I did get plenty of design and discussion done with gaal, in particular a pdd20-inspired refactoring for lexical pads, which I'll write about in another entry. The recent pX spike project of implementing Perl 6 rules on Perl 5 -- and using it to parse Perl 6 programs -- is very much worth journaling about as well, but that'd take another entry too.
However, most of our pair-coding time was spent on fixing the most egregious showstopper for would-be Pugs hackers -- namely, that the "make; make test" cycle simply took too much time.
This issue was brought to #perl6's attention as part of chromatic's poignant rant, which noted that Pugs took 8 hours to finish building and running all its tests. And because he is a devout TDD follower, he'd like to run all tests after making any change to the Pugs internals, which would take (gasp) 16 hours.
Most of his other points in the rant can be resolved directly:
- Test::Builder did fail its tests for a while, but was repaired along with other OO modules as part of release engineering before 6.2.11. Adopting a regular release cycle will fix that.
- Hooking up to Parrot as a runtime will no doubt bring more contributors (and get us faster-than-C performance), but hooking up to Perl 5 will obviously bring even more. Fortunately, we are doing both, plus JavaScript (and maybe CLR too, now that I'm going to YAPC::NA in Chicago and will probably visit LINQ folks en route.)
- Because the Synopses are user-level requirements, Pugs would need its own PDD-like set of documentation that discusses the design of various compiler components. I'd like to resume the Pugs Apocrypha series of documents with nothingmuch et al during the Hackathon.
But the most pressing "Pugs is slow" issue demands a technical solution: the cycle takes 4 hours on my laptop -- 30 minutes to compile and 210 minutes to finish testing, which is simply too much, even if we take into account that we have 616 test files and 11070 test cases.
The current situation was mainly caused by Prelude.pm, a module with built-ins (such as printf) implemented in Perl 6 itself and loaded for each Pugs run. The problem is that compiling Prelude.pm takes 15 seconds here, so it adds another 2.5 hours to the test cycle if it has to be reloaded for each test file.
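As a rough sanity check of that estimate, the arithmetic can be sketched in a few lines of Python. The 616 test files and the 15-second load time are the figures quoted above; the helper name is made up for illustration.

```python
# Back-of-the-envelope estimate of per-file startup overhead.
# The numbers come from the post; overhead_minutes is a hypothetical helper.

TEST_FILES = 616  # number of test files in the suite


def overhead_minutes(seconds_per_file, files=TEST_FILES):
    """Total minutes spent if every test file pays the same startup cost."""
    return seconds_per_file * files / 60


prelude_minutes = overhead_minutes(15)  # Prelude.pm compiled from source each run
print(f"Prelude.pm overhead: {prelude_minutes:.0f} minutes "
      f"(about {prelude_minutes / 60:.1f} hours)")
```

That works out to 154 minutes, which is where the "another 2.5 hours" figure comes from.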
Many moons ago (July 2005), gaal hacked in support for precompiled Prelude, using the ./pugs -CPugs backend to turn Perl 6's parse tree into huge Haskell expressions, and rebuilding the Pugs executable with the Prelude statically linked. This shaved the startup time from 15 seconds to 0.5 seconds.
The tradeoff is that this makes compilation of Pugs.Run awfully slow (20+ minutes for optimised builds) and consumes a lot of RAM (curiously, even more so on unoptimised builds). One can turn precompilation off by tweaking config.yml to say "precompile_prelude: false", but that will make tests unbearably slow to finish.
Gaal set forth to fix this problem once and for all, by using YAML as the cached intermediate format, much like Python's .pyc/.pyo bytecode files. We wrote a rule for DrIFT that can generate fromYAML and asYAML instance methods for all Haskell types, which provides roundtrip serialization to our Syck bindings.
The upshot is that the new ./pugs -CParse-YAML backend can turn Perl 6 into a YAML syntax tree, which can be loaded back during runtime using the Pugs::Internals::eval_p6y($file) primitive. Thanks to Syck's speedy parser, the startup time is now 0.7 seconds without any additional time penalty to compiling Pugs.Run, bringing the total compilation time down to 8 minutes (optimised build; unoptimised takes 4 minutes).
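The general shape of the trick (parse once, dump the syntax tree to a text format, load it back at startup instead of re-parsing) can be illustrated with a toy sketch. This is not the Pugs code: it is Python rather than Haskell, it uses JSON as a stand-in for YAML because Python's standard library has no YAML module, and the Node type and all function names are invented for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class Node:
    """A toy syntax-tree node: a kind, an optional value, and child nodes."""
    kind: str
    value: object = None
    children: tuple = ()


def as_plain(node):
    """Convert a Node tree into plain dicts and lists (roughly the asYAML role)."""
    return {
        "kind": node.kind,
        "value": node.value,
        "children": [as_plain(child) for child in node.children],
    }


def from_plain(data):
    """Rebuild a Node tree from plain dicts and lists (roughly the fromYAML role)."""
    return Node(
        data["kind"],
        data["value"],
        tuple(from_plain(child) for child in data["children"]),
    )


def dump_tree(node):
    """Serialize a tree to text that can be cached on disk."""
    return json.dumps(as_plain(node))


def load_tree(text):
    """Load a cached tree back without re-parsing the original source."""
    return from_plain(json.loads(text))


# Round trip: the reloaded tree should equal the one we started with.
tree = Node("call", "say", (Node("str", "hello"),))
assert load_tree(dump_tree(tree)) == tree
```

The point of deriving both directions mechanically, as the DrIFT rule does, is that the round trip holds for every type without hand-written serializers.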
This goes a long way in solving the compilation time problem; moving the DrIFT instances away from e.g. Pugs.AST.Internals to another module will probably save another minute or two.
Tomorrow we will apply the same technique to Test.pm (as well as other .pm files). Seeing that each test file currently takes 5 seconds to load Test.pm, yamlizing it will likely save another hour off the test cycle. And if we start making use of cached .t.yml.gz files next to each .t program, the entire build-test cycle can probably be reduced to 30 minutes or less. That will be lovely indeed. :-)
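One way such per-file caches could work, as a minimal sketch only: keep a gzipped dump next to the source and reuse it while it is at least as new as the source. Everything here is an assumption for illustration, not how Pugs does it: the code is Python, JSON again stands in for YAML, the ".json.gz" suffix and function names are hypothetical, and the timestamp comparison is just the simplest freshness check.

```python
import gzip
import json
import os


def cache_path(source_path):
    """Hypothetical naming scheme: foo.t -> foo.t.json.gz, next to the source."""
    return source_path + ".json.gz"


def load_compiled(source_path, compile_fn):
    """Return the compiled form of source_path, reusing a fresh cache if present.

    compile_fn takes the source text and returns JSON-serializable data.
    """
    cached = cache_path(source_path)

    # Reuse the cache only if it is at least as new as the source file.
    if (os.path.exists(cached)
            and os.path.getmtime(cached) >= os.path.getmtime(source_path)):
        with gzip.open(cached, "rt", encoding="utf-8") as fh:
            return json.load(fh)

    # Otherwise compile from source and write the cache for next time.
    with open(source_path, encoding="utf-8") as fh:
        compiled = compile_fn(fh.read())
    with gzip.open(cached, "wt", encoding="utf-8") as fh:
        json.dump(compiled, fh)
    return compiled
```

With something like this in place, only the first run after an edit pays the full parsing cost for a given file.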
Are you going to do something cunning like store a hash of the original source file in the YAML version, and then check the hash is valid when loading the pre-compiled version, so that you automatically avoid stale precompiled versions?
Posted by: Nicholas Clark | 2006.02.23 at 06:24 AM