Continuing yesterday's Parsec work, aided by gaal++'s work on adding previous-character, capture-name and capture-position fields to the parser state, today I dropped most of the backtrack-inducing try in Parser.hs:
<TimToady> "do, or do not. there is no try..."
This alone resulted in a modest speed up:
<gaal> so, yeah, I see about a 10% speedup
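Why dropping try helps: in Parsec, try forces the parser to remember everything it consumed so it can rewind on failure, while factored grammars commit as they go. A minimal sketch (not the Pugs source; the "for"/"foreach" rules are illustrative):

```haskell
import Text.Parsec (parse, string, try, option, (<|>))
import Text.Parsec.String (Parser)

-- With try: if "foreach" fails midway, the parser rewinds all the
-- input it consumed before attempting "for" -- backtracking.
withTry :: Parser String
withTry = try (string "foreach") <|> string "for"

-- Factored: commit to "for" once, then optionally take "each".
-- No input is ever re-scanned.
factored :: Parser String
factored = do
  pre  <- string "for"
  rest <- option "" (string "each")
  return (pre ++ rest)
```

Both accept the same inputs, but the factored version never holds on to consumed input, which is where the speedup comes from.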
Because we are taking another Rule-bootstrapping route now (cf. scw++'s README for Pugs::Grammar::MiniPerl6), I disabled the rx_ macro for parsing rule adverbs in Perl 6, resulting in another 20% speedup:
<xinming> pugs++, It works faster 30% times than before... :-)
After that, the profile showed that the Parser no longer dominated the run time; the cost center had shifted to Data.Yaml.Syck, which loads the precompiled Prelude.pm.yml file. The Syck binding is really thin, so we didn't think there was much room for improvement.
However, the re-licensing plan means we'd need to move all dependencies in src/ to third-party/, so I checked the upstream sources for new versions. Much to my delight, the Data.FastPackedString (FPS) code used by the Syck binding has been massively optimized by lambdafolks and renamed to Data.ByteString. Simply by upgrading to the new library, the startup time was cut in half:
<xinming> time ./pugs t/01-sanity/02-counter.t <--- result of this....
<xinming> real 0m1.118s
<xinming> user 0m1.068s
<xinming> sys 0m0.012s
<audreyt> you have a pretty good computer :)
<audreyt> ok, r10110 landed
<audreyt> try to see if it gets faster?
<xinming> real 0m0.494s
<xinming> user 0m0.456s
<xinming> sys 0m0.020s
<audreyt> so, really 2x faster :)
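The win from Data.ByteString comes from packed, strict byte arrays: length and slicing are O(1), versus O(n) on the linked-list String type. A tiny API sketch (independent of the Pugs/Syck sources; the sample string is made up):

```haskell
import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
  let s = B.pack "precompiled Prelude"
  -- length and take are O(1) on a strict ByteString,
  -- where String would walk the list
  print (B.length s)
  B.putStrLn (B.take 11 s)
```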
Then I moved to the next hotspot in the profile, the ops function that sorts tokens by length to support longest-token matching. A Schwartzian transform yields another 6% overall speedup:
ops f = map (f . tail) . sort . map (\x -> (chr (0x10FFFF - length x):x))
That's the same as this (under-golfed) Perl 5:
sub ops {
    my $f = shift;
    map { $f->(substr($_, 1)) } sort map { chr(0x10FFFF - length) . $_ } @_;
}
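For the curious, the decorate-sort-undecorate trick can also be spelled directly with Data.List's sortOn; this equivalent is mine, not the Pugs source, and its tie-breaking differs slightly (equal-length tokens keep input order rather than sorting lexicographically):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- longest-token-first ordering, same idea as the chr-key trick:
-- decorate each token with (Down . length) so longer tokens sort first
opsOn :: (String -> a) -> [String] -> [a]
opsOn f = map f . sortOn (Down . length)
```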
Finally, I lazily memoized the Manhattan distance between all built-in types, so the 16% of run time previously spent on multiple dispatch (MMD) is now entirely gone. Compared to yesterday's Pugs:
$ time ./pugs.old -Iblib6/lib t/builtins/sprintf_and_as.t
real 0m11.444s
user 0m9.831s
sys 0m0.028s

$ time ./pugs -Iblib6/lib t/builtins/sprintf_and_as.t
real 0m4.394s
user 0m3.811s
sys 0m0.028s
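The lazy memoization can be sketched as a top-level table of thunks. This is not the Pugs source: the type names and the toy linear-chain metric below are stand-ins for the real type lattice, but the trick is the same. A CAF Map's cells are thunks that Haskell forces at most once, on first lookup, so every distance is computed at most one time:

```haskell
import qualified Data.Map as M
import Data.List (elemIndex)
import Data.Maybe (fromJust)

-- hypothetical built-in types arranged in a linear chain
types :: [String]
types = ["Bool", "Int", "Num", "Str"]

-- toy stand-in for the real Manhattan distance over the type lattice
distance :: String -> String -> Int
distance a b = abs (idx a - idx b)
  where idx t = fromJust (elemIndex t types)

-- lazy memo table: built once at the top level; each cell is a thunk,
-- evaluated at most once when first looked up
memoTable :: M.Map (String, String) Int
memoTable = M.fromList [((a, b), distance a b) | a <- types, b <- types]

dist :: String -> String -> Int
dist a b = M.findWithDefault maxBound (a, b) memoTable
```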
Not bad at all for one day's work. Maybe multi-hour smoke loops will become a thing of the past. Or maybe we just need more tests. :-)