On 2006-08-14 I started the Judy integration inside Pugs' trunk. This meant building both Judy and HsJudy in the default Pugs make command and using it for both the IArray and IHash types (which represent the Perl Array and Hash types) and also for interning (converting strings/identifiers to words that have fast equality comparison). Lots of different build errors happened, but after a couple of days and help from the folks at #perl6, the build process became stable.
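To make the interning idea concrete, here is a minimal sketch. It is not the Pugs code: `InternTable`, `emptyTable` and `intern` are names made up for illustration, and a plain `Data.Map` stands in for the Judy-backed table. Each distinct string gets a small `Int`, so later equality checks compare two machine words instead of walking two strings.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical intern table (illustration only, not the Pugs type):
-- the next free id plus the strings seen so far.
data InternTable = InternTable
  { nextId :: !Int
  , known  :: !(Map.Map String Int)
  }

emptyTable :: InternTable
emptyTable = InternTable 0 Map.empty

-- Return the id for a string, allocating a fresh one on first sight.
-- The same string always maps to the same id.
intern :: String -> InternTable -> (Int, InternTable)
intern s table =
  case Map.lookup s (known table) of
    Just i  -> (i, table)
    Nothing ->
      let i = nextId table
      in (i, InternTable (i + 1) (Map.insert s i (known table)))
```

Interning the same identifier twice returns the same id, and two different identifiers never share one, so comparing ids is enough to compare names.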
The most anticipated feature was good support for sparse arrays, which the Perl guys chose as the default way to deal with arrays (as opposed to arrays stored as elements in a contiguous block of memory, which are easy to navigate and access), and Judy solved this problem.
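For readers who have not met sparse arrays, a rough sketch of the idea follows. It assumes `Data.IntMap` from the containers package as a stand-in for Judy, and `SparseArray`, `store`, `fetch`, `storedCount` and `arrayLength` are hypothetical names, not the IArray interface. Only the indices that were actually assigned take up space, while the array can still report a Perl-style length.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Hypothetical sparse array: only assigned indices are stored.
newtype SparseArray a = SparseArray (IntMap.IntMap a)

emptyArray :: SparseArray a
emptyArray = SparseArray IntMap.empty

-- Assign a value at an index, however far from the other indices.
store :: Int -> a -> SparseArray a -> SparseArray a
store i x (SparseArray m) = SparseArray (IntMap.insert i x m)

-- Read an index; Nothing means the slot was never assigned.
fetch :: Int -> SparseArray a -> Maybe a
fetch i (SparseArray m) = IntMap.lookup i m

-- How many elements actually occupy memory.
storedCount :: SparseArray a -> Int
storedCount (SparseArray m) = IntMap.size m

-- Perl-style length: highest assigned index plus one (0 when empty).
arrayLength :: SparseArray a -> Int
arrayLength (SparseArray m)
  | IntMap.null m = 0
  | otherwise     = fst (IntMap.findMax m) + 1
```

Storing values at index 3 and index 1000000 keeps two elements in memory, where a contiguous array would have to allocate a million and one slots.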
Here are some numbers comparing r12203 (pre-Judy) with r12215 (post-Judy, plus some corrections to make it work correctly). I noticed that people in the channel usually get better numbers than I do, but anyway, using GHC 6.4.2, I got:
- ./pugs -C Parse-YAML src/perl6/Prelude.pm: this exercises the interning code; memory usage drops 15% and it runs 3% faster here.
- t/closure_traits/first.t was one of the tests mentioned in the channel. Here the Judy version is 9% faster at runtime (ignoring parsing time, which is the same for both).
- t/builtins/lists/minmax.t is a little more than 10% faster with Judy (again measuring runtime only).
On feather, a P4 using GHC 6.4.1, now comparing r12203 against trunk, I got:
- ./pugs -C Parse-YAML src/perl6/Prelude.pm: this exercises the variable identifier interning code and is 20% faster in trunk.
- t/closure_traits/first.t stays around 10% faster.
Also, GHC 6.5 in general seems to be giving better performance: on my machine (an amd64 3200+) mandel.pl with Pugs on 6.5 runs in around 2m2s, with Pugs on 6.4.2 around 7m40s. For src/perl6/Prelude.pm it's approximately 4s versus 7s. Since these numbers are so different from the ones on feather, it might mean there's something wrong with my GHC 6.4.2 binary...
Right now, only Judy.StrMap and Judy.IntMap are being used, since we got some random segfaults with Judy.Hash. Switching between Judy.StrMap, Judy.Hash, Data.Map (with IORef help) and Data.HashTable is now very easy, since all of them are instances of the MapM type class.
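As a rough illustration of why a shared type class makes the backends interchangeable, here is a minimal sketch. It is not the real MapM declaration from HsJudy: the class `MutableMap`, its methods, and the `RefMap` wrapper are hypothetical names, and only the "Data.Map with IORef help" backend is shown. Code written against the class never mentions the concrete map type, so switching backends only changes one type annotation.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map

-- Hypothetical interface (NOT the actual HsJudy MapM class):
-- a mutable map type m determines its key type k and value type v.
class MutableMap m k v | m -> k, m -> v where
  newMap  :: IO m
  insertM :: k -> v -> m -> IO ()
  lookupM :: k -> m -> IO (Maybe v)
  deleteM :: k -> m -> IO ()

-- One possible backend: an immutable Data.Map held in an IORef.
newtype RefMap k v = RefMap (IORef (Map.Map k v))

instance Ord k => MutableMap (RefMap k v) k v where
  newMap                 = RefMap <$> newIORef Map.empty
  insertM k v (RefMap r) = modifyIORef' r (Map.insert k v)
  lookupM k (RefMap r)   = Map.lookup k <$> readIORef r
  deleteM k (RefMap r)   = modifyIORef' r (Map.delete k)

-- Client code: only the annotation on newMap picks the backend.
demo :: IO (Maybe Int, Maybe Int)
demo = do
  m <- newMap :: IO (RefMap String Int)
  insertM "answer" 42 m
  insertM "other" 1 m
  deleteM "other" m
  a <- lookupM "answer" m
  b <- lookupM "other" m
  return (a, b)
```

A Judy-backed type would get its own instance of the same class, and `demo` would keep working after changing only the `RefMap String Int` annotation.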
HsJudy is my Summer of Code project. SoC ends today (thank you Google, Haskell and Perl folks) but the work with Pugs goes on, see you in #perl6... :)
Congratulations. Great work, and I'm gonna give it a try later, see if it helps me out.
Posted by: Luis Felipe | 2006.08.22 at 11:39 AM
Excellent stuff. I've needed faster implementations of the basic structures for a while now, but I would never have time to do it myself. The benchmarks are beautiful over here, up to 35% speed boosts. Haven't checked memory yet, as it isn't a priority. Thanks a bunch!
Posted by: Pietro KC | 2006.08.22 at 11:45 AM