2007.03.02

MO bridge landing!

After 3 months of delay, it was quite amusing that the initial MO bridge took only 3 hours to land:

pugs> vv('hello, world').HOW
^Str
pugs> vv('hello, world').HOW.HOW
^Class

(The vv() above is a makeshift call that casts 6.2.0-land values into 6.28.0-land objects; it will go away once all existing built-in types are wrapped into MO.)

To understand what the ^Class notation means, let's take a brief look at Perl 6's object model, as implemented by Pugs 6.28.0's MO (Meta Object) subsystem.

In a traditional class-based object system such as Ruby's, the String class would be an instance of the Class class:

irb(main)> String.instance_of?(Class)
=> true
irb(main)> String.instance_of?(String)
=> false

Because of this, you can't call .new on a String instance, and you can't call .length on the String class:

irb(main)> String.new.new
NoMethodError: undefined method `new' for "":String
irb(main)> String.new.length
=> 0
irb(main)> String.length
NoMethodError: undefined method `length' for String:Class

Perl 5, on the other hand, takes a unique approach: when we say IO::String->new, the IO::String is not a Class object -- rather, it is a prototypical IO::String object that has no attributes!

IO::String->isa('Class');      
#=> false (perl5 doesn't have a built-in Class)
IO::String->isa('IO::String');
#=> true
IO::String->new->new;         
#=> IO::String=GLOB(0x18ab160)
IO::String->new->getpos;      
#=> 0
IO::String->getpos;            
Error: Can't use string ("IO::String") as a symbol ref...

The advantage of this arrangement is that several object systems -- prototype-based, closure-based, et cetera -- can exist simultaneously in the same program, without having to inherit from a universal Class class.

However, a bare literal "IO::String" is a terrible way to represent a prototypical object: it makes reflection needlessly difficult, and the error message for accessing the non-existing attribute slot ("Can't use string as a symbol ref...") seems to obey the principle of most surprise.

Perl 6's solution to this is simple: The prototypical string object, spelled ::Str or simply Str (if it's in scope), is a genuine Str instance.  However, any attempt to access its attributes raises a sensible exception.  Just as in Perl 5, so-called class methods such as .new are simply those methods that do not access instance attributes, and you can call them on both ::Str and regular Str instances.

Calling an object's WHAT method returns the prototypical object.  This replaces Perl 5's ugly ref($x) || $x idiom:

pugs> 'hello'.WHAT
::Str
pugs> Str.WHAT
::Str

On the other hand, because Perl 6's builtin objects are backed by a normal class-based dispatch system, you can reliably obtain a list of all Str's supported methods, by querying the Class instance that implements ::Str:

pugs> vv('hello').HOW
^Str
pugs> vv('hello').HOW.methods
["HOW","WHICH","bless","reverse"]

So there we have it: the Perl5ish prototypical object Str.WHAT is also spelled ::Str, and the Rubyish class object Str.HOW is also spelled ^Str.
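
As a rough model of these two views (and not a description of how MO is implemented), the following Haskell sketch treats the prototype as an ordinary value of the same type as its instances, and the class as a separate metaobject.  Every name in it (Str, StrProto, Class, what, how, new, chars) is made up for illustration; only the method list is copied from the session above.

```haskell
-- Toy model only: StrProto plays the role of ::Str, Class the role of ^Str.
data Str
    = StrProto          -- the prototypical object: a Str with no attributes
    | StrInst String    -- a regular instance carrying its attribute
    deriving (Eq, Show)

-- The metaobject that describes how Str is implemented.
data Class = Class
    { className :: String
    , methods   :: [String]
    } deriving (Eq, Show)

-- .WHAT: every Str, prototype or not, answers with the prototype.
what :: Str -> Str
what _ = StrProto

-- .HOW: every Str answers with the class object behind it.
how :: Str -> Class
how _ = Class "Str" ["HOW", "WHICH", "bless", "reverse"]

-- A "class method": it never touches attributes, so any Str can be the invocant.
new :: Str -> String -> Str
new _ = StrInst

-- An "instance method": reading the attribute of the prototype fails
-- with an explicit message rather than a confusing one.
chars :: Str -> Either String Int
chars StrProto    = Left "cannot access attributes of the prototype ::Str"
chars (StrInst s) = Right (length s)

main :: IO ()
main = do
    let hello = new StrProto "hello"
    print (what hello == what StrProto)
    print (methods (how hello))
    print (chars hello)
    print (chars StrProto)
```

The point of the model is that new accepts either kind of invocant, while chars distinguishes them and reports the prototype case explicitly.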

The next step is to expose all MO's meta-objects (Role/Method/Class/Object) into Perl 6 land, and adapt our Perl 5 bridge to use Moose.pm, such that a class Foo {...} declaration in Pugs can generate both Haskell-side and Perl5-side representations, and work seamlessly with libraries on either side.  Stay tuned!

2006.10.22

More SMP parallelism.

After some discussion on haskell@, Sebastian Sylvan suggested a much more straightforward way of hyperizing computations.  By request of Nicholas, here is the hyperization code before parallelization:

mapM evaluate xs

and this is the code after parallelization:

mvs <- forM xs $ \x -> do
    mv <- newEmptyMVar
    forkIO (evaluate x >>= putMVar mv)
    return mv
mapM takeMVar mvs

Or, in Perl 6 (without using hyper-operators themselves):

# Before
@xs.map(&evaluate);

# After
my @mvs = @xs.map: -> $x {
    my $mv = MVar.new;
    async { $mv.put: evaluate($x) };
    $mv;
};
.take for @mvs;

The main point here is that forkIO/async does not actually create an OS thread; instead, it creates a new task for the preemptive GHC runtime kernel, which then assigns it to one of the CPUs currently available, via pre-spawned OS threads.
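
For readers who want to try the pattern outside of Pugs, here is a minimal standalone sketch of the "after" version.  The function name parEvaluate and the sample workload are assumptions made for this example; the body is the same fork-then-collect shape as the snippet above.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM)

-- Fork one lightweight thread per element, then collect results in order.
-- Caveat: if evaluating an element throws, its MVar is never filled and the
-- collecting thread blocks; a real implementation would need to handle that.
parEvaluate :: [a] -> IO [a]
parEvaluate xs = do
    mvs <- forM xs $ \x -> do
        mv <- newEmptyMVar
        _  <- forkIO (evaluate x >>= putMVar mv)
        return mv
    mapM takeMVar mvs

main :: IO ()
main = do
    let inputs = map (sqrt . fromIntegral) [1 .. 50000 :: Int] :: [Double]
    results <- parEvaluate inputs
    print (sum results)
```

To let the runtime spread the tasks over several cores, the program has to be built with the threaded runtime (ghc -threaded) and started with +RTS -N2, or with GHCRTS=-N2 in the environment as in the timings here.  Note that evaluate only forces each element to weak head normal form, which is enough for a Double but not for nested structures.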

With this strategy, numbers on feather now look better, and so does my dual-core MacBook:

$ time env GHCRTS=-N1 ./pugs -e 'my @x = 1..50000; @x.>>sqrt'
real    0m5.204s
user    0m5.093s
sys     0m0.088s

$ time env GHCRTS=-N2 ./pugs -e 'my @x = 1..50000; @x.>>sqrt'
real    0m3.404s
user    0m3.937s
sys     0m0.107s

Note that it's now taking more user-time than real-time, which means SMP is doing its job correctly.

The profiler seems to point to GC performance as the major factor preventing true linear scalability, which GHC 6.8 will address by having a true multithreaded GC implementation.  It'll be fun to try this little experiment again once it arrives. :-)

2006.10.21

SMP parallelization comes to Pugs!

The Intel talk went well.  Met several interesting hackers working on Harmony, STM, program transformation, and other cool projects.

During the follow-up discussion, a few similar languages were brought up: Steele et al's Fortress, the recently open-sourced Strongtalk, and of course Scala. Perl 6 compares rather favorably with them in many areas, although there are always more interesting ideas (case classes, monad comprehensions, etc) that we can learn from.  Pugs's dire need of JSR292 for targeting JVM was also discussed in some detail.

In explaining Perl 6's approach to SIMD parallelism in the form of hyper operators and junctions, one of the Haskell hackers there reminded me of GHC 6.6's shiny new support for SMP parallelism, where you can say "env GHCRTS=-N2 ./pugs ..." and have Pugs allocate two OS threads for those hyper/junction operations, which will (in theory) make them automagically twice as fast.

So I hacked it into Pugs right away; it's just a ten-line change! As usual, it revealed some hidden corners of the language.  (For example, $x++ should be implicitly atomically { $x++ }.)  The preliminary results are quite impressive on feather, the main Linux machine for the Perl 6 development community:

$ time env GHCRTS=-N1  ./pugs -e 'my @x = 1..50000; @x.>>sqrt'
real    0m16.623s
user    0m12.729s
sys     0m0.212s
$ time env GHCRTS=-N2  ./pugs -e 'my @x = 1..50000; @x.>>sqrt'
real    0m13.605s
user    0m12.045s
sys     0m0.320s
$ time env GHCRTS=-N3  ./pugs -e 'my @x = 1..50000; @x.>>sqrt'
real    0m9.438s
user    0m8.293s
sys     0m0.308s

It's not quite linear scalability yet, and the Channel-based hyperiser is not yet optimized, but it's quite encouraging already.  Once the recent flurry of work on running parallel Haskell on GPUs comes to fruition, the numbers may become even more impressive...
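
To put a number on "not quite linear", the wall-clock times above can be turned into speedup (time at -N1 divided by time at -Nn) and parallel efficiency (speedup divided by n).  The small program below does just that; the function names are invented for this example and the figures are the "real" times from the feather runs listed above.

```haskell
-- Wall-clock ("real") seconds from the feather runs, keyed by the -N value.
feather :: [(Int, Double)]
feather = [(1, 16.623), (2, 13.605), (3, 9.438)]

-- The single-threaded run is the baseline.
baseline :: Double
baseline = 16.623

-- How many times faster than the baseline a run is.
speedup :: Double -> Double
speedup t = baseline / t

-- Speedup per OS thread; 1.0 would be perfectly linear scaling.
efficiency :: Int -> Double -> Double
efficiency n t = speedup t / fromIntegral n

main :: IO ()
main = mapM_ report feather
  where
    report (n, t) = putStrLn $
        "-N" ++ show n
            ++ ": speedup "    ++ show (speedup t)
            ++ ", efficiency " ++ show (efficiency n t)
```

This works out to a speedup of roughly 1.22 with two threads and roughly 1.76 with three, so each additional thread currently contributes well under a full core's worth of work.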

Also, thanks to fglock++'s recent work on v6.pm's new emitter, I was able to explain the concept of contexts clearly for the first time:

"In Perl 6, we think of a function as compiled in three different ways, depending on the kind of evaluation strategy you'd like to use: either you use it just for the side effects, or you'd like a lazily evaluated stream, or you'd like an eagerly evaluated value."

While it's also a Perl 5 legacy which had nothing to do with evaluation strategies, it's nice that the concept has evolved into something much more attractive in Perl 6. :-)
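
As a loose analogy only (Haskell has no notion of context, and this says nothing about how v6.pm or Pugs compile code), the three strategies can be seen by consuming one and the same function in three ways.  The function squares and the sample inputs are made up for this illustration.

```haskell
import Data.List (foldl')

-- One definition, consumed under three different evaluation strategies.
squares :: [Integer] -> [Integer]
squares = map (^ (2 :: Int))

main :: IO ()
main = do
    -- "Void" use: run it only for the side effect of printing each element.
    mapM_ print (squares [1, 2, 3])

    -- "Lazy stream" use: the input is infinite, but only the demanded
    -- prefix is ever computed.
    print (take 3 (squares [1 ..]))

    -- "Eager value" use: a strict fold collapses everything to one value.
    print (foldl' (+) 0 (squares [1 .. 100]))
```

In Perl 6 the choice among the three is driven by the caller's context rather than by which consumer function is written out, which is what makes a single source-level function correspond to three compiled forms.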

2006.08.21

Pugs meets Judy

On 2006-08-14 I started the Judy integration inside Pugs' trunk. This meant building both Judy and HsJudy in the default Pugs make command and using it for both the IArray and IHash types (which represent the Perl Array and Hash types) and also for interning (converting strings/identifiers to words that have fast equality comparison). Lots of different build errors happened, but after a couple of days and help from the folks at #perl6, the build process became stable.

The most anticipated feature was good support for sparse arrays, which the Perl guys chose as the default way to deal with arrays (cf. arrays as elements in a contiguous block of memory, easy to navigate and access), and Judy solved this problem.
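
To make "sparse" concrete: the idea is that storage grows with the number of elements actually assigned, not with the largest index.  The sketch below uses Data.IntMap from the containers package purely as a stand-in; it is not Judy and not the HsJudy API, and the names SparseArray and at are invented for this example.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- A sparse array is modelled as a map from index to element.
type SparseArray a = IntMap.IntMap a

-- Read a slot, falling back to a default for indices never assigned.
at :: a -> Int -> SparseArray a -> a
at def i arr = IntMap.findWithDefault def i arr

main :: IO ()
main = do
    -- Two assignments a million slots apart still store only two entries.
    let arr = IntMap.fromList [(0, "first"), (1000000, "far away")]
                :: SparseArray String
    print (IntMap.size arr)
    putStrLn (at "(undef)" 1000000 arr)
    putStrLn (at "(undef)" 42 arr)
```

A contiguous array would have to allocate every slot between 0 and 1000000 to hold the same two elements.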

Here are some numbers comparing r12203 (pre-Judy) with r12215 (after Judy and some corrections to make it work correctly). I noticed that people in the channel usually get better numbers than I do, but anyway, using GHC 6.4.2, I got:

  • ./pugs -C Parse-YAML src/perl6/Prelude.pm: uses the interning code, memory usage drops 15% and is 3% faster here.
  • t/closure_traits/first.t was one of the tests mentioned in the channel. Here the Judy version is 9% faster considering runtime (ignoring parsing time, which is the same for both).
  • t/builtins/lists/minmax.t is a little more than 10% faster with Judy (runtime measure too).

On feather, a P4 using GHC 6.4.1, now comparing r12203 against trunk, I got:

  • ./pugs -C Parse-YAML src/perl6/Prelude.pm: uses the variable identifier interning code, 20% faster in trunk.
  • t/closure_traits/first.t maintains around 10%.

Also, GHC 6.5 in general seems to be giving better performance: on my machine (an amd64 3200+) mandel.pl with Pugs on 6.5 runs in around 2m2s, with Pugs on 6.4.2 around 7m40s. For src/perl6/Prelude.pm it's approx. 4s versus 7s. Since the numbers are so different from feather's, it might mean there's something wrong with my GHC 6.4.2 binary...

Right now, only Judy.StrMap and Judy.IntMap are being used since we got some random segfaults with Judy.Hash. Changing between Judy.StrMap, Judy.Hash, Data.Map (with IORef help) and Data.HashTable now is very easy, since all of them instantiate the MapM type class.
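
The reason switching is cheap can be shown with a small sketch.  The class below is hypothetical and much smaller than whatever HsJudy's MapM really provides (its name, methods and signatures are assumptions for this example); the two backends are a Data.Map held in an IORef and a plain association list.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical interface for mutable maps; the real MapM class may differ.
class MutableMap m where
    newMap  :: IO (m k v)
    insertM :: Ord k => k -> v -> m k v -> IO ()
    lookupM :: Ord k => k -> m k v -> IO (Maybe v)

-- Backend 1: an immutable Data.Map kept in a mutable reference.
newtype RefMap k v = RefMap (IORef (Map.Map k v))

instance MutableMap RefMap where
    newMap                   = fmap RefMap (newIORef Map.empty)
    insertM k v (RefMap ref) = modifyIORef' ref (Map.insert k v)
    lookupM k (RefMap ref)   = fmap (Map.lookup k) (readIORef ref)

-- Backend 2: a naive association list, just to have something to swap in.
newtype AssocMap k v = AssocMap (IORef [(k, v)])

instance MutableMap AssocMap where
    newMap = fmap AssocMap (newIORef [])
    insertM k v (AssocMap ref) =
        modifyIORef' ref (\kvs -> (k, v) : filter ((/= k) . fst) kvs)
    lookupM k (AssocMap ref) = fmap (lookup k) (readIORef ref)

-- Client code is written once, against the class only.
countWords :: MutableMap m => m String Int -> [String] -> IO ()
countWords m = mapM_ bump
  where
    bump w = do
        n <- lookupM w m
        insertM w (maybe 1 (+ 1) n) m

main :: IO ()
main = do
    let ws = words "a b a c a b"
    m1 <- newMap :: IO (RefMap String Int)
    countWords m1 ws
    lookupM "a" m1 >>= print
    m2 <- newMap :: IO (AssocMap String Int)
    countWords m2 ws
    lookupM "a" m2 >>= print
```

Only the type annotation on newMap changes between the two runs; countWords is untouched, which is the property that makes it easy to fall back from one implementation to another when one of them misbehaves.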

HsJudy is my Summer of Code project. SoC ends today (thank you Google, Haskell and Perl folks) but the work with Pugs goes on, see you in #perl6... :)

2006.08.12

Val migration

The primary difficulty in moving over to the new internals is that it's hard to make a huge all-encompassing change in a project. We're not just changing the internal representation of strings from one form to another. Encapsulation and type inference would make that task certainly manageable. The present change affects virtually all Perl types, obsoletes the widely-used VRef, and changes the way AST expressions are strung together.

Another problem is that the new AST is much more detailed than the old one, and putting all its definitions in one file makes it hard to maintain/understand all at once. Also, although this AST definition should be shared by all Perl 6 implementations, the native value representations are backend-specific.

Armed with techniques from this paper, we can address both issues at the same time. Instead of a closed datatype for Val (starting from values instead of expressions makes migration easier), we make use of typeclasses where appropriate. A new (:>:) class lets us stipulate subset relationships cleanly, so a single cast method converts (old) Strings to (new) ByteStrings, but also a value to its Id when in Id context.
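
A minimal sketch of what such a class could look like follows.  The class shape (a two-parameter class whose single method casts from the right-hand type to the left-hand one) is an assumption made for illustration, as are the Obj and Id types; the definitions in the Pugs tree may well differ.

```haskell
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeOperators         #-}

import qualified Data.ByteString.Char8 as B

-- Assumed reading: "a :>: b" says every b can be viewed as an a.
class a :>: b where
    cast :: b -> a

-- Old-style Strings can always be turned into new-style ByteStrings.
instance B.ByteString :>: String where
    cast = B.pack

-- Hypothetical value and identity types, for the "Id context" case.
newtype Id = MkId Int deriving (Eq, Show)

data Obj = MkObj { objId :: Int, objName :: String }

-- The same cast method yields an object's Id when an Id is expected.
instance Id :>: Obj where
    cast = MkId . objId

main :: IO ()
main = do
    -- The expected result type selects the instance.
    B.putStrLn (cast "Moose")
    print (cast (MkObj 7 "seven") :: Id)
```

Because instances are chosen by the pair of source and target types, new representations can be added one instance at a time without touching a central, closed Val definition.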

The work is going along nicely. We've added a VV node type in the old AST, and already we can construct newstyle values from pugs by casting them with the interim vv prim. This helps to keep us honest, as our code comes under constant scrutiny of the type checker and actually gets to be executed.

    pugs> vv "Moose"
    VPure (MkStr "Moose")      # look, not VStr!
    pugs> vv 42
    VPure (IFinite 42)
    pugs> vv 3.1415
    VPure (NRational (6283%2000))
    pugs> vv "22"/7
    VPure (NDouble 3.142857142857143)

2006.07.24

Hs -> P6

When I was giving a talk at YAPC, someone asked me if a Haskell data type is like a blessed object in Perl. I said that they were profoundly different. Well, that was probably too offputting: a Haskell datatype can be represented as a Perl 6 Role, and its variants as different Classes.

To facilitate the Perl 6-on-Perl 6 efforts, Audrey asked me to write a general automatic DrIFT-based Hs->P6 converter. The first use is to put the new AST scheme in a language where data can be more readily manipulated. For example, here is a function parameter in both Haskell and Perl 6:

-- Haskell
data Param = MkParam
    { p_variable    :: Ident
    , p_types       :: [Class]
    , p_constraints :: [Code]
    , p_unpacking   :: Maybe Sig
    , p_default     :: Maybe Exp
    , p_label       :: Ident
    , p_slots       :: Table
    , p_hasAccess   :: ParamAccess
    , p_isRef       :: Bool
    , p_isLazy      :: Bool
    }

# Perl 6
role Param is TaggedUnion;

class MkParam does Param {
    has Ident $.p_variable;
    has Class @.p_types;
    has Code @.p_constraints;
    has Sig $.p_unpacking;
    has Exp $.p_default;
    has Ident $.p_label;
    has Table $.p_slots;
    has ParamAccess $.p_hasAccess;
    has Bool $.p_isRef;
    has Bool $.p_isLazy;
};

Here is the latest AST translation (and where that came from). This is a step towards assuring that Pugs' Perl 6 AST can be shared by other implementations, and (for example) macros that manipulate parsed bits of Perl can work on more than one compiler.

So far this was much easier to do than our YAML class (that bridges why++'s libsyck with Haskell for any data). But the next step is figuring out how to dump actual arbitrary Perl 6 values...

2006.06.21

STM: retry and retry_with work!

The retry and retry_with (the latter is known as orElse in Haskell) support has just landed; here is an example script that shows how to call them.
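
For comparison, this is how the same two primitives look on the Haskell side, using Control.Concurrent.STM from the stm package (assumed to be installed; it normally ships with GHC).  The takeItem helper and the two sample queues are invented for this example and are unrelated to the Pugs script mentioned above.

```haskell
import Control.Concurrent.STM
    ( STM, TVar, atomically, newTVarIO, orElse, readTVar, retry, writeTVar )

-- Take the first item from a list held in a TVar.
-- On an empty list, retry abandons the transaction; on its own it would
-- block until the TVar changes.
takeItem :: TVar [a] -> STM a
takeItem tv = do
    xs <- readTVar tv
    case xs of
        []       -> retry
        (y : ys) -> writeTVar tv ys >> return y

main :: IO ()
main = do
    a <- newTVarIO []
    b <- newTVarIO ["from b"]
    -- orElse (retry_with in Pugs): if the left alternative retries,
    -- run the right one instead, all inside a single transaction.
    r <- atomically (takeItem a `orElse` takeItem b)
    putStrLn r
```

Here the first queue is empty, so its branch retries and the item is taken from the second queue; if both were empty, the whole transaction would wait until one of the two TVars is updated by another thread.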

Much thanks to Liz for coming up with the name and usage example as part of the Concurrency Spec draft. Oh, and once Charles's SoC project works, we'd be able to port that to Parrot, too. Yay! :-)

2006.05.30

GHC-MacIntel port.

MacBooks are not available in .tw yet, so $boss from $job brought me one from .hk. It easily paid for itself in time saved for the first day -- from-scratch Pugs build now takes 10 minutes; "make smoke" 20 minutes; and "make ghci" less than 20 seconds.

Because GHC 6.4.x lacks support for MacIntel, I've built on Wolfgang's previous work and made a binary distribution of GHC 6.5-HEAD. Thanks to testing from obra++, this build should Just Work, without the need to manually fiddle with readline/GMP/cabal/etc.

Incremental development of Haskell code almost feels the same as Perl code now. Wow. :-)

2006.05.16

Linspire's OS team standardizes on Haskell.

Interesting announcement on the debian-haskell list:

The OS team at Linspire, Inc. would like to announce that we are standardizing on Haskell as our preferred language for core OS development.

We are redoing a bunch of our infrastructure using Haskell as our common  standard language. Our first task is redoing our Debian package builder (aka autobuilder) in Haskell.  Other tools such as ISO builders, package dependency checkers are in progress. The goal is to make a really tight simple set of tools that will let developers contribute to Freespire, based on Debian tools whenever possible. Our hardware detector, currently in OCaml, is on the block to be rewritten as well.

I've been planning to get a black MacBook this week and put Ubuntu on it; it'd be interesting to see if I can get Freespire tools running on that some day...

2006.05.02

HsSyck Cabalized.

One-sentence summary: Pugs's Syck binding, recently ported to use Data.ByteString, is now available for all.

Like the venerable MakeMaker, the Haskell Cabal build system greatly simplifies the task of writing a distributable Haskell library.  However, just like early versions of MakeMaker, it is a pain to write a portable Pugs.cabal across 1.0 (shipped with GHC 6.4 and 6.4.1), 1.1.3 (shipped with some distros), and 1.1.4 (shipped with GHC 6.4.2).

One particularly annoying misfeature of Cabal 1.0 is that it does not support multiple locations for source files; this was responsible for the accumulated third-party code under the src/ tree. However, that also created numerous build problems, such as the dreadful "Syck_stub.o not found".

In Israel, we worked around these issues by forcing a rebuild of our Syck binding (Data.Yaml.Syck) on every "make", and then massaging the resulting object file back into libHSPugs.a by manually invoking ar.  It's all in all very fragile.

Motivated by the licensing-cleanup effort of moving all non-Pugs modules from src/ to third-party/, and dakkar++'s report of a new crop of build failures with Cabal 1.1.3, I refactored out Syck into its own subdirectory, built automatically as part of "perl Makefile.PL". (It may move to "make" time in the future.  Not sure...)

This not only sped up build time (by 20 seconds here) and removed a primary fragile spot from our build system, but also lets other lambdafolks write YAML bindings with much more ease. Yay! :-)