This is intended to review some notes and ideas for cleaning up
lexical handling in Parrot; the current lexicals implementation 
continues to cause problems for PCT and Rakudo (RT #56512, #56398, 
#58854, #58392, and possibly others).

The discussion below is based on a reading of PDD20, Synopsis 4 
(especially the section "When is a closure not a closure"), the 
very long discussion on lexicals in RT #56398, and personal
experiences in dealing with Parrot's existing implementation.  

This document isn't intended to analyze the ways in which the
current implementation is broken, but rather to propose a new,
much simpler design based on my interpretation of the discussions
and information in the above sources.

There are three key places where this design differs from
the existing implementation:

1. The design described here does not make use of (nor want) a 
   separate Closure PMC type -- the ability to handle lexicals 
   is built directly into the Sub PMC base class (and thus is
   directly available to all derived classes of Sub).
2. The functionality of the existing 'newclosure' opcode is
   split into separate "capture lexical bindings" and "clone"
   operations.  See below for how "newclosure" can be preserved
   if needed.
3. The "autoclose" semantics in the current implementation
   are greatly simplified, and no longer try to make use of
   the caller context chain.

With the above in mind, here are my thoughts on a lexicals
implementation.  I started out with a rewrite of PDD20, but I've
since decided that focusing on the core details is best for
discussion, and after those are nailed down we can rework it
into a PDD.

-----

First, a brief review of some key lexical features and terms.  

In PIR, the ".lex" directive is used to declare a lexical variable 
within a Parrot Sub.  A flag of ":outer('foo')" indicates that
the Sub with a :lexid of 'foo' is the lexically enclosing outer
scope of the Sub being defined.  Together, these PIR constructs
specify the static, compile-time lexical features of Parrot subs.

(For purposes of this draft I'm ignoring the :lex flag on Parrot subs;
it shouldn't materially affect any of what follows.)

Internally, these static features are held as attributes of a
Sub PMC.  The C<lex_info> attribute references a LexInfo PMC
that contains the lexical variables defined within the Sub (via ".lex").
The C<outer_sub> attribute points to the Sub identified by the
":outer" flag -- i.e., the current Sub's outer scope.

Again, these are the static features of Subs using lexicals -- once
compiled, they seldom change.

Runtime part 1:  LexPads and lexicals defined within a Sub

When a Sub is invoked, a new Parrot_context is allocated to act as
a "call frame" for the new invocation.  If the Sub has a LexInfo PMC
in its C<lex_info> attribute, then a new LexPad PMC is created from
the LexInfo PMC and stored in the context's C<lex_pad> attribute.
Lexical fetch and store operations are initiated by looking at
the context's C<lex_pad> attribute.
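
To make the per-invocation LexPad concrete, here is a toy model in
Python.  It is only a sketch: the class and attribute names mirror the
roles described above, the LexInfo is reduced to a list of names, and
none of it is Parrot's actual C code.

```python
# Toy model only: mirrors the roles described in the text,
# not Parrot's real data structures.
class Context:
    """A call frame; lex_pad holds the lexicals of one invocation."""
    def __init__(self, sub):
        self.current_sub = sub
        # A fresh LexPad per invocation, built from the static LexInfo.
        # None models a Sub that declares no lexicals.
        if sub.lex_info is None:
            self.lex_pad = None
        else:
            self.lex_pad = dict.fromkeys(sub.lex_info)

class Sub:
    def __init__(self, name, lex_info=None):
        self.name = name
        self.lex_info = lex_info   # static: names declared via .lex

    def invoke(self):
        return Context(self)

foo = Sub('foo', lex_info=['$x'])
frame1 = foo.invoke()
frame2 = foo.invoke()
frame1.lex_pad['$x'] = 1           # only the first frame is changed
plain = Sub('plain').invoke()      # no LexInfo, so no LexPad
```

Each invocation gets its own pad, so storing into one frame leaves
other frames of the same Sub untouched.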

Runtime part 2:  Outer lexicals and contexts

Handling outer lexicals requires a bit more work, because we want
to make sure that when we invoke or take a reference to a Sub
PMC that it properly captures its outer lexical environment.
What follows is based on Synopsis 4 and the analysis in RT #56398.
In this description we will use "outer sub" to mean the Sub PMC
that is the outer scope (:outer target) of one or more "inner sub"
PMCs.

The three main pieces to the puzzle are:

1.  capture: a "capture_lex" operation binds lexical references in
    an inner Sub PMC to the current lexical context.
2.  invoke: each call frame (context) contains a link to its outer lexical
    contexts (note that in many cases this can be different from
    the link to the caller's context).
3.  lookup: semantics for accessing outer lexicals via find_lex and store_lex.

For #1, we introduce a new "capture_lex" opcode.  The purpose of 
capture_lex is to bind an (inner) Sub PMC to the lexical environment 
given by the current context (call frame).  For immediate block
invocations and anonymous block references the capture_lex occurs
immediately before the call or reference.  For named blocks the
capture_lex is generated as soon as the outer sub (lexical scope)
is entered, in case something takes a reference to the inner named
block.

    .sub 'foo' :lex
        ##  capture_lex on all named subs that list 'foo' as outer
        .const 'Sub' $P0 = 'bar'
        capture_lex $P0
        ...
        ##  later, invoke 'bar' -- 'bar' already has proper outer context
        'bar'()
    .end

    .sub 'bar' :outer('foo')
        ...
    .end

Internally, capture_lex is quite simple:  (1) check that the
C<current_sub> attribute of the current call frame matches the
C<outer_sub> attribute of the target Sub PMC, and (2) if they
match, set the C<outer_ctx> attribute of the target Sub PMC to
the current context.  (See note below for handling MultiSub PMCs.)
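
The two steps of capture_lex can be sketched in the same kind of toy
Python model.  The names follow the attributes named above; treating a
mismatch as a silent no-op (returning False) is my assumption, since
the text only says what happens when the subs match.

```python
class Sub:
    def __init__(self, name, outer_sub=None):
        self.name = name
        self.outer_sub = outer_sub  # static :outer target
        self.outer_ctx = None       # set at run time by capture_lex

class Context:
    def __init__(self, current_sub):
        self.current_sub = current_sub

def capture_lex(current_ctx, target):
    """Bind target to current_ctx if that context belongs to target's outer sub."""
    if target.outer_sub is current_ctx.current_sub:   # step (1): check
        target.outer_ctx = current_ctx                # step (2): bind
        return True
    return False   # assumption: a mismatch leaves target unchanged

foo = Sub('foo')
bar = Sub('bar', outer_sub=foo)
other = Sub('other')

foo_frame = Context(foo)
other_frame = Context(other)

mismatch = capture_lex(other_frame, bar)   # 'other' is not bar's outer
after_mismatch = bar.outer_ctx
matched = capture_lex(foo_frame, bar)      # 'foo' is bar's outer
```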

Next, when a Sub is invoked we want it to make its outer
lexical environment available to any inner subs that it
invokes (#2 above).  To do this, when the new Parrot_context
is created for a Sub invocation, we set the C<outer_ctx>
attribute of the context to the C<outer_ctx> attribute of
the invoked Sub.  The C<outer_ctx> attribute of the Sub PMC
will have been set by the most recent capture_lex operation 
on the Sub, so that the newly created context will have
references to the Sub's correct outer contexts.

Finally, the semantics for accessing lexicals via find_lex and
store_lex become quite simple:  Starting with the current context,
look for the desired lexical in the C<lex_pad> attribute and return
it if found, otherwise move to the context given by C<outer_ctx>
and try again.  (Note that there's no need to follow the "caller 
chain" given by C<caller_ctx>, as C<outer_ctx> already contains 
all of the correct contexts directly.)
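
Here is a sketch of the invoke and lookup steps together, again as a
toy Python model rather than Parrot code.  The helper _pad_for and the
KeyError on a missing name are hypothetical; the text does not say what
happens when a lexical is not found.

```python
class Sub:
    def __init__(self, name, lex_names=(), outer_sub=None):
        self.name, self.lex_names, self.outer_sub = name, lex_names, outer_sub
        self.outer_ctx = None

class Context:
    def __init__(self, sub, caller_ctx=None):
        self.current_sub = sub
        self.caller_ctx = caller_ctx      # dynamic chain, never searched
        self.outer_ctx = sub.outer_ctx    # lexical chain, copied at invoke
        self.lex_pad = dict.fromkeys(sub.lex_names)

def _pad_for(ctx, name):
    """Walk outer_ctx links until a pad declaring name is found."""
    while ctx is not None:
        if name in ctx.lex_pad:
            return ctx.lex_pad
        ctx = ctx.outer_ctx
    raise KeyError(name)   # assumption: failure mode is not specified

def find_lex(ctx, name):
    return _pad_for(ctx, name)[name]

def store_lex(ctx, name, value):
    _pad_for(ctx, name)[name] = value

foo = Sub('foo', ['$x'])
bar = Sub('bar', ['$y'], outer_sub=foo)
caller = Sub('caller', ['$x'])

foo_frame = Context(foo)
foo_frame.lex_pad['$x'] = 'from foo'
bar.outer_ctx = foo_frame                 # what capture_lex would do

caller_frame = Context(caller)
caller_frame.lex_pad['$x'] = 'from caller'
bar_frame = Context(bar, caller_ctx=caller_frame)

seen = find_lex(bar_frame, '$x')          # resolved via outer_ctx
store_lex(bar_frame, '$x', 'updated')     # writes into foo's pad
```

Even though 'bar' is called from 'caller', which has its own $x, the
lookup resolves to foo's $x because only C<outer_ctx> is followed.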

Runtime part 3: Closures and cloning

Sometimes instead of invoking a Sub immediately, we want to
treat it as a closure that may be invoked later with the
lexical environment that was in effect at the time the
closure was taken.  Fortunately, with the above implementations
the operation of "taking a closure" is equivalent to performing
a clone operation on a Sub PMC.

Consider the following Perl 6 code:

    sub foo {
        my $x = 1;
        sub bar { print $x; }
        return &bar;
    }

In PIR this would then translate to something like:

    .sub 'foo'
        ##  bind inner sub 'bar' to current lexical environment
        .const 'Sub' $P0 = 'bar'
        capture_lex $P0

        ## my $x = 1
        $P1 = new 'Integer'
        $P1 = 1
        .lex '$x', $P1

        ## return &bar
        $P2 = get_global 'bar'
        ## clone 'bar', preserving current lexical environment
        $P2 = clone $P2
        .return ($P2)
    .end

    .sub 'bar' :outer('foo')
        $P3 = find_lex '$x'
        print $P3
    .end

As before, since C<bar> is lexically nested in C<foo>, we perform
a capture_lex operation on C<bar> at the beginning of C<foo>,
binding the C<outer_ctx> attribute of bar's Sub PMC to foo's current
lexical environment.  Later in C<foo> when a reference is taken to
C<&bar>, we clone the Sub PMC associated with C<bar> and return that
to the caller.  The cloned copy of C<bar> acts as a closure --
it retains the C<outer_ctx> attribute associated with the
current invocation of C<foo>, even if subsequent calls to C<foo>
change the C<outer_ctx> context used by C<bar> itself.
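
The effect of the clone can be seen in a small Python sketch.  It is a
model under simplifying assumptions: a plain dict stands in for foo's
context and LexPad, and copy.copy stands in for the clone operation
(assumed here to be a shallow copy that keeps the same outer_ctx).

```python
import copy

class Sub:
    def __init__(self, name):
        self.name = name
        self.outer_ctx = None

def call_foo(bar, x):
    """Models one invocation of foo, in the same order as the PIR above."""
    frame = {}               # stands in for foo's context and LexPad
    bar.outer_ctx = frame    # capture_lex on entry to foo
    frame['$x'] = x          # my $x = ...
    return copy.copy(bar)    # clone keeps a reference to this frame

bar = Sub('bar')
first = call_foo(bar, 1)     # closure over the first invocation
second = call_foo(bar, 2)    # rebinds bar itself, but not 'first'
```

After the second call, bar itself points at the newer frame while the
first clone still sees the $x from the invocation that produced it.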


The new "newclosure"

As mentioned earlier, under this design there's really no longer
a need for Parrot's "newclosure" opcode, as one can do the
equivalent operation via a combination of "capture_lex" and "clone".
However, for compatibility reasons it's probably worthwhile to
keep "newclosure" around as a useful shortcut.

Note that "newclosure" cannot always be used in place of "capture_lex",
because sometimes we want to bind a Sub to a context without
modifying or replacing the PMC (e.g., in MultiSubs).  It's also
more efficient to avoid the clone operation when we don't need it.
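
As a sketch of how "newclosure" could be kept as a shortcut, here is
the composition in toy Python.  Doing the capture on the original Sub
first and cloning afterwards is my assumption, chosen to match the
foo/bar example above; the text does not fix the order.

```python
import copy

class Sub:
    def __init__(self, name, outer_sub=None):
        self.name, self.outer_sub, self.outer_ctx = name, outer_sub, None

class Context:
    def __init__(self, current_sub):
        self.current_sub = current_sub

def capture_lex(ctx, sub):
    if sub.outer_sub is ctx.current_sub:
        sub.outer_ctx = ctx

def newclosure(ctx, sub):
    """Shortcut: bind sub to ctx, then hand back a fresh copy."""
    capture_lex(ctx, sub)     # assumed order: capture, then clone
    return copy.copy(sub)

foo = Sub('foo')
block = Sub('_block1', outer_sub=foo)
frame = Context(foo)
closure = newclosure(frame, block)
```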


Capture_lex on MultiSub PMCs

The ":multi" flag in Parrot allows multiple subs to be stored as
a single MultiSub PMC under a common name (and distinguished at
function call dispatch by MMD signature).  But we also need a way
to perform capture_lex operations on lexically nested :multi subs;
for now the solution is that a capture_lex operation on a MultiSub 
should perform capture_lex on each of its component Sub PMCs
that have an C<outer_sub> matching the C<current_sub> of the
current context.  A more robust solution can arise if/when we
have a way to uniquely identify Sub PMCs by something other than
their (shared) short name, but this is a reasonable solution for now.
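
A minimal sketch of that fan-out, in the same toy Python model.  Here
MultiSub is modelled as a plain list of candidate Subs, which is an
illustrative simplification and not how the real PMC is laid out.

```python
class Sub:
    def __init__(self, name, outer_sub=None):
        self.name, self.outer_sub, self.outer_ctx = name, outer_sub, None

class MultiSub(list):
    """Candidate Subs that share one short name."""

class Context:
    def __init__(self, current_sub):
        self.current_sub = current_sub

def capture_lex(ctx, target):
    # Fan out over MultiSub candidates; a plain Sub is the one-element case.
    candidates = target if isinstance(target, MultiSub) else [target]
    for sub in candidates:
        if sub.outer_sub is ctx.current_sub:
            sub.outer_ctx = ctx

foo, baz = Sub('foo'), Sub('baz')
in_foo_a = Sub('m', outer_sub=foo)
in_foo_b = Sub('m', outer_sub=foo)
in_baz = Sub('m', outer_sub=baz)    # same short name, different outer
multi = MultiSub([in_foo_a, in_foo_b, in_baz])

foo_frame = Context(foo)
capture_lex(foo_frame, multi)
```

Only the candidates nested in 'foo' are bound; the one nested in 'baz'
is left alone.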


Recursion

Here's an example illustrating that simple recursion works in
this design.

    sub abc() {
        my $x = 1;
        sub fact($y) { return ($y > $x) ?? $y * fact($y - 1) !! 1; }
        say fact(5);
    }

In PIR, this is roughly:

    .sub 'abc'
        ##  set outer lexicals for fact
        .const 'Sub' fact = 'fact'
        capture_lex fact

        ##  my $x = 1;
        .local pmc x
        x = new 'Integer'
        x = 1
        .lex '$x', x

        ##  say fact(5);
        $P0 = 'fact'(5)
        say $P0
    .end

    .sub 'fact' :outer('abc')
        .param pmc y
        .lex '$y', y

        ##  retrieve $x from outer scope
        .local pmc x
        x = find_lex '$x'

        ##  ($y > $x) ?? $y * fact($y - 1) !! 1
        if y > x goto recurse
        .return (1)

      recurse:
        $P0 = n_sub y, 1
        $P1 = 'fact'($P0)
        $P2 = n_mul y, $P1
        .return ($P2)
    .end
       
The key thing to note is that even though 'fact' is recursive
and thus creates a separate context (call frame) for each
recursive invocation, the C<outer_ctx> attribute of the 'fact' 
Sub PMC remains the same -- i.e., the context created by
its outer sub ('abc').  Since C<outer_ctx> of 'fact' remains 
the same throughout the recursion, each invocation of 'fact'
will create a new context that has a new LexPad for fact's
lexicals and a common C<outer_ctx> (containing abc's lexicals
and a pointer to abc's outer contexts).  

Thus each recursive invocation of fact() ends up with its own 
lexical $y but shares the lexical $x created in the outer 
abc() scope.
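
The same recursion can be traced in a toy Python model.  The frames
list and the Python function bodies are hypothetical scaffolding for
inspection; only the outer_ctx handling follows the design above.

```python
class Sub:
    def __init__(self, name, body, outer_sub=None):
        self.name, self.body, self.outer_sub = name, body, outer_sub
        self.outer_ctx = None

class Context:
    def __init__(self, sub):
        self.current_sub = sub
        self.outer_ctx = sub.outer_ctx   # copied from the Sub at invoke
        self.lex_pad = {}

def find_lex(ctx, name):
    while ctx is not None:
        if name in ctx.lex_pad:
            return ctx.lex_pad[name]
        ctx = ctx.outer_ctx
    raise KeyError(name)

frames = []   # every context created for 'fact', kept for inspection

def invoke(sub, *args):
    ctx = Context(sub)
    if sub.name == 'fact':
        frames.append(ctx)
    return sub.body(ctx, *args)

def fact_body(ctx, y):
    ctx.lex_pad['$y'] = y                 # .lex '$y', y
    x = find_lex(ctx, '$x')               # found in abc's pad
    return y * invoke(fact, y - 1) if y > x else 1

def abc_body(ctx):
    fact.outer_ctx = ctx                  # capture_lex fact
    ctx.lex_pad['$x'] = 1                 # my $x = 1
    return invoke(fact, 5)

abc = Sub('abc', abc_body)
fact = Sub('fact', fact_body, outer_sub=abc)
result = invoke(abc)
```

Five contexts are created for 'fact', each with its own $y, and all of
them share the single context created for 'abc'.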


Autoclose

In some instances it may be possible to invoke or take a
closure on a Sub PMC that has not had a capture_lex
operation performed on it.  Here's an example from Perl 6:

    sub X() {
       my $a;
       sub Y($c) { $a = $c; }
    }

    Y(3);

When Y(3) is invoked, the X sub has not yet been invoked
so there's no call frame for storing $a.  In this case we need to
dynamically create a call frame for X when Y is invoked.  
Synopsis 4 describes how this is intended to work in Perl 6; 
here I'll describe the semantics for Parrot.

Recall that in the "normal" case we would be invoking Y from within
X, and in that situation Y would already have its C<outer_ctx> set
by a capture_lex operation from the most recent invocation of X.
The "autoclose" case described here arises when Y hasn't had
a capture_lex performed on it by its outer sub, and thus its
C<outer_ctx> is still null.

In this case, when we invoke an inner sub Y that has a null
C<outer_ctx> attribute, we first "autoclose" Y by setting its 
C<outer_ctx> attribute to the current C<ctx> attribute of its
outer sub X.  In the case where X has never been invoked (and
thus its C<ctx> attribute is null), we first create a dummy
context for X's C<ctx>, populating that dummy context with a
new LexPad and setting the C<outer_ctx> to X's C<outer_ctx>.

(Open question:  How can a HLL compiler specify how
these dummy lexicals are to be initialized -- e.g.,
with constraints or initial values?)

Of course, the C<outer_ctx> for X might also be null if
it's an inner sub that hasn't had a capture_lex performed
on it.  In that case we first perform the same process for
X that we're doing for Y -- set X's C<outer_ctx> to the C<ctx>
of its outer sub, creating a dummy context for that outer sub
if needed.  This creation of dummy contexts continues up the
chain of outer scopes until we reach a scope with no :outer
or a scope that already has its context set.
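
The walk up the chain of outer scopes can be sketched as a short
recursive function.  This is a toy Python model: the autoclose and
invoke helpers are hypothetical names, and dummy lexicals are simply
left as None, which sidesteps the open question above.

```python
class Sub:
    def __init__(self, name, lex_names=(), outer_sub=None):
        self.name, self.lex_names, self.outer_sub = name, lex_names, outer_sub
        self.ctx = None        # set when the Sub is invoked (or by autoclose)
        self.outer_ctx = None  # set by capture_lex (or by autoclose)

class Context:
    def __init__(self, sub):
        self.current_sub = sub
        self.outer_ctx = sub.outer_ctx
        self.lex_pad = dict.fromkeys(sub.lex_names)

def autoclose(sub):
    """Ensure sub.outer_ctx is set, creating dummy outer contexts as needed."""
    outer = sub.outer_sub
    if outer is None or sub.outer_ctx is not None:
        return                         # no :outer, or already captured
    if outer.ctx is None:
        autoclose(outer)               # outer needs its own outer_ctx first
        outer.ctx = Context(outer)     # dummy context with a fresh LexPad
    sub.outer_ctx = outer.ctx

def invoke(sub):
    autoclose(sub)
    sub.ctx = Context(sub)
    return sub.ctx

Z = Sub('Z', ['$g'])
X = Sub('X', ['$a'], outer_sub=Z)
Y = Sub('Y', ['$c'], outer_sub=X)

y_frame = invoke(Y)    # neither X nor Z has been invoked yet
```

Invoking Y alone creates dummy contexts for X and then Z, and stops at
Z because it has no :outer.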


Examples

The remainder of this document can be used to demonstrate and
explain how specific instances of HLL code translate into
their PIR equivalents.

Perl 6:
    sub outer() {
        my $a;
        my $f = { say $a; }
    }

PIR:
    .sub 'outer'
        .lex '$a', $P0
        .const 'Sub' $P1 = '_block1'
        newclosure $P1, $P1
        .lex '$f', $P1
    .end

    .sub '_block1' :anon :outer('outer')
        $P0 = find_lex '$a'
        say $P0
    .end