This is intended to review some notes and ideas for cleaning up lexical handling in Parrot; the current lexicals implementation continues to cause problems for PCT and Rakudo (RT #56512, #56398, #58854, #58392, and possibly others).

The discussion below is based on a reading of PDD20, Synopsis 4 (especially the section "When is a closure not a closure"), the very long discussion on lexicals in RT #56398, and personal experiences in dealing with Parrot's existing implementation. This document isn't intended to analyze the ways in which the current implementation is broken, but rather to propose a new, much simpler design based on my interpretation of the discussions and information in the above sources.

There are three key places where this design differs from the existing implementation:

1. The design described here does not make use of (nor want) a separate Closure PMC type -- the ability to handle lexicals is built directly into the Sub PMC base class (and thus is directly available to all derived classes of Sub).

2. The functionality of the existing 'newclosure' opcode is split into separate "capture lexical bindings" and "clone" operations. See below for how "newclosure" can be preserved if needed.

3. The "autoclose" semantics in the current implementation are greatly simplified, and no longer try to make use of the caller context chain.

With the above in mind, here are my thoughts on a lexicals implementation. I started out with a rewrite of pdd20, but I've since decided that focusing on the core details is best for discussion, and after those are nailed down we can rework it into a PDD.

-----

First, a brief review of some key lexical features and terms.

In PIR, the ".lex" directive is used to declare a lexical variable within a Parrot Sub. A flag of ":outer('foo')" indicates that the Sub with a :lexid of 'foo' is the lexically enclosing outer scope of the Sub being defined.
Together, these PIR constructs specify the static, compile-time lexical features of Parrot subs. (For purposes of this draft I'm ignoring the :lex flag on Parrot subs; it shouldn't materially affect any of what follows.)

Internally, these static features are held as attributes of a Sub PMC. One attribute references a LexInfo PMC that contains the lexical variables defined within the Sub (via ".lex"). A second attribute points to the Sub identified by the ":outer" flag -- i.e., the current Sub's outer scope. Again, these are the static features of Subs using lexicals -- once compiled, they seldom change.

Runtime part 1: LexPads and lexicals defined within a Sub

When a Sub is invoked, a new Parrot_context is allocated to act as a "call frame" for the new invocation. If the Sub has a LexInfo PMC, then a new LexPad PMC is created from the LexInfo PMC and stored in the context. Lexical fetch and store operations start by looking at the context's LexPad.

Runtime part 2: Outer lexicals and contexts

Handling outer lexicals requires a bit more work, because we want to make sure that when we invoke or take a reference to a Sub PMC, it properly captures its outer lexical environment. What follows is based on Synopsis 4 and the analysis in RT #56398.

In this description we will use "outer sub" to mean the Sub PMC that is the outer scope (:outer target) of one or more "inner sub" PMCs. The three main pieces to the puzzle are:

1. capture: a "capture_lex" operation binds lexical references in an inner Sub PMC to the current lexical context.

2. invoke: each call frame (context) contains a link to its outer lexical contexts (note that in many cases this can be different from the link to the caller's context).

3. lookup: semantics for accessing outer lexicals via find_lex and store_lex.

For #1, we introduce a new "capture_lex" opcode.
The purpose of capture_lex is to bind an (inner) Sub PMC to the lexical environment given by the current context (call frame). For immediate block invocations and anonymous block references the capture_lex occurs immediately before the call or reference. For named blocks the capture_lex is generated as soon as the outer sub (lexical scope) is entered, in case something takes a reference to the inner named block.

    .sub 'foo' :lex
        ## capture_lex on all named subs that list 'foo' as outer
        .const 'Sub' $P0 = 'bar'
        $P0 = find_name_not_null 'bar'
        capture_lex $P0
        ...
        ## later, invoke 'bar' -- 'bar' already has proper outer context
        'bar'()
    .end

    .sub 'bar' :outer('foo')
        ...
    .end

Internally, capture_lex is quite simple: (1) check that the sub executing in the current call frame is the outer sub recorded in the target Sub PMC, and (2) if they match, set the outer-context attribute of the target Sub PMC to the current context. (See note below for handling MultiSub PMCs.)

Next, when a Sub is invoked we want it to make its outer lexical environment available to any inner subs that it invokes (#2 above). To do this, when the new Parrot_context is created for a Sub invocation, we set the outer-context link of the context to the outer-context attribute of the invoked Sub. The outer-context attribute of the Sub PMC will have been set by the most recent capture_lex operation on the Sub, so the newly created context will have references to the Sub's correct outer contexts.

Finally, the semantics for accessing lexicals via find_lex and store_lex become quite simple: starting with the current context, look for the desired lexical in the context's LexPad and return it if found; otherwise move to the context given by the outer-context link and try again. (Note that there's no need to follow the "caller chain" given by the caller-context link, as the outer-context chain already contains all of the correct contexts directly.)
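
The three pieces above (capture, invoke, lookup) can be illustrated with a small Python model. This is only a sketch of the semantics as described here, not Parrot code: the class names and the attribute names `lex_pad`, `outer_ctx`, and `outer_sub` are assumptions chosen for the illustration.

```python
class Context:
    """Toy call frame: its own lexicals plus a link to the outer frame."""
    def __init__(self, sub, outer_ctx):
        self.sub = sub
        self.lex_pad = {name: None for name in sub.lex_names}
        self.outer_ctx = outer_ctx      # lexical link, not the caller link

    def _owner(self, name):
        # Walk the outer-context chain until a pad declares the name.
        ctx = self
        while ctx is not None:
            if name in ctx.lex_pad:
                return ctx
            ctx = ctx.outer_ctx
        raise KeyError(name)

    def find_lex(self, name):
        return self._owner(name).lex_pad[name]

    def store_lex(self, name, value):
        self._owner(name).lex_pad[name] = value


class Sub:
    """Toy Sub: static info (lex_names, outer_sub) plus a captured outer_ctx."""
    def __init__(self, name, body, lex_names=(), outer_sub=None):
        self.name = name
        self.body = body
        self.lex_names = lex_names
        self.outer_sub = outer_sub
        self.outer_ctx = None           # set by capture_lex

    def invoke(self, *args):
        # The new frame inherits the Sub's captured outer context.
        ctx = Context(self, self.outer_ctx)
        return self.body(ctx, *args)


def capture_lex(current_ctx, inner):
    # Bind only if the current frame really belongs to inner's outer sub.
    if inner.outer_sub is current_ctx.sub:
        inner.outer_ctx = current_ctx


def bar_body(ctx):
    return ctx.find_lex('$x')           # '$x' lives in foo's pad

def foo_body(ctx, value):
    capture_lex(ctx, bar)               # done on entry to the outer sub
    ctx.store_lex('$x', value)
    return bar.invoke()

foo = Sub('foo', foo_body, lex_names=('$x',))
bar = Sub('bar', bar_body, outer_sub=foo)
result = foo.invoke(1)
```

In this model `bar` never consults its caller; it reaches `$x` purely through the outer-context link that capture_lex installed.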

Runtime part 3: Closures and cloning

Sometimes instead of invoking a Sub immediately, we want to treat it as a closure that may be invoked later with the lexical environment that was in effect at the time the closure was taken. Fortunately, with the above implementations the operation of "taking a closure" is equivalent to performing a clone operation on a Sub PMC. Consider the following Perl 6 code:

    sub foo {
        my $x = 1;
        sub bar { print $x; }
        return &bar;
    }

In PIR this would then translate to something like:

    .sub 'foo'
        ## bind inner sub 'bar' to current lexical environment
        .const 'Sub' $P0 = 'bar'
        capture_lex $P0
        ## my $x = 1
        $P1 = new 'Integer'
        $P1 = 1
        .lex '$x', $P1
        ## return &bar
        $P2 = get_global 'bar'
        ## clone 'bar', preserving current lexical environment
        $P2 = clone $P2
        .return ($P2)
    .end

    .sub 'bar' :outer('foo')
        $P3 = find_lex '$x'
        print $P3
    .end

As before, since 'bar' is lexically nested in 'foo', we perform a capture_lex operation on 'bar' at the beginning of 'foo', binding the outer-context attribute of bar's Sub PMC to foo's current lexical environment. Later in 'foo' when a reference is taken to &bar, we clone the Sub PMC associated with 'bar' and return that to the caller. The cloned copy of 'bar' acts as a closure -- it retains the outer context associated with the current invocation of 'foo', even if subsequent calls to 'foo' change the outer context used by 'bar' itself.

The new "newclosure"

As mentioned earlier, under this design there's really no longer a need for Parrot's "newclosure" opcode, as one can do the equivalent operation via a combination of "capture_lex" and "clone". However, for compatibility reasons it's probably worthwhile to keep "newclosure" around as a useful shortcut.

Note that "newclosure" cannot always be used in place of "capture_lex", because sometimes we want to bind a Sub to a context without modifying or replacing the PMC (e.g., in MultiSubs). It's also more efficient to avoid the clone operation when we don't need it.
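
The "capture, then clone" behaviour can be sketched in Python as follows. This is a toy model under the assumptions of this draft, not Parrot's implementation: `Context`, `Sub`, `capture_lex`, and the attribute names are hypothetical, and a shallow copy stands in for the clone operation.

```python
import copy

class Context:
    """Toy call frame with its own lexicals and an outer-context link."""
    def __init__(self, sub, outer_ctx):
        self.sub = sub
        self.lex_pad = {}
        self.outer_ctx = outer_ctx

    def find_lex(self, name):
        ctx = self
        while ctx is not None:
            if name in ctx.lex_pad:
                return ctx.lex_pad[name]
            ctx = ctx.outer_ctx
        raise KeyError(name)


class Sub:
    def __init__(self, body, outer_sub=None):
        self.body = body
        self.outer_sub = outer_sub
        self.outer_ctx = None

    def invoke(self, *args):
        return self.body(Context(self, self.outer_ctx), *args)

    def clone(self):
        # Shallow copy: the clone keeps the outer_ctx captured so far.
        return copy.copy(self)


def capture_lex(current_ctx, inner):
    if inner.outer_sub is current_ctx.sub:
        inner.outer_ctx = current_ctx


def bar_body(ctx):
    return ctx.find_lex('$x')

def foo_body(ctx, value):
    capture_lex(ctx, bar)        # rebinds bar on every call to foo
    ctx.lex_pad['$x'] = value    # my $x = value
    return bar.clone()           # return &bar as a closure

foo = Sub(foo_body)
bar = Sub(bar_body, outer_sub=foo)

first = foo.invoke(1)
second = foo.invoke(2)
values = (first.invoke(), second.invoke(), bar.invoke())
```

Each clone keeps the context of the `foo` invocation that produced it, while the uncloned `bar` follows the most recent capture_lex. A combined "newclosure" shortcut would amount to calling `capture_lex` and `clone` together.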

Capture_lex on MultiSub PMCs

The ":multi" flag in Parrot allows multiple subs to be stored as a single MultiSub PMC under a common name (and distinguished at function call dispatch by MMD signature). But we also need a way to perform capture_lex operations on lexically nested :multi subs; for now the solution is that a capture_lex operation on a MultiSub should perform capture_lex on each of its component Sub PMCs whose outer sub matches the sub of the current context. A more robust solution can arise if/when we have a way to uniquely identify Sub PMCs by something other than their (shared) short name, but this is a reasonable solution for now.

Recursion

Here's an example illustrating that simple recursion works in this design.

    sub abc() {
        my $x = 1;
        sub fact($y) {
            return ($y > $x) ?? $y * fact($y - 1) !! 1;
        }
        say fact(5);
    }

In PIR, this is roughly:

    .sub 'abc'
        ## set outer lexicals for fact
        .const .Sub fact = 'fact'
        capture_lex fact
        ## my $x = 1;
        .local pmc x
        x = new 'Integer'
        x = 1
        .lex '$x', x
        ## say fact(5);
        $P0 = 'fact'(5)
        say $P0
    .end

    .sub 'fact' :outer('abc')
        .param pmc y
        .lex '$y', y
        ## retrieve $x from outer scope
        .local pmc x
        x = find_lex '$x'
        ## ($y > $x) ?? $y * fact($y - 1) !! 1
        if y > x goto recurse
        .return (1)
      recurse:
        ## $y * fact($y - 1)
        $P0 = n_sub y, 1
        $P1 = 'fact'($P0)
        $P2 = n_mul y, $P1
        .return ($P2)
    .end

The key thing to note is that even though 'fact' is recursive and thus creates a separate context (call frame) for each recursive invocation, the outer-context attribute of the 'fact' Sub PMC remains the same -- i.e., the context created by its outer sub ('abc'). Since the outer context of 'fact' remains the same throughout the recursion, each invocation of 'fact' will create a new context that has a new LexPad for fact's lexicals and a common outer context (containing abc's lexicals and a pointer to abc's outer contexts). Thus each recursive invocation of fact() ends up with its own lexical $y but shares the lexical $x created in the outer abc() scope.
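
The same point can be checked with a toy Python model of the recursion example. As before, this is a sketch of the proposed semantics rather than Parrot code; the classes, the attribute names, and the `frames` list used for inspection are all assumptions made for the illustration.

```python
class Context:
    """Toy call frame with its own lexicals and an outer-context link."""
    def __init__(self, sub, outer_ctx):
        self.sub = sub
        self.lex_pad = {}
        self.outer_ctx = outer_ctx

    def find_lex(self, name):
        ctx = self
        while ctx is not None:
            if name in ctx.lex_pad:
                return ctx.lex_pad[name]
            ctx = ctx.outer_ctx
        raise KeyError(name)


class Sub:
    def __init__(self, body, outer_sub=None):
        self.body = body
        self.outer_sub = outer_sub
        self.outer_ctx = None

    def invoke(self, *args):
        return self.body(Context(self, self.outer_ctx), *args)


def capture_lex(current_ctx, inner):
    if inner.outer_sub is current_ctx.sub:
        inner.outer_ctx = current_ctx


frames = []                          # every context created for fact

def fact_body(ctx, y):
    ctx.lex_pad['$y'] = y            # each frame gets its own $y
    frames.append(ctx)
    x = ctx.find_lex('$x')           # shared $x from abc's frame
    if y > x:
        return y * fact.invoke(y - 1)
    return 1

def abc_body(ctx):
    capture_lex(ctx, fact)           # set outer lexicals for fact
    ctx.lex_pad['$x'] = 1            # my $x = 1
    return fact.invoke(5)

abc = Sub(abc_body)
fact = Sub(fact_body, outer_sub=abc)
result = abc.invoke()
```

Every frame recorded in `frames` has a distinct `$y` but points at the single context created by `abc`, which is why no capture_lex is needed inside the recursion.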

Autoclose

In some instances it may be possible to invoke or take a closure on a Sub PMC that has not had a capture_lex operation performed on it. Here's an example from Perl 6:

    sub X() {
        my $a;
        sub Y($c) { $a = $c; }
    }
    Y(3);

When Y(3) is invoked, the X sub has not yet been invoked, so there's no call frame for storing $a. In this case we need to dynamically create a call frame for X when Y is invoked. Synopsis 4 describes how this is intended to work in Perl 6; here I'll describe the semantics for Parrot.

Recall that in the "normal" case we would be invoking Y from within X, and in that situation Y would already have its outer context set by a capture_lex operation from the most recent invocation of X. The "autoclose" case described here arises when Y hasn't had a capture_lex performed on it by its outer sub, and thus its outer context is still null.

In this case, when we invoke an inner sub Y that has a null outer-context attribute, we first "autoclose" Y by setting its outer-context attribute to the current context of its outer sub X. In the case where X has never been invoked (and thus has no current context), we first create a dummy context for X, populating that dummy context with a new LexPad and setting its outer context to X's outer context. (Open question: How can a HLL compiler specify how these dummy lexicals are to be initialized -- e.g., with constraints or initial values?)

Of course, the outer context for X might also be null if it's an inner sub that hasn't had a capture_lex performed on it. In that case we first perform the same process for X that we're doing for Y -- set X's outer context to the current context of its outer sub, creating a dummy context for that outer sub if needed. This creation of dummy contexts continues up the chain of outer scopes until we reach a scope with no :outer or a scope that already has its context set.

Examples

The remainder of this document can be used to demonstrate and explain how specific instances of HLL code translate into their PIR equivalents.

Perl 6:

    sub outer() {
        my $a;
        my $f = { say $a; }
    }

PIR:

    .sub 'outer'
        .lex '$a', $P0
        .const 'Sub' $P1 = '_block1'
        newclosure $P1, $P1
        .lex '$f', $P1
    .end

    .sub '_block1' :anon :outer('outer')
        $P0 = find_lex '$a'
        say $P0
    .end