Check-in [2f51c8d73b]
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to tclconference@googlegroups.com
or submit via the online form by Sep 9.

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:More about code motion
Timelines: family | ancestors | descendants | both | kbk-refactor-callframe
Files: files | file ages | folders
SHA3-256:2f51c8d73bce7855797f67aa11f7a5dce43a225d74457bcb89b6ebb115ce894e
User & Date: kbk 2019-02-25 04:41:47
Context
2019-03-26
20:45
Oops - didn't commit images for callframe.md! Leaf check-in: a19796b4e5 user: kbk tags: kbk-refactor-callframe
2019-02-25
04:41
More about code motion check-in: 2f51c8d73b user: kbk tags: kbk-refactor-callframe
2019-02-18
04:25
Write up first couple of tasks check-in: 643eefa7e6 user: kbk tags: kbk-refactor-callframe
Changes

Changes to doc/20190216callframe/callframe.md.

   236    236       the instruction, and all other names that potentially alias the
   237    237   	given name lose their values.
   238    238   	
   239    239   + `invoke[Expanded]` - The available values in the source callframe
   240    240     become available in the destination callframe.  Then, any names that the
   241    241     invoked command might change and any names that might alias them,
   242    242     lose their available values.
          243  +  
          244  ++ `extractCallframe` - The available values in the source callframe
          245  +  become available in the destination callframe.
   243    246   
   244    247   + `directSet` etc. - Any name that might be an alias of the changing
   245    248     variable loses its available value. (If the changing variable name
   246    249     cannot be determined, all names might be aliases, because in this
   247    250     case we cannot determine that the name is fully qualified.)
   248    251     
   249    252   + `variable`, `upvar`, `nsupvar` - Any existing value for the given
................................................................................
   276    279   block's new available set. The calculation of this set may in turn
   277    280   change the available values on φ instructions at the start of loops,
   278    281   so the calculation must be applied to convergence over all blocks in
   279    282   the program.
   280    283   
   281    284   Those who are versed in formal data flow analysis are welcome to
   282    285   separate the descriptions of the operations given above into separate
   283         -sets `AVAIL_GEN` and `AVAIL_KILL` so that the problem can be given as
          286  +sets `AVAIL\_GEN` and `AVAIL\_KILL` so that the problem can be given as
   284    287   the forward analysis:
   285    288   
   286    289   ![\texttt{AVAIL\_OUT}[B] = \texttt{AVAIL\_GEN}[B] 
   287    290   ~ \cup ~ \left( 
   288    291   \overline{\texttt{AVAIL\_KILL}[B]} 
   289    292   ~ \cap \bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P] 
   290    293   \right) 
................................................................................
   358    361     input callframe. We start with the set required relative to the
   359    362     output callframe. All variables that may be read or written, and
   360    363     all possible aliases, are added to the set. (Since we do not in
   361    364     general examine whether a procedure must alter a given variable,
   362    365     but only whether it may alter it, we require in general that the
   363    366     previous value of the variable must already be in sync.)
   364    367     
          368  ++ `extractCallframe` - The variables required in the source callframe
          369  +  are precisely those anticipated in the destination callframe.
          370  +
   365    371   + `directGet` etc. - The variables required will be relative to the
   366    372     input callframe. We start with the set required relative to the
   367    373     output callframe. Any variable that might alias the variable being
   368    374     retrieved or set in the instruction is added to the set.
   369    375     
   370    376   + `variable`, `upvar`, `nsupvar` - The variables required will be
   371    377     relative to the input callframe. We start with the set required
................................................................................
   388    394   designates a variable that is not live in the given callframe may be
   389    395   safely removed. Either it is a dead STORE (that is, neither it nor
   390    396   any potential alias will ever be accessed again), or it participates
   391    397   in a STORE-STORE pair. 
   392    398   
   393    399   ## TASK 3: Anticipability analysis and loop-invariant loads
   394    400   
          401  +With what has been developed in Tasks 1 and 2, we can get rid of
          402  +locally redundant `moveToCallFrame` and `moveFromCallFrame`
          403  +instructions. What remains is to attack loop-invariant ones: that is,
          404  +`moveFromCallFrame` that can be safely moved above a loop or
          405  +`moveToCallFrame` that can be safely be moved below. We use a plan of
          406  +attack based loosely on \[DREC93\].
          407  +
          408  +### Anticipability
          409  +
          410  +We already have the concept of _availability,_ developed in
          411  +Task 1. Let us introduce some notation for available values.
          412  +`AVAIL_OUT[B]` will be notation for the set of ordered pairs
          413  +(_cf_. _name_) available in SSA registers on exit from block `B` of
          414  +the program. (In the pairs, _cf_ names an SSA variable designating a
          415  +callframe, and _name_ is the name of a variable in that frame.
          416  +Similarly, `AVAIL_IN[B]`will be the set of (_cf_, _name_) pairs
          417  +available on entry to block `B`, and we know that
          418  +
          419  +![\texttt{AVAIL\_IN}[B] =  
          420  +\bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P] 
          421  +\right)](./availin.svg)
          422  +
          423  +(This discussion glosses over the fact that the callframe reference
          424  +must be translated every time that it passes through a φ instruction.
          425  +The translation is straightforward - for the intersection above, the
          426  +available value will be the output of the phi; for the anticipability
          427  +analysis below, the anticipable value will be the input of the phi on
          428  +the code path being analyzed.)
          429  +
          430  +Availability by itself is not sufficient for loop-invariant code
          431  +motion. In addition, we need the concept of _anticipability_. A (_cf_,
          432  +_name_) pair is _anticipable_ at a given point in the program if every
          433  +execution path forward from that point contains a `moveFromCallFrame`
          434  +retrieving that value prior to the value's being modified.
          435  +
          436  +Calculating anticipability requires multiple traversals of the program
          437  +in postdominator order. Nothing is anticipable on 'return'. For blocks
          438  +that do not return, we begin by taking the intersection of values
          439  +anticipable at their successors:
          440  +
          441  +![\texttt{ANTIC\_OUT}[B] =  
          442  +\bigcap_{P\in \texttt{succ}[B]} \texttt{ANTIC\_IN}[P] 
          443  +\right)](./anticout.svg)
          444  +
          445  +(Note that in computing this function, we translate the callframe
          446  +reference if it appears in a φ operation at the start of the block.)
          447  +
          448  +and then traverse the instructions of the block in reverse order.
          449  +
          450  ++ `moveFromCallFrame` - The given name and callframe reference become
          451  +  anticipable.
          452  +
          453  ++ `moveToCallFrame` - The given name and callframe reference, if they
          454  +  are anticipable, are removed from the anticipable set, along with
          455  +  any names that might potentially be aliases of the given name .This
          456  +  becomes the anticipable set for the input callframe. The output
          457  +  callframe is no longer relevant and may be forgotten.
          458  +  
          459  ++ `invoke[Expanded]` - Any variables potentially modified by the
          460  +  invoked command, and any potential aliases, are removed from the
          461  +  anticipable set, and the new set becomes the anticipable set for the
          462  +  input callframe. The output callframe is no longer relevant and may
          463  +  be forgotten.
          464  +
          465  ++ `extractCallFrame` - The store operations anticipated in the source
          466  +  callframe are precisely those anticipated in the destination
          467  +  callframe, and the destination callframe is no longer relevant.
          468  +  
          469  ++ `directGet` - Since (as stated above) we ignore the possibility that
          470  +  a read trace on one variable may modify another, we simply copy all
          471  +  anticipable references.
          472  +  
          473  ++ `directSet`, etc. - Any local variable that might alias the
          474  +  variable being modified is no longer anticipable, and once again
          475  +  the callframes are swapped as with `moveToCallFrame` and `invoke`.
          476  +  
          477  ++ `variable`, `upvar`, `nsupvar` - The local variable and any aliases
          478  +  are no longer anticipable.
          479  +  
          480  ++ φ - Analysis stops at a φ instruction that references a callframe.
          481  +  When analyzing a predecessor block, `ANTIC_IN` for its successors
          482  +  will be translated to refer to the callframe that is input to the φ.
          483  +
          484  +If a block has no φ instruction for the callframe, the same callframe
          485  +will be used when constructing its `ANTIC_IN` when analyzing a predecessor.
          486  +
          487  +### Placement of load instructions
          488  +
          489  +Once the `AVAIL_OUT` and `ANTIC_IN` sets are constructed, we can
          490  +determine where to place additional `moveFromCallFrame` instructions
          491  +that will perform loop-independent code motion. 
          492  +
          493  +Since the entry block of a loop is always a merge point in the control
          494  +flow, its predecessor blocks will be single-exit blocks owing to
          495  +critical edge splitting. We need consider only insertion of
          496  +`moveFromCallFrame` instructions at these points.
          497  +
          498  +The general plan is sketched in Section 4.3.2 of \[VanD04\]. We
          499  +examine merge points in the program (blocks with more than one
          500  +predecessor), in dominator order. For each (_cf_, _name_) that is 
          501  +anticipable at the entry to the block, we check availability in the
          502  +predecessor blocks. If the expression is available in at least one
          503  +predecessor, but not all predecessors, we insert `moveFromCallFrame`
          504  +instructions for it in the predecessors where it is unavailable. 
          505  +
          506  +This is another analysis that must be iterated to
          507  +convergence. Inserting these 'moveFromCallFrame' operations produces
          508  +new available variables, which can in turn expose further
          509  +opportunities for code motion. \[VAND04\] discusses this in more
          510  +detail in chapter 4.
          511  +
          512  +### Possible combination of passes
          513  +
          514  +This analysis closely mirrors what happens in partial redundancy
          515  +elimination (`quadcode/pre.tcl`). It may be possible to combine the
          516  +two into a single pass. The principal change will be that availability
          517  +analysis will need some refactoring, to discipline the size of the
          518  +`AVAIL` sets. Unlike the description in \[SIMP96\], we do not need to
          519  +shrink any of these sets for safety, because every time we perform an
          520  +operation that might affect a value in the callframe, we assign a new
          521  +SSA value to the callframe. Nevertheless, the quadcode sequence is
          522  +structured so that only one callframe is in flight at any given
          523  +time. Even if procedure inlining gives rise to additional callframes,
          524  +we still have the weaker condition that if there is an instruction
          525  +like
   395    526   
          527  +`doSomething cfout cfin moreArgs`
   396    528   
          529  +the input callframe will never again be used.
          530  +  
   397    531   ## TASK 4: Loop-invariant stores
   398    532   
   399    533   
   400    534   
   401    535   ## References
   402    536   
   403         -\[SIMP96\]. 
   404         -Loren Taylor Simpson. 1996. Value-Driven Redundancy
   405         -Elimination. Ph.D. Dissertation. Rice University, Houston, TX,
   406         -USA. AAI9631092. [<https://www.clear.rice.edu/comp512/Lectures/Papers/SimpsonThesis.pdf>]
          537  +\[DREC93\]
          538  +Karl-Heinz Drechsler and Manfred P. Stadel. "A variation of Knoop,
          539  +Rüthing, and Steffen's _Lazy Code Motion_". _ACM SIGPLAN Notices_ 28:5
          540  +(May, 1993), pp. 29-38.
          541  +[<https://dl.acm.org/citation.cfm?id=152823>]
          542  +
          543  +\[SIMP96\]
          544  +Loren Taylor Simpson. 1996. "Value-Driven Redundancy
          545  +Elimination". Ph.D. Dissertation. Rice University, Houston, TX,
          546  +USA. AAI9631092. 
          547  +[<https://www.clear.rice.edu/comp512/Lectures/Papers/SimpsonThesis.pdf>]
          548  +
          549  +\[VanD04\]
          550  +VanDrunen, Thomas J. "Partial redundancy elimination for global value
          551  +numbering." PhD thesis, Purdue University, West Lafayette, Indiana
          552  +(August, 2004)
          553  +[<ftp://ftp.cs.purdue.edu/pub/hosking/papers/vandrunen.pdf>]