Check-in [2f51c8d73b]
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
or submit via the online form by Sep 9.

Overview
Comment: More about code motion family | ancestors | descendants | both | kbk-refactor-callframe files | file ages | folders 2f51c8d73bce7855797f67aa11f7a5dce43a225d74457bcb89b6ebb115ce894e kbk 2019-02-25 04:41:47
Context
 2019-03-26 20:45 Oops - didn't commit images for callframe.md! Leaf check-in: a19796b4e5 user: kbk tags: kbk-refactor-callframe 2019-02-25 04:41 More about code motion check-in: 2f51c8d73b user: kbk tags: kbk-refactor-callframe 2019-02-18 04:25 Write up first couple of tasks check-in: 643eefa7e6 user: kbk tags: kbk-refactor-callframe
Changes

Changes to doc/20190216callframe/callframe.md.

   236    236       the instruction, and all other names that potentially alias the
237    237   	given name lose their values.
238    238
239    239   + invoke[Expanded] - The available values in the source callframe
240    240     become available in the destination callframe.  Then, any names that the
241    241     invoked command might change and any names that might alias them,
242    242     lose their available values.
243  +
244  ++ extractCallframe - The available values in the source callframe
245  +  become available in the destination callframe.
243    246
244    247   + directSet etc. - Any name that might be an alias of the changing
245    248     variable loses its available value. (If the changing variable name
246    249     cannot be determined, all names might be aliases, because in this
247    250     case we cannot determine that the name is fully qualified.)
248    251
249    252   + variable, upvar, nsupvar - Any existing value for the given
................................................................................
276    279   block's new available set. The calculation of this set may in turn
277    280   change the available values on φ instructions at the start of loops,
278    281   so the calculation must be applied to convergence over all blocks in
279    282   the program.
280    283
281    284   Those who are versed in formal data flow analysis are welcome to
282    285   separate the descriptions of the operations given above into separate
283         -sets AVAIL_GEN and AVAIL_KILL so that the problem can be given as
286  +sets AVAIL\_GEN and AVAIL\_KILL so that the problem can be given as
284    287   the forward analysis:
285    288
286    289   ![\texttt{AVAIL\_OUT}[B] = \texttt{AVAIL\_GEN}[B]
287    290   ~ \cup ~ \left(
288    291   \overline{\texttt{AVAIL\_KILL}[B]}
289    292   ~ \cap \bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P]
290    293   \right)
................................................................................
358    361     input callframe. We start with the set required relative to the
359    362     output callframe. All variables that may be read or written, and
360    363     all possible aliases, are added to the set. (Since we do not in
361    364     general examine whether a procedure must alter a given variable,
362    365     but only whether it may alter it, we require in general that the
363    366     previous value of the variable must already be in sync.)
364    367
368  ++ extractCallframe - The variables required in the source callframe
369  +  are precisely those anticipated in the destination callframe.
370  +
365    371   + directGet etc. - The variables required will be relative to the
366    372     input callframe. We start with the set required relative to the
367    373     output callframe. Any variable that might alias the variable being
368    374     retrieved or set in the instruction is added to the set.
369    375
370    376   + variable, upvar, nsupvar - The variables required will be
371    377     relative to the input callframe. We start with the set required
................................................................................
388    394   designates a variable that is not live in the given callframe may be
389    395   safely removed. Either it is a dead STORE (that is, neither it nor
390    396   any potential alias will ever be accessed again), or it participates
391    397   in a STORE-STORE pair.
392    398
394    400
401  +With what has been developed in Tasks 1 and 2, we can get rid of
402  +locally redundant moveToCallFrame and moveFromCallFrame
403  +instructions. What remains is to attack loop-invariant ones: that is,
404  +moveFromCallFrame that can be safely moved above a loop or
405  +moveToCallFrame that can be safely be moved below. We use a plan of
406  +attack based loosely on $DREC93$.
407  +
408  +### Anticipability
409  +
410  +We already have the concept of _availability,_ developed in
411  +Task 1. Let us introduce some notation for available values.
412  +AVAIL_OUT[B] will be notation for the set of ordered pairs
413  +(_cf_. _name_) available in SSA registers on exit from block B of
414  +the program. (In the pairs, _cf_ names an SSA variable designating a
415  +callframe, and _name_ is the name of a variable in that frame.
416  +Similarly, AVAIL_IN[B]will be the set of (_cf_, _name_) pairs
417  +available on entry to block B, and we know that
418  +
419  +![\texttt{AVAIL\_IN}[B] =
420  +\bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P]
421  +\right)](./availin.svg)
422  +
423  +(This discussion glosses over the fact that the callframe reference
424  +must be translated every time that it passes through a φ instruction.
425  +The translation is straightforward - for the intersection above, the
426  +available value will be the output of the phi; for the anticipability
427  +analysis below, the anticipable value will be the input of the phi on
428  +the code path being analyzed.)
429  +
430  +Availability by itself is not sufficient for loop-invariant code
431  +motion. In addition, we need the concept of _anticipability_. A (_cf_,
432  +_name_) pair is _anticipable_ at a given point in the program if every
433  +execution path forward from that point contains a moveFromCallFrame
434  +retrieving that value prior to the value's being modified.
435  +
436  +Calculating anticipability requires multiple traversals of the program
437  +in postdominator order. Nothing is anticipable on 'return'. For blocks
438  +that do not return, we begin by taking the intersection of values
439  +anticipable at their successors:
440  +
441  +![\texttt{ANTIC\_OUT}[B] =
442  +\bigcap_{P\in \texttt{succ}[B]} \texttt{ANTIC\_IN}[P]
443  +\right)](./anticout.svg)
444  +
445  +(Note that in computing this function, we translate the callframe
446  +reference if it appears in a φ operation at the start of the block.)
447  +
448  +and then traverse the instructions of the block in reverse order.
449  +
450  ++ moveFromCallFrame - The given name and callframe reference become
451  +  anticipable.
452  +
453  ++ moveToCallFrame - The given name and callframe reference, if they
454  +  are anticipable, are removed from the anticipable set, along with
455  +  any names that might potentially be aliases of the given name .This
456  +  becomes the anticipable set for the input callframe. The output
457  +  callframe is no longer relevant and may be forgotten.
458  +
459  ++ invoke[Expanded] - Any variables potentially modified by the
460  +  invoked command, and any potential aliases, are removed from the
461  +  anticipable set, and the new set becomes the anticipable set for the
462  +  input callframe. The output callframe is no longer relevant and may
463  +  be forgotten.
464  +
465  ++ extractCallFrame - The store operations anticipated in the source
466  +  callframe are precisely those anticipated in the destination
467  +  callframe, and the destination callframe is no longer relevant.
468  +
469  ++ directGet - Since (as stated above) we ignore the possibility that
470  +  a read trace on one variable may modify another, we simply copy all
471  +  anticipable references.
472  +
473  ++ directSet, etc. - Any local variable that might alias the
474  +  variable being modified is no longer anticipable, and once again
475  +  the callframes are swapped as with moveToCallFrame and invoke.
476  +
477  ++ variable, upvar, nsupvar - The local variable and any aliases
478  +  are no longer anticipable.
479  +
480  ++ φ - Analysis stops at a φ instruction that references a callframe.
481  +  When analyzing a predecessor block, ANTIC_IN for its successors
482  +  will be translated to refer to the callframe that is input to the φ.
483  +
484  +If a block has no φ instruction for the callframe, the same callframe
485  +will be used when constructing its ANTIC_IN when analyzing a predecessor.
486  +
487  +### Placement of load instructions
488  +
489  +Once the AVAIL_OUT and ANTIC_IN sets are constructed, we can
490  +determine where to place additional moveFromCallFrame instructions
491  +that will perform loop-independent code motion.
492  +
493  +Since the entry block of a loop is always a merge point in the control
494  +flow, its predecessor blocks will be single-exit blocks owing to
495  +critical edge splitting. We need consider only insertion of
496  +moveFromCallFrame instructions at these points.
497  +
498  +The general plan is sketched in Section 4.3.2 of $VanD04$. We
499  +examine merge points in the program (blocks with more than one
500  +predecessor), in dominator order. For each (_cf_, _name_) that is
501  +anticipable at the entry to the block, we check availability in the
502  +predecessor blocks. If the expression is available in at least one
503  +predecessor, but not all predecessors, we insert moveFromCallFrame
504  +instructions for it in the predecessors where it is unavailable.
505  +
506  +This is another analysis that must be iterated to
507  +convergence. Inserting these 'moveFromCallFrame' operations produces
508  +new available variables, which can in turn expose further
509  +opportunities for code motion. $VAND04$ discusses this in more
510  +detail in chapter 4.
511  +
512  +### Possible combination of passes
513  +
514  +This analysis closely mirrors what happens in partial redundancy
515  +elimination (quadcode/pre.tcl). It may be possible to combine the
516  +two into a single pass. The principal change will be that availability
517  +analysis will need some refactoring, to discipline the size of the
518  +AVAIL sets. Unlike the description in $SIMP96$, we do not need to
519  +shrink any of these sets for safety, because every time we perform an
520  +operation that might affect a value in the callframe, we assign a new
521  +SSA value to the callframe. Nevertheless, the quadcode sequence is
522  +structured so that only one callframe is in flight at any given
523  +time. Even if procedure inlining gives rise to additional callframes,
524  +we still have the weaker condition that if there is an instruction
525  +like
395    526
527  +doSomething cfout cfin moreArgs
396    528
529  +the input callframe will never again be used.
530  +
397    531   ## TASK 4: Loop-invariant stores
398    532
399    533
400    534
401    535   ## References
402    536
403         -$SIMP96$.
404         -Loren Taylor Simpson. 1996. Value-Driven Redundancy
405         -Elimination. Ph.D. Dissertation. Rice University, Houston, TX,
406         -USA. AAI9631092. [<https://www.clear.rice.edu/comp512/Lectures/Papers/SimpsonThesis.pdf>]
537  +$DREC93$
538  +Karl-Heinz Drechsler and Manfred P. Stadel. "A variation of Knoop,
539  +Rüthing, and Steffen's _Lazy Code Motion_". _ACM SIGPLAN Notices_ 28:5
540  +(May, 1993), pp. 29-38.
541  +[<https://dl.acm.org/citation.cfm?id=152823>]
542  +
543  +$SIMP96$
544  +Loren Taylor Simpson. 1996. "Value-Driven Redundancy
545  +Elimination". Ph.D. Dissertation. Rice University, Houston, TX,
546  +USA. AAI9631092.
547  +[<https://www.clear.rice.edu/comp512/Lectures/Papers/SimpsonThesis.pdf>]
548  +
549  +$VanD04$
550  +VanDrunen, Thomas J. "Partial redundancy elimination for global value
551  +numbering." PhD thesis, Purdue University, West Lafayette, Indiana
552  +(August, 2004)
553  +[<ftp://ftp.cs.purdue.edu/pub/hosking/papers/vandrunen.pdf>]