Subject:
Bytecode safety and exception ranges.
From:
Kevin Kenny <kkenny2-tUwO18uD7CBWk0Htik3J/w@public.gmane.org>
Date:
Fri, 29 Oct 2010 12:15:12 -0400
To:
Tcl Core List <tcl-core-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
CC:
Ozgur Dogan Ugurlu <dogeen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups:
gmane.comp.lang.tcl.core

As most of you are aware, I've been working on bytecode safety analysis
for TAL. I started that project simply because it came as a side effect
of the fact that a Bytecode object has to assert the stack depth and
exception range depth that might appear inside it, but it now seems that
the greatest value of the project comes from the fact that the attempts
at safety analysis are exposing bytecode compilation bugs.

The latest nasty bug is number 3098302.  The offending code is fairly
simple:

proc failtrace {n1 n2 op} {
     return -code error "trace on $n1 fails by request"
}
trace add variable result1 write failtrace

proc x {} {
     variable result1
     for {set i 0} {$i < 100} {incr i} {
	set status2 [catch {
	    set status1 [catch {
		return -code error -level 0 "original failure"
	    } result1 options1]
	} result2 options2]
	lappend retval $status2 $result2 $options2
     }
}

puts [x]

To see what's going on here, it helps to look at the generated
bytecode for the nested [catch]es.  (The [for] loop is immaterial; it's
there only because the offending code doesn't offend badly enough to
cause a Tcl_Panic on the first round.) I include its disassembly
below.

The key sequence of events is:

We enter the outer [catch] at pc 29, and push an exception context
at pc 38.

Immediately, we enter the inner catch at pc 43 and push another
exception context at pc 52.

The codeburst from 57-70 throws an exception with the 'returnImm 1 0'
at pc 70.  The innermost exception range at pc 70 is range 3, and
execution proceeds to pc 96.

96-97 push returnOpts and result to the stack.  98 tries to store the
result in 'result1', and the result is a failed trace and a new
exception. The innermost exception range at pc 98 is range 2, and
execution proceeds to pc 125.

The code burst from 125 to 133 proceeds normally, scavenging result,
return options and return code. At 134, we pop an exception context,
which is the one that was pushed at 52.

And we've left the outer [catch], with an exception context still
on the stack (and the stack pointer in the wrong place, since we
restored it from the wrong context).  Crash.

OK, now, what to do about this?

Arguably, the bytecode compiler for [catch] is failing to follow
the rules by having instructions between beginCatch and endCatch
that are outside the exception range. Naughty [catch], no cookie!

But that's too simplistic a story.  Assume that we were to extend
the exception range to encompass the code from 96-104. What would
happen then? The failed 'storeScalar' into 'result1' would be
take execution back to 96 - with several additional elements on
the operand stack. The code starting at 96 would again try to
store to 'result1', encountering the same error, and going back to
96 again in an infinite loop consuming operand stack space. Another
crash.

Putting the entire code burst after the 'endCatch' is no better.
The problem here is that 'endCatch' (necessarily) resets the interpreter
result, so 'pushResult', 'pushReturnCode' and 'pushReturnOptions'
will all simply find an empty result and a TCL_OK return code.

So we need to have the 'pushResult', 'pushReturnCode' and
'pushReturnOptions' before the 'endCatch' but the 'storeScalar'
instructions after it. (It looks possible to reorder the code
to make it happen.) Since 'pushResult', 'pushReturnCode' and
'pushReturnOptions' cannot throw, it is immaterial whether they
are inside or outside the exception range.

Now, how do we formulate this into a general safety rule? I'd
propose perhaps the following:

(a) Code reachable from 'beginCatch' that does not pass through
     a matching 'endCatch' may be inside or outside the exception
     range.
(b) Code outside the exception range, that is reachable from
     'beginCatch', and not reachable from the exception range,
     is 'before' the catch. Such code may use only 'nonthrowing'
     instructions.
(c) Code outside the exception range that is reachable from inside
     the exception range is 'after' the catch and may use only
     'nonthrowing' instructions.
(d) The pc that is the exception range's target, and all code
     reachable from there, is also 'after' the catch.
(e) It is permissible for control to bass from 'before' to 'within'
     and from 'within' to 'after', but not in the reverse direction.

The 'nonthrowing' instructions comprise: push, pop, dup, over,
reverse, jump1, jump4, jumpTable, returnCodeBranch, and
of course 'endCatch' (which restores the exception context).

My two questions for the team are:

(1) Is this a reasonable state of constraints for the compilers
     to live with?
(2) Is this set of constraints the right one to ensure that
     we don't smash the exception stack?

I am purposely ignoring 'loop' exception ranges for the moment.
Those have their own bugs (notably 2835313), but I'm chasing one
set of issues at a time.

-- 73 de ke9tv/2, Kevin