Subject: Bytecode safety and exception ranges. From: Kevin Kenny Date: Fri, 29 Oct 2010 12:15:12 -0400 To: Tcl Core List CC: Ozgur Dogan Ugurlu Newsgroups: gmane.comp.lang.tcl.core As most of you are aware, I've been working on bytecode safety analysis for TAL. I started that project simply because it came as a side effect of the fact that a Bytecode object has to assert the stack depth and exception range depth that might appear inside it, but it now seems that the greatest value of the project comes from the fact that the attempts at safety analysis are exposing bytecode compilation bugs. The latest nasty bug is number 3098302. The offending code is fairly simple: proc failtrace {n1 n2 op} { return -code error "trace on $n1 fails by request" } trace add variable result1 write failtrace proc x {} { variable result1 for {set i 0} {$i < 100} {incr i} { set status2 [catch { set status1 [catch { return -code error -level 0 "original failure" } result1 options1] } result2 options2] lappend retval $status2 $result2 $options2 } } puts [x] To see what's going on here, it helps to look at the generated bytecode for the nested [catch]es. (The [for] loop is immaterial; it's there only because the offending code doesn't offend badly enough to cause a Tcl_Panic on the first round.) I include its disassembly below. The key sequence of events is: We enter the outer [catch] at pc 29, and push an exception context at pc 38. Immediately, we enter the inner catch at pc 43 and push another exception context at pc 52. The codeburst from 57-70 throws an exception with the 'returnImm 1 0' at pc 70. The innermost exception range at pc 70 is range 3, and execution proceeds to pc 96. 96-97 push returnOpts and result to the stack. 98 tries to store the result in 'result1', and the result is a failed trace and a new exception. The innermost exception range at pc 98 is range 2, and execution proceeds to pc 125. The code burst from 125 to 133 proceeds normally, scavenging result, return options and return code. At 134, we pop an exception context, which is the one that was pushed at 52. And we've left the outer [catch], with an exception context still on the stack (and the stack pointer in the wrong place, since we restored it from the wrong context). Crash. OK, now, what to do about this? Arguably, the bytecode compiler for [catch] is failing to follow the rules by having instructions between beginCatch and endCatch that are outside the exception range. Naughty [catch], no cookie! But that's too simplistic a story. Assume that we were to extend the exception range to encompass the code from 96-104. What would happen then? The failed 'storeScalar' into 'result1' would be take execution back to 96 - with several additional elements on the operand stack. The code starting at 96 would again try to store to 'result1', encountering the same error, and going back to 96 again in an infinite loop consuming operand stack space. Another crash. Putting the entire code burst after the 'endCatch' is no better. The problem here is that 'endCatch' (necessarily) resets the interpreter result, so 'pushResult', 'pushReturnCode' and 'pushReturnOptions' will all simply find an empty result and a TCL_OK return code. So we need to have the 'pushResult', 'pushReturnCode' and 'pushReturnOptions' before the 'endCatch' but the 'storeScalar' instructions after it. (It looks possible to reorder the code to make it happen.) Since 'pushResult', 'pushReturnCode' and 'pushReturnOptions' cannot throw, it is immaterial whether they are inside or outside the exception range. Now, how do we formulate this into a general safety rule? I'd propose perhaps the following: (a) Code reachable from 'beginCatch' that does not pass through a matching 'endCatch' may be inside or outside the exception range. (b) Code outside the exception range, that is reachable from 'beginCatch', and not reachable from the exception range, is 'before' the catch. Such code may use only 'nonthrowing' instructions. (c) Code outside the exception range that is reachable from inside the exception range is 'after' the catch and may use only 'nonthrowing' instructions. (d) The pc that is the exception range's target, and all code reachable from there, is also 'after' the catch. (e) It is permissible for control to bass from 'before' to 'within' and from 'within' to 'after', but not in the reverse direction. The 'nonthrowing' instructions comprise: push, pop, dup, over, reverse, jump1, jump4, jumpTable, returnCodeBranch, and of course 'endCatch' (which restores the exception context). My two questions for the team are: (1) Is this a reasonable state of constraints for the compilers to live with? (2) Is this set of constraints the right one to ensure that we don't smash the exception stack? I am purposely ignoring 'loop' exception ranges for the moment. Those have their own bugs (notably 2835313), but I'm chasing one set of issues at a time. -- 73 de ke9tv/2, Kevin