Ticket UUID: | 1482718 | |||
Title: | crash after removing trace in recursive call | |||
Type: | Bug | Version: | obsolete: 8.4.14 | |
Submitter: | eugene_cdn | Created on: | 2006-05-05 19:57:21 | |
Subsystem: | 22. [proc] and [uplevel] | Assigned To: | msofer | |
Priority: | 9 Immediate | Severity: | ||
Status: | Closed | Last Modified: | 2008-08-29 09:20:18 | |
Resolution: | Fixed | Closed By: | sf-robot | |
Closed on: | 2008-08-29 02:20:18 | |||
Description: |
OS: Solaris 5.8. Removing trace on second level of recursive call can corrupt or substitute wrong number of compiled variables for first level of recursion. Then, on return, TclLookupSimpleVar() will have two different values of number of compiled variables in procPtr->numCompiledLocals (5) and varFramePtr->numCompiledLocals (3) at tclVar.c, line 818. Then loop for all variables (line 823) will use bigger value, procPtr->numCompiledLocals, and it will crash on accessing varFramePtr->compiledLocals out of bound. Please use attached testcase for reproducing the bug. | |||
User Comments: |
sf-robot added on 2008-08-29 09:20:18:
Logged In: YES user_id=1312539 Originator: NO This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker). msofer added on 2008-08-10 22:38:01: Logged In: YES user_id=148712 Originator: NO Removed the fix in HEAD: this is not needed at least since varReform, where the local variable data at runtime is read from the CallFrame and/or the LocalCache. Reopened to consider this for backporting to 8.5, although my inclination is to leave the stable branch alone: this is removing unneeded code, not a bugfix. dgp added on 2006-05-14 00:16:54: Logged In: YES user_id=80530 fixed for 8.4.14 and 8.5a5 dgp added on 2006-05-10 23:23:54: File Added - 177487: 1482718-85.patch dgp added on 2006-05-10 23:23:53: Logged In: YES user_id=80530 And the corresponding patch for 8.5a4 dgp added on 2006-05-10 23:18:34: File Deleted - 177474: File Added - 177485: 1482718.patch dgp added on 2006-05-10 23:18:33: Logged In: YES user_id=80530 Here's the corrected patch for 8.4.13. dgp added on 2006-05-10 23:02:15: Logged In: YES user_id=80530 Patch is flawed; examining... dgp added on 2006-05-10 22:06:49: File Added - 177474: 1482718.patch Logged In: YES user_id=80530 Here's the patch against 8.4.13. Note the patch leaves the bug unfixed when the routine getting recompiled is due to a call to TclProcCompileProc() from an extension intruding on internals. dgp added on 2006-05-10 21:06:00: Logged In: YES user_id=80530 Based on a plan laid out by miguel, I have a fix for this in my development sources. I'll post a patch for testing after access to CVS is restored. dgp added on 2006-05-09 11:21:42: Logged In: YES user_id=80530 I added a test that the two numCompiledLocals values were the same, and a Tcl_Panic() when they are not. This let me detect the problem directly without needing lots of other things "just so" to force a segfault. This let me simplify the demo script to: load libtrace.so proc Console:Eval {cmd} { set enterx 0 set ret [uplevel #0 $cmd] if {$enterx} {} return $ret } Console:Eval { closeTrace Console:Eval {} } The value mismatch happens because [closeTrace] calls Tcl_DeleteTrace() which advances the bytecode compile epoch (line 5208 of tclBasic.c). This means the nested [Console:Eval] call is a different compile from the outer one, and the two compiles are done under different settings of the DONT_COMPILE_CMDS_INLINE flag. The different compile rules may well explain the different number of compiledLocals. The remaining bug here appears to be that the CallFrame of the outer call now has a procPtr field pointing to a Proc corresponding to the re-compile? Either that should not happen, or if it's ok for that to happen, then the TclLookupSimpleVar needs a rewrite to not assume the CallFrame and Proc agree about their compiledLocals lists. dgp added on 2006-05-09 04:16:35: Logged In: YES user_id=80530 oops! the loop traverses data from both structs, so the two numCompiledLocals simply must agree. miguel reports: miguelthe CallFrame's is set from the Proc's at tclProc.c line 979. Should not be modified, ever dgp added on 2006-05-09 03:01:26: Logged In: YES user_id=80530 ok, as original report already observed, root cause is that procPtr->numCompiledLocals and varFramePtr->numCompiledLocals do not agree. Since the for loop at line 825 fundamentally loops over varFramePtr->compiledLocals, it would make more sense to use varFramePtr->numCompiledLocals to set the loop limit. Simply making that change appears to fix this bug. But leaves several questions raised. Why do we need to store the same info in two places? And once we've done that, just what corruption is forcing them to get out of sync? dgp added on 2006-05-09 02:32:24: Logged In: YES user_id=80530 Traces are probably involved, but the root cause of the crash appears to be corruption of a Var struct, so let's bring the Var expert in on this. Miguel, the crash happens at line 828 of tclVar.c where localVarPtr->name has a bogus value, even though localVarPtr->refCount is 19. Any hints? dgp added on 2006-05-06 11:30:56: Logged In: YES user_id=80530 looks like it's still there in 8.4.13. Upping priority to see if it can get resolved before 8.4.14. eugene_cdn added on 2006-05-06 02:57:21: File Added - 176972: trace.tgz |