Tcl Source Code

View Ticket
Login
Ticket UUID: 1482718
Title: crash after removing trace in recursive call
Type: Bug Version: obsolete: 8.4.14
Submitter: eugene_cdn Created on: 2006-05-05 19:57:21
Subsystem: 22. [proc] and [uplevel] Assigned To: msofer
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2008-08-29 09:20:18
Resolution: Fixed Closed By: sf-robot
    Closed on: 2008-08-29 02:20:18
Description:
OS: Solaris 5.8.
Removing trace on second level of recursive call can
corrupt or substitute wrong number of compiled
variables for first level of recursion. Then, on
return, TclLookupSimpleVar() will have two different
values of number of compiled variables in
   procPtr->numCompiledLocals (5)
and
   varFramePtr->numCompiledLocals (3)
at tclVar.c, line 818.
Then loop for all variables (line 823) will use bigger
value, procPtr->numCompiledLocals, and it will crash on
accessing varFramePtr->compiledLocals out of bound.
Please use attached testcase for reproducing the bug.
User Comments: sf-robot added on 2008-08-29 09:20:18:
Logged In: YES 
user_id=1312539
Originator: NO

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

msofer added on 2008-08-10 22:38:01:
Logged In: YES 
user_id=148712
Originator: NO

Removed the fix in HEAD: this is not needed at least since varReform, where the local variable data at runtime is read from the CallFrame and/or the LocalCache.

Reopened to consider this for backporting to 8.5, although my inclination is to leave the stable branch alone: this is removing unneeded code, not a bugfix.

dgp added on 2006-05-14 00:16:54:
Logged In: YES 
user_id=80530


fixed for 8.4.14 and 8.5a5

dgp added on 2006-05-10 23:23:54:

File Added - 177487: 1482718-85.patch

dgp added on 2006-05-10 23:23:53:
Logged In: YES 
user_id=80530


And the corresponding patch
for 8.5a4

dgp added on 2006-05-10 23:18:34:

File Deleted - 177474: 



File Added - 177485: 1482718.patch

dgp added on 2006-05-10 23:18:33:
Logged In: YES 
user_id=80530


Here's the corrected patch
for 8.4.13.

dgp added on 2006-05-10 23:02:15:
Logged In: YES 
user_id=80530

Patch is flawed; examining...

dgp added on 2006-05-10 22:06:49:

File Added - 177474: 1482718.patch

Logged In: YES 
user_id=80530


Here's the patch against
8.4.13.

Note the patch leaves the
bug unfixed when the routine
getting recompiled is due to
a call to TclProcCompileProc()
from an extension intruding on
internals.

dgp added on 2006-05-10 21:06:00:
Logged In: YES 
user_id=80530


Based on a plan laid out
by miguel, I have a fix
for this in my development
sources.  I'll post a patch
for testing after access to
CVS is restored.

dgp added on 2006-05-09 11:21:42:
Logged In: YES 
user_id=80530


I added a test that the two
numCompiledLocals values were
the same, and a Tcl_Panic()
when they are not.  This
let me detect the problem directly
without needing lots of other
things "just so" to force a
segfault.

This let me simplify the demo
script to:

load libtrace.so
proc Console:Eval {cmd} {
    set enterx 0
    set ret [uplevel #0 $cmd]
    if {$enterx} {}
    return $ret
}
Console:Eval {
    closeTrace
    Console:Eval {}
}

The value mismatch
happens because [closeTrace]
calls Tcl_DeleteTrace()
which advances the bytecode
compile epoch (line 5208
of tclBasic.c).

This means the nested
[Console:Eval] call is
a different compile from
the outer one, and the two
compiles are done under
different settings of the
DONT_COMPILE_CMDS_INLINE
flag.  The different compile
rules may well explain the
different number of compiledLocals.

The remaining bug here appears
to be that the CallFrame of
the outer call now has a procPtr
field pointing to a Proc corresponding
to the re-compile?

Either that should not happen, or
if it's ok for that to happen, then
the TclLookupSimpleVar needs a rewrite
to not assume the CallFrame and Proc
agree about their compiledLocals lists.

dgp added on 2006-05-09 04:16:35:
Logged In: YES 
user_id=80530


oops!  the loop traverses
data from both structs,
so the two numCompiledLocals
simply must agree.

miguel reports:
miguelthe CallFrame's is set from the Proc's at tclProc.c
line 979. Should not be modified, ever

dgp added on 2006-05-09 03:01:26:
Logged In: YES 
user_id=80530


ok, as original report already
observed, root cause is that
procPtr->numCompiledLocals
and varFramePtr->numCompiledLocals
do not agree.

Since the for loop at line 825
fundamentally loops over
varFramePtr->compiledLocals, it
would make more sense to use
varFramePtr->numCompiledLocals
to set the loop limit.  Simply
making that change appears
to fix this bug.

But leaves several questions
raised.  Why do we need to store
the same info in two places?
And once we've done that, just what
corruption is forcing them
to get out of sync?

dgp added on 2006-05-09 02:32:24:
Logged In: YES 
user_id=80530


Traces are probably involved,
but the root cause of the
crash appears to be corruption
of a Var struct, so let's bring
the Var expert in on this.

Miguel, the crash happens at
line 828 of tclVar.c where
localVarPtr->name has a bogus
value, even though
localVarPtr->refCount is 19.

Any hints?

dgp added on 2006-05-06 11:30:56:
Logged In: YES 
user_id=80530


looks like it's still there in 8.4.13.

Upping priority to see if it can
get resolved before 8.4.14.

eugene_cdn added on 2006-05-06 02:57:21:

File Added - 176972: trace.tgz

Attachments: