Tcl Source Code

Ticket UUID: 2802881
Title: segfault with trace on ::errorInfo
Type: Bug
Version: obsolete: 8.5.7
Submitter: msofer
Created on: 2009-06-08 12:05:14
Subsystem: 47. Bytecode Compiler
Assigned To: msofer
Priority: 7 High
Severity:
Status: Closed
Last Modified: 2009-06-13 21:34:43
Resolution: Fixed
Closed By: dgp
Closed on: 2009-06-13 14:34:43
Description:
(see http://groups.google.com/group/comp.lang.tcl/browse_thread/thread/eb5ee89cc9450b52#)

The following script causes a segfault:
set ::errorLevel -1
set ::errorStack {}
trace add variable ::errorInfo write {
    set __n [info level]
    if {($__n > 0) && ($__n != $::errorLevel)} {
        set ::errorLevel $__n
        set __l [info level 0]
        lappend ::errorStack $__l
    }
}
proc A {} {if {foo} foo}
A 0

A stack trace shows that the fault is in INST_LOAD_SCALAR1 (TclExecuteByteCode, tclExecute.c line 2552) while running the trace script. The trace script doesn't have any compiled local variables, so this instruction should never have been compiled in.

The compilation is faulty: at the time of the crash, the bytecode has a non-NULL procPtr, which is wrong:
 (gdb) p *codePtr
$1 = {interpHandle = 0x7548d0, compileEpoch = 3, nsPtr = 0x754ad0, nsEpoch = 0, refCount = 2, flags = 0, 
  source = 0x780e80 "($__n > 0) && ($__n != $::errorLevel)", procPtr = 0x7ac900, structureSize = 47393536776768, 
  numCommands = 0, numSrcBytes = 37, numCodeBytes = 22, numLitObjects = 4, numExceptRanges = 0, numAuxDataItems = 0, 
  numCmdLocBytes = 0, maxExceptDepth = 0, maxStackDepth = 2, codeStart = 0x7741f0 "\n", objArrayPtr = 0x774208, 
  exceptArrayPtr = 0x0, auxDataArrayPtr = 0x0, codeDeltaStart = 0x774228 "\ufffd\ufffdz", codeLengthStart = 0x774228 "\ufffd\ufffdz", 
  srcDeltaStart = 0x774228 "\ufffd\ufffdz", srcLengthStart = 0x774228 "\ufffd\ufffdz", localCachePtr = 0x0}

(gdb) p *codePtr->procPtr->bodyPtr
$2 = {refCount = 1, bytes = 0x780c00 "if {foo} foo", length = 12, typePtr = 0x0, internalRep = {longValue = 8043008, 
    doubleValue = 3.9737739420263127e-317, otherValuePtr = 0x7aba00, wideValue = 8043008, twoPtrValue = {ptr1 = 0x7aba00, 
      ptr2 = 0x0}, ptrAndLongRep = {ptr = 0x7aba00, value = 0}}}

The CallFrame does look ok:
(gdb) p *((Interp *)interp)->varFramePtr
$3 = {nsPtr = 0x754ad0, isProcCallFrame = 0, objc = 0, objv = 0x0, callerPtr = 0x754ed0, callerVarPtr = 0x754ed0, 
  level = 1, procPtr = 0x0, varTablePtr = 0x0, numCompiledLocals = 0, compiledLocals = 0x0, clientData = 0x0, 
  localCachePtr = 0x0}
User Comments:

dgp added on 2009-06-13 21:34:43:

fixed for 8.4.20, 8.5.8, 8.6b2.

Left buggy in earlier 8.* branches

dgp added on 2009-06-13 21:13:49:
newer patch has a test case too.
Wasn't able to find any reasonably
easy way to trigger without tracing
a ::error* variable.

dgp added on 2009-06-13 21:10:11:

File Added - 330860: 2802881-test.patch

dgp added on 2009-06-13 19:58:37:
Here's the patch for the 8-5-branch.
Adapts easily to any branch.

Problem is that the value stashed in
iPtr->compiledProcPtr is surviving too
long and has the ability to influence
nested compiles that it should not.

dgp added on 2009-06-13 19:57:05:

File Added - 330848: 2802881.patch

dgp added on 2009-06-13 04:16:36:
Digging deeper into some of the
details, it appears part of the problem
here is simply that Tcl_ExprObj() can't
be called just any old time.  Not clear to
me yet just what the proper constraints
are, but calling it within TclCompileIfCmd()
produces the same segfault.

dgp added on 2009-06-12 23:48:01:
Please note that the script in the report
makes assumptions about the timing of
writes to ::errorInfo by the core.  I very
much would like to view such matters as
internal details and be more free to change
them.  That's why I say Don't Do That.

Experience and compat concerns have already
overruled me (see 1397843 and 1649062)
on this one, but my general hope is to get
away from tracing ::error* variables, and if
Tcl_LogCommandInfo() needs callbacks
(or some other equivalent) then let's get
that done in a supported way.

Report 1773040 appears to have some
similarity to this one.  It resolved down to
some missing CACHE/DECACHE protection.
Will be looking for that as I check further.

ferrieux added on 2009-06-12 23:13:05:
I just wanted to add that the problem is specific to compile-time errors, while the 'offending' Tcl code has been working perfectly for runtime errors for a long time (see the details at the end of http://wiki.tcl.tk/traceback). So just "Don't Do That" is not really an option, especially while TIP348 is still only a proposal (and I know whom to blame ;-).

I agree that there are two separate things to do in parallel: get rid of the compile-time Tcl_ResetResult() call for 8.5+, and remove the segfault for 8.4. Even a garbled errorStack/errorInfo would be an acceptable compromise for 8.4, since compile-time errors are not the main target of the substituted errorStack tool (errorInfo, or even the source position, is more than enough to debug compile-time errors, except perhaps for dynamically generated procs).

dgp added on 2009-06-12 23:04:03:
ferrieux, removing the Tcl_ResetResult() call
from TclCompileScript() disrupts the segfaulting
sequence in 8.5+.  It's possible that removing
that call should have been included as part of
the 8.4->8.5 reform which removed the concept
of a "compile error" from Tcl.  I'm not sure yet.
That prescription won't help provide a fix for
Tcl 8.4.20, though.

Since earlier releases also crashed on the "same"
scripts, I think there's a deeper problem here that
I want to take some time to consider before possibly
covering over our ability to demonstrate it.

dgp added on 2009-06-12 22:58:14:
Adjusting for changes to [trace] in 8.4, this
crashes Tcl all the way back to 8.0.  With
suitable mods, it does appear to work in
Tcl 7.6p2:
% set errorLevel -1
-1
% set errorStack {}
% trace variable errorInfo w {
if {[info level]} {
upvar #0 errorLevel errorLevel errorStack errorStack
}
set __n [info level]
if {($__n > 0) && ($__n != $errorLevel)} {
set errorLevel $__n
set __l [info level 0]
lappend errorStack $__l
} ;# }
% proc A {} {if {foo} foo}
% A 0
called "A" with too many arguments
% A
syntax error in expression "foo"
% set errorInfo
syntax error in expression "foo"
    while executing
"A"
% set errorStack
A

Is that the background here?  Reviving some
Tcl 7 code?

ctasada added on 2009-06-11 14:32:15:
Hi Don,

How do you suggest to do that in 8.5 then?

Thanks.

ferrieux added on 2009-06-11 13:33:34:
Don, can you answer my questions?

andreas_kupries added on 2009-06-11 06:15:07:
Wondering if it would work in 7.x, without the bytecode compiler (bcc).

dgp added on 2009-06-11 06:07:34:
just confirmed this crashes the 8-4-branch too.

and the 8.4.19 release too.

and the 8.4.8 release too.  and 8.4.0 too.

This isn't some flaw in recent development.

Which also suggests this is not old code getting
broken by more recent Tcl releases.  That is, this
is new code.

In that case, my advice is Don't Do That.  New code
should target Tcl 8.5 and code with the features of
Tcl 8.5 available need not make any use of the
::errorInfo or ::errorCode variables at all.  So don't
use them.

I'm still curious about what's going on, and will try
to track it down and fix it.

ferrieux added on 2009-06-08 21:40:41:
More precisely:
 - is it desirable that CompileScript calls Tcl_ResetResult?
 - is it desirable that Tcl_ResetResult calls the trace-enabled Tcl_SetVarXX on errorCode and errorInfo?

ferrieux added on 2009-06-08 20:08:06:
Yes, as suggested in the thread, the trace is called during *compilation* (as shown by ::tcl::unsupported::disass).
From where I sit, this fact alone sounds very wrong. Am I mistaken?

(excerpt from backtrace in gdb:)
#10 0x00296ea7 in TclCallVarTraces () from ./libtcl8.6.so
#11 0x002970d6 in TclObjCallVarTraces () from ./libtcl8.6.so
#12 0x0029cfc1 in TclPtrSetVar () from ./libtcl8.6.so
#13 0x0029f910 in Tcl_ObjSetVar2 () from ./libtcl8.6.so
#14 0x0028a654 in Tcl_ResetResult () from ./libtcl8.6.so
#15 0x00230bdf in TclCompileScript () from ./libtcl8.6.so

msofer added on 2009-06-08 19:19:27:
The stack trace is:
(gdb) bt 5
#0  0x00000000004ec601 in TclExecuteByteCode (interp=0x754400, codePtr=0x7b6400)
    at /home/CVS/tcl-core-8-5-branch/unix/../generic/tclExecute.c:2552
#1  0x00000000004e9a59 in Tcl_ExprObj (interp=0x754400, objPtr=0x7ab5b0, resultPtrPtr=0x7fffea3bc910)
    at /home/CVS/tcl-core-8-5-branch/unix/../generic/tclExecute.c:1262
#2  0x000000000048a86b in Tcl_ExprBooleanObj (interp=0x754400, objPtr=0x7b6400, ptr=0x7fffea3bc990)
    at /home/CVS/tcl-core-8-5-branch/unix/../generic/tclBasic.c:5387
#3  0x00000000004973f5 in Tcl_IfObjCmd (dummy=0x0, interp=0x754400, objc=3, objv=0x7557a0)
    at /home/CVS/tcl-core-8-5-branch/unix/../generic/tclCmdIL.c:233
#4  0x000000000048821c in TclEvalObjvInternal (interp=0x754400, objc=3, objv=0x7557a0, 
    command=0x7b991e "if {($__n > 0) && ($__n != $::errorLevel)} {\n        set ::errorLevel $__n\n        set __l [info level 0]\n        lappend ::errorStack $__l\n    }\n::errorInfo {} write", length=146, flags=0)
    at /home/CVS/tcl-core-8-5-branch/unix/../generic/tclBasic.c:3690

msofer added on 2009-06-08 19:14:22:
Interesting: the BODY of A seems to make a difference, even though it is never run due to the error in the number of args!

% proc A {} {}
% A 0
wrong # args: should be "A"
% proc A {} {foo}
% A 0
wrong # args: should be "A"
% A
invalid command name "foo"
% proc A {} {if {foo} foo}
% A 0
Segmentation fault (core dumped)
