Tcl Source Code

Ticket UUID: 0ef291c1c287e325a8e5d17c1390e5a733c12fc9
Title: memory corruption with trace + transchan
Type: Bug
Version: 8.6.6 and trunk
Submitter: aspect
Created on: 2016-09-20 10:32:03
Subsystem: 26. Channel Transforms
Assigned To: aku
Priority: 5 Medium
Severity: Minor
Status: Open
Last Modified: 2016-10-08 16:58:40
Resolution: None
Closed By: nobody
Closed on:
Description:
Test script:
~~~
proc trans args {list initialize finalize write}
chan push stdout trans

proc trans args {list}  ;# optional - cleans up output

proc foo {} {puts okay}
trace add execution foo enterstep {puts X; list}
foo
~~~

aborts with:
~~~
file = /home/aspect/Tcl/Env/src/tcl/generic/tclListObj.c, line = 124
incrementing refCount of previously disposed object
~~~

valgrind, memtrace, gdb backtrace attached.
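
For contrast with the deliberately degenerate `trans` in the test script (which returns the same list for every subcommand, including `write`), here is a minimal sketch of a well-formed write-only transform. It is not part of the original report; the name `passthru` is hypothetical, and it assumes the Tcl 8.6 reflected-transform protocol, in which the command prefix is called with `initialize`, `finalize` and `write` subcommands.

```tcl
# Hypothetical pass-through transform for a write-only channel.
# cmd    - subcommand chosen by the transform machinery
# handle - channel handle supplied by Tcl
# args   - mode (for initialize) or the data buffer (for write)
proc passthru {cmd handle args} {
    switch -- $cmd {
        initialize {
            # Report the subcommands this transform implements.
            return {initialize finalize write}
        }
        finalize {
            # Nothing to clean up in this sketch.
            return
        }
        write {
            # Return the buffer unchanged.
            return [lindex $args 0]
        }
    }
}

chan push stdout passthru
puts okay
chan pop stdout
```

With a transform of this shape, `puts` output passes through unmodified, so any remaining misbehaviour should come from the interaction with the trace rather than from the transform's return values.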
User Comments:
ferrieux added on 2016-09-20 21:21:29:
Thanks, it's much better to chase a single goose at a time.
With the proper configure flags I reproduced locally, and with a hardware watchpoint on the (constant) offending "previously disposed object", it appears the last overwrite is a bulk erasure within Tcl_DeleteHashEntry, *not* an individual object-freeing.

So, as you suggest, something very wrong might have happened some time ago, ending up in InitArgsAndLocals passing a bunch of stale pointers to NewList.

Need more digging...

aspect added on 2016-09-20 14:20:27:
> Or that when compiling with -g, the behavior differs ?

Different compile options make it manifest differently - looks like
memory corruption.  My guess is refcount management, though whether
in trace or transchan machinery or somewhere else I have no idea.
[puts] in both locations seems to be required.

oops.bt is --enable-symbols=yes, the others (and now
oops-tcl-panic.bt) are with --enable-symbols=all.

The stack traces start the same but diverge above TclNRRunCallbacks
at #13/#11 .. maybe that's helpful.

ferrieux added on 2016-09-20 12:42:26:
Unless I'm mistaken, the gdb bt ends in a different panic:
"invalid block: 0x7ffff7a0d9af: 2d ff"

Does that mean there is a heisenbug touch to it?
Or that when compiling with -g, the behavior differs?

Attachments: