Tcl Source Code

View Ticket
Login
Ticket UUID: 1904907
Title: Fix [chan create]/return dict key finalization order issues
Type: Bug Version: obsolete: 8.5.1
Submitter: schwarzkopf Created on: 2008-02-29 19:16:37
Subsystem: 24. Channel Commands Assigned To: andreas_kupries
Priority: 6 Severity:
Status: Open Last Modified: 2008-03-13 05:08:53
Resolution: None Closed By:
    Closed on:
Description:
% parray tcl_platform
tcl_platform(byteOrder)   = littleEndian
tcl_platform(machine)     = x86_64
tcl_platform(os)          = Linux
tcl_platform(osVersion)   = 2.6.22-14-generic
tcl_platform(platform)    = unix
tcl_platform(pointerSize) = 8
tcl_platform(threaded)    = 1
tcl_platform(user)        = bschwarz
tcl_platform(wordSize)    = 8

I have a file with this in it, and it produces a segfault:

  et fid [chan create {r w} ::vchan]

  chan puts $fid "HELLO THERE"
  set data [read $fid]

If I add the following at the end, it does not segfault (note that I need a catch, because there is another problem when I [chan close]):

  catch {chan close $fid}
  
I don't see the problem on "normal" channels (i.e. opening a file on the filesystem with [open])
User Comments: dgp added on 2008-03-13 05:08:53:
Logged In: YES 
user_id=80530
Originator: NO


It seems that Tcl_GetReturnOptions() needs to
be able to work on an interp argument over the
entire lifetime of that interp.  This means
the cleanup of these key values needs to be
triggered by a Tcl_InterpDeleteProc associated
with the interp, and not by a Tcl_ExitProc
thread exit handler as is currently done.

It helps with performance to have multiple
interps in one thread share keys, so a solution
that still permits that would be best.

andreas_kupries added on 2008-03-08 05:46:37:
Logged In: YES 
user_id=75003
Originator: NO

The workaround has been committed, with proper warning that it is a workaround, introduces a memory leak, and a proper fix has to wade into the tangel that is finalization ordering.

This entry is left open, at lesser priority, and updated summary.

andreas_kupries added on 2008-03-07 06:59:30:
Logged In: YES 
user_id=75003
Originator: NO

I am now convinced that threaded and non-threaded are seeing two different expression of the same ordering problem. Tcl_FinalizeThread is run, this releases the dict keys for the return options.

In the non-threaded case this crashed when Tcl tried to compile the 'return' command in 'vchan::write' in preparation of its invokation by the ReflectOutput. In the threaded case the compile managed to complete successfully and we made to the execution stage, i.e. the procedure was actually invoked. After that a script level error (expected return of int, got nothing) caused the generation of an error message by the reflection code and marshalling that error then crashed on the same missing return option keys.

Don's 'fix' is a fix for non-threaded as well, I just did not see that immediately because my tclsh picked up the old library. 'make install' was needed after recompile.

I agree that this fix introduces a mem leak if Tcl is used in an environment where the 'OS' has no true processes which are fully cleaned up. Example: IOS.

dgp added on 2008-03-07 06:03:28:
Logged In: YES 
user_id=80530
Originator: NO


Apparently several of us testing
with the demo script are seeing
different failures.  The one I see
happens because Tcl_GetReturnOptions()
is called after Tcl_FinalizeThread().

The "fix" (workaround?) is to add
    keys[i] = NULL;
at the right place in the ReleaseKeys()
routine, so that in this scenario, the
GetKeys() can re-initialize.  This stops
the crash, but adds a memleak, so I don't
want to commit without more analysis.
This is looking like a finalization order
dependency mess to untangle.

Another way out would be to just take
the whole GetKeys() business out (perhaps
conditioned on TclInExit() ?) and force
the routines using it to created their
key values on demand instead of pulling
from a per-thread cache.

dgp added on 2008-03-07 04:36:45:
Logged In: YES 
user_id=80530
Originator: NO


The stack trace shows the problem
coming from MarshallError() which
to me indicates that 1428575 may
be the best way out.

andreas_kupries added on 2008-03-07 03:23:34:

File Added - 269366: STACK.txt

Logged In: YES 
user_id=75003
Originator: NO

Using a symbol enabled tclsh I got myself a stacktrace from gdb ... Attached ...

What happens is that FlushChannel (#20) invokes the driver output function, one 'ReflectOuptut' (#19), which delegates the writing to the '::vchan' via 'InvokeTclMethod' (#18), this then ends up in a call to '::vchan::write', and then it seems to compile this procedure (ProcCompileProc @ #10). And when it compiles the 'return' command of 'vchan::write' it craps out in 'TclMergeReturnOptions' (#5). This function was last changed Oct 18, 2007. The caller, 'TclCompileReturnCmd' (#6) was however changed last quite recently, Feb 28 2008, to fix a memory leak. As the crash happens on a Tcl_Obj with refCount 0 and otherwise looking already freed as well, I wonder if that change caused the problem.

I will now get me some older revisions of the head and see if I can bisect the issue.

File Added: STACK.txt

schwarzkopf added on 2008-03-04 02:08:02:
Logged In: YES 
user_id=159778
Originator: YES

I started tracking this down, and I got as far as this. In the FlushChannel function in tclIO.c, it craps out at:

        written = (chanPtr->typePtr->outputProc)(chanPtr->instanceData,
                RemovePoint(bufPtr), toWrite, &errorCode);

Not sure how to track it from there, but I think I've tracked it pretty far...

schwarzkopf added on 2008-03-01 06:28:59:
Logged In: YES 
user_id=159778
Originator: YES

the problem occurs on 32 bit Linux as well, and occurs on Windows

andreas_kupries added on 2008-03-01 05:22:21:
Logged In: YES 
user_id=75003
Originator: NO

Information collected so far:
- Happens for 64/32bit both
- Happens for threaded/non-threaded both.

schwarzkopf added on 2008-03-01 02:38:08:

File Added - 268627: tst_chan.tcl

Logged In: YES 
user_id=159778
Originator: YES

File Added: tst_chan.tcl

Attachments: