Ticket UUID: | 1904907 | |||
Title: | Fix [chan create]/return dict key finalization order issues | |||
Type: | Bug | Version: | obsolete: 8.5.1 | |
Submitter: | schwarzkopf | Created on: | 2008-02-29 19:16:37 | |
Subsystem: | 24. Channel Commands | Assigned To: | andreas_kupries | |
Priority: | 6 | Severity: | ||
Status: | Open | Last Modified: | 2008-03-13 05:08:53 | |
Resolution: | None | Closed By: | ||
Closed on: | ||||
Description: |
% parray tcl_platform tcl_platform(byteOrder) = littleEndian tcl_platform(machine) = x86_64 tcl_platform(os) = Linux tcl_platform(osVersion) = 2.6.22-14-generic tcl_platform(platform) = unix tcl_platform(pointerSize) = 8 tcl_platform(threaded) = 1 tcl_platform(user) = bschwarz tcl_platform(wordSize) = 8 I have a file with this in it, and it produces a segfault: et fid [chan create {r w} ::vchan] chan puts $fid "HELLO THERE" set data [read $fid] If I add the following at the end, it does not segfault (note that I need a catch, because there is another problem when I [chan close]): catch {chan close $fid} I don't see the problem on "normal" channels (i.e. opening a file on the filesystem with [open]) | |||
User Comments: |
dgp added on 2008-03-13 05:08:53:
Logged In: YES user_id=80530 Originator: NO It seems that Tcl_GetReturnOptions() needs to be able to work on an interp argument over the entire lifetime of that interp. This means the cleanup of these key values needs to be triggered by a Tcl_InterpDeleteProc associated with the interp, and not by a Tcl_ExitProc thread exit handler as is currently done. It helps with performance to have multiple interps in one thread share keys, so a solution that still permits that would be best. andreas_kupries added on 2008-03-08 05:46:37: Logged In: YES user_id=75003 Originator: NO The workaround has been committed, with proper warning that it is a workaround, introduces a memory leak, and a proper fix has to wade into the tangel that is finalization ordering. This entry is left open, at lesser priority, and updated summary. andreas_kupries added on 2008-03-07 06:59:30: Logged In: YES user_id=75003 Originator: NO I am now convinced that threaded and non-threaded are seeing two different expression of the same ordering problem. Tcl_FinalizeThread is run, this releases the dict keys for the return options. In the non-threaded case this crashed when Tcl tried to compile the 'return' command in 'vchan::write' in preparation of its invokation by the ReflectOutput. In the threaded case the compile managed to complete successfully and we made to the execution stage, i.e. the procedure was actually invoked. After that a script level error (expected return of int, got nothing) caused the generation of an error message by the reflection code and marshalling that error then crashed on the same missing return option keys. Don's 'fix' is a fix for non-threaded as well, I just did not see that immediately because my tclsh picked up the old library. 'make install' was needed after recompile. I agree that this fix introduces a mem leak if Tcl is used in an environment where the 'OS' has no true processes which are fully cleaned up. Example: IOS. dgp added on 2008-03-07 06:03:28: Logged In: YES user_id=80530 Originator: NO Apparently several of us testing with the demo script are seeing different failures. The one I see happens because Tcl_GetReturnOptions() is called after Tcl_FinalizeThread(). The "fix" (workaround?) is to add keys[i] = NULL; at the right place in the ReleaseKeys() routine, so that in this scenario, the GetKeys() can re-initialize. This stops the crash, but adds a memleak, so I don't want to commit without more analysis. This is looking like a finalization order dependency mess to untangle. Another way out would be to just take the whole GetKeys() business out (perhaps conditioned on TclInExit() ?) and force the routines using it to created their key values on demand instead of pulling from a per-thread cache. dgp added on 2008-03-07 04:36:45: Logged In: YES user_id=80530 Originator: NO The stack trace shows the problem coming from MarshallError() which to me indicates that 1428575 may be the best way out. andreas_kupries added on 2008-03-07 03:23:34: File Added - 269366: STACK.txt Logged In: YES user_id=75003 Originator: NO Using a symbol enabled tclsh I got myself a stacktrace from gdb ... Attached ... What happens is that FlushChannel (#20) invokes the driver output function, one 'ReflectOuptut' (#19), which delegates the writing to the '::vchan' via 'InvokeTclMethod' (#18), this then ends up in a call to '::vchan::write', and then it seems to compile this procedure (ProcCompileProc @ #10). And when it compiles the 'return' command of 'vchan::write' it craps out in 'TclMergeReturnOptions' (#5). This function was last changed Oct 18, 2007. The caller, 'TclCompileReturnCmd' (#6) was however changed last quite recently, Feb 28 2008, to fix a memory leak. As the crash happens on a Tcl_Obj with refCount 0 and otherwise looking already freed as well, I wonder if that change caused the problem. I will now get me some older revisions of the head and see if I can bisect the issue. File Added: STACK.txt schwarzkopf added on 2008-03-04 02:08:02: Logged In: YES user_id=159778 Originator: YES I started tracking this down, and I got as far as this. In the FlushChannel function in tclIO.c, it craps out at: written = (chanPtr->typePtr->outputProc)(chanPtr->instanceData, RemovePoint(bufPtr), toWrite, &errorCode); Not sure how to track it from there, but I think I've tracked it pretty far... schwarzkopf added on 2008-03-01 06:28:59: Logged In: YES user_id=159778 Originator: YES the problem occurs on 32 bit Linux as well, and occurs on Windows andreas_kupries added on 2008-03-01 05:22:21: Logged In: YES user_id=75003 Originator: NO Information collected so far: - Happens for 64/32bit both - Happens for threaded/non-threaded both. schwarzkopf added on 2008-03-01 02:38:08: File Added - 268627: tst_chan.tcl Logged In: YES user_id=159778 Originator: YES File Added: tst_chan.tcl |