Tcl Source Code

View Ticket
Login
Ticket UUID: 2724403
Title: Delete NS with active coro -> leak
Type: Bug Version: obsolete: 8.6b1.1
Submitter: venksi Created on: 2009-04-01 03:41:23
Subsystem: 45. Parsing and Eval Assigned To: msofer
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2010-01-04 03:38:38
Resolution: Fixed Closed By: msofer
    Closed on: 2010-01-03 20:38:38
Description:
This is on cvs-head as of 4:00 PM Mar. 31, 2009 PDT.  The first example
aborts.  Changing the name of the coroutine from "::y::run" to "run"
(as done in the sed command below) makes it pass.  (> is my shell prompt).

> cat a.tcl
puts [info patchlevel]
set interp [interp create x]
$interp eval {
    namespace eval ::y {}
    proc ::y::start {} {yield; puts hello}
}
$interp alias ::bgerror ::bgerror
$interp eval [list coroutine ::y::run ::y::start]
interp delete $interp
> tclsh8.6 a.tcl
8.6b1.1
Argument location tracking table not empty
Abort
> sed -e s/::y::run/run/ < a.tcl > b.tcl
> tclsh8.6 b.tcl
8.6b1.1
>
User Comments: msofer added on 2010-01-04 03:38:38:

allow_comments - 1

fixed in head using solution #2; keeping the #1 patch which yet come in handy if/when we attack [Bug 1655294]

dkf added on 2010-01-01 15:55:48:
Thinking through it, I'd favour deleting the coroutines first (solution #2) so that they get a chance to run any teardown code (i.e., deletion/unset traces) first. Like that, it would minimize the number of additional bugs introduced from places where user code is relying on the current deletion order.

You only need to scan through for coroutines in the case where the activationCount is non-zero.

dgp added on 2009-12-10 02:22:21:
the leak demo in the 2009-05-02
comment still appears leaky though.
Prospects?

dgp added on 2009-12-10 02:19:46:
The failure in the original bug report no
longer fails on the HEAD.  Fixed?

msofer added on 2009-05-10 11:58:53:
The failing tests are namespace-7.1, 35.1 and 46.8; trace-34.4; var-1.16-18

msofer added on 2009-05-10 02:21:56:

File Added - 326372: 2724403.patch

msofer added on 2009-05-10 02:21:36:

File Deleted - 326361:

msofer added on 2009-05-09 23:46:54:
Forgot to mention that there are a few failing tests: those that relied on parts of a dying ns being still there

msofer added on 2009-05-09 23:45:08:

File Added - 326361: 2724403.patch

msofer added on 2009-05-09 23:44:42:
Patch attached with new ns deletion policy: everything goes at once!
Flag bits NS_DYING and NS_DEAD are removed, nsPtr->activationCount is history too. References are now all kept in nsPtr->refCount: nsName objs, CallFrames, parent namespace.

The patch does solve the issues in this bug.

msofer added on 2009-05-09 22:29:22:

File Deleted - 326349:

msofer added on 2009-05-09 22:27:41:

File Added - 326349: 2724403.patch

msofer added on 2009-05-09 22:27:18:

File Deleted - 326338:

msofer added on 2009-05-09 21:11:31:

File Added - 326338: 2724403.patch

msofer added on 2009-05-09 17:23:32:
Possible solutions:
   (1) change the (undocumented) semantics of namespace deletion: after a call to [namespace delete] all of the namespace's children, commands and vars are immediately gone (and so are unavailable for use in deletion traces). Currently there is a predictable order of category deletion (first vars, then commands, then children) and an unpredictable order within categories. This order can be maintained while proceeding to remove everything immediately.
   (2) make coroutines special: THEY are killed immediately on [namespace delete]. Currently this is done only for ensembles.

msofer added on 2009-05-03 11:00:06:
see also bu #1655294 (thx dgp)

msofer added on 2009-05-02 21:43:38:
Confirmed leak on namespace deletion: run the following script and monitor the mem growth under top!

 while 1 {
     set ns ::y[incr q]
     namespace eval $ns {}
     proc ${ns}::start {} {yield; puts hello}
     coroutine ${ns}::run ${ns}::start
     namespace delete $ns
     after 1
 }

Note that both [run] and [start] need to be in the same namespace for the bug to occur: 
  * [start] has a CallFrame that keeps $ns active: $ns is set to be torn down when [start] returns
  * [start] will be forced to return when the coro is wound down, which will be triggered by [run]'s deletion
  * [run] will be deleted when $ns is torn down

That is: $ns will be torn down when [run] is deleted, and [run] will be deleted when $ns is torn down: a sort of deadlock.

msofer added on 2009-05-02 13:03:32:
The problem seems to be that TclTeardownNamespace is never called on ::y
The namespace is active when the call to Tcl_DeleteNamespace is called on it, so it is only marked as NS_DYING|NS_KILLED, and is supposed to be torn down (among others, deleting all its commands) when the CallFrame is popped. But the coro's frame is never dropped in this case, so the ns is not torn down and the command ::y::run is never deleted. 
Nasty.

msofer added on 2009-05-01 23:03:58:
This is an incomplete command deletion when triggered by namespce deletion. The proof is that the test script
(a) runs ok if we '$interp eval {rename ::y::run {}}' before deleting $interp
(b) crashes if we '$interp eval {namespace delete ::y}' before deleting $interp

msofer added on 2009-04-07 04:24:59:
no no, it is in the interplay between: suspended coroutine deletion, namespace deletion and interp finalization. It has nothing to do with app finalization.

venksi added on 2009-04-07 03:48:59:
A note that adding a vwait forever as the last statement still causes it to crash. 
The crash is not at app exit time.

dkf added on 2009-04-03 20:54:44:
Looks like a finalization order bug. Yuck.

Attachments: