Tcl Source Code

View Ticket
Login
Ticket UUID: 3390699
Title: socket_*-13.1 leaks
Type: Bug Version: obsolete: 8.6b2
Submitter: dgp Created on: 2011-08-12 19:01:01
Subsystem: 27. Channel Types Assigned To: dgp
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2011-09-12 03:02:33
Resolution: Fixed Closed By: dgp
    Closed on: 2011-09-11 20:02:33
Description:
$ make valgrind TESTFLAGS="-file socket.test -match socket_any-13.1"
...
==13528== 24 bytes in 1 blocks are definitely lost in loss record 367 of 932
==13528==    at 0x4005903: malloc (vg_replace_malloc.c:195)
==13528==    by 0x80DDD7C: TclpAlloc (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80EB794: Tcl_Alloc (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x814010D: TEBCresume (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80E0BDD: TclNRRunCallbacks (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80E64BA: Tcl_EvalObjv (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80E6E24: TclEvalEx (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80E70D9: Tcl_EvalEx (in /home/dgp/fossil/tcl/unix/tcltest)


==13528== 162,248 (24 direct, 162,224 indirect) bytes in 1 blocks are definitely lost in loss record 932 of 932
==13528==    at 0x4005903: malloc (vg_replace_malloc.c:195)
==13528==    by 0x80DDD7C: TclpAlloc (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80EB794: Tcl_Alloc (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x80890EB: SetCmdNameFromAny (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x808875A: Tcl_GetCommandFromObj (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x8132FC3: TclCompileEnsemble (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x8124880: TclCompileScript (in /home/dgp/fossil/tcl/unix/tcltest)
==13528==    by 0x812566B: TclCompileTokens (in /home/dgp/fossil/tcl/unix/tcltest)
User Comments: dgp added on 2011-09-12 03:02:33:

allow_comments - 1

dgp added on 2011-09-12 03:02:32:
Leak is in the test, not the tested code.
Fixed via conversion from [testthread] to Thread package.

ferrieux added on 2011-08-17 05:48:52:
Deferring until the more general 3392735 is solved (incomplete freeing of active levels).

ferrieux added on 2011-08-13 23:48:05:

File Added - 421045: thr2.patch

ferrieux added on 2011-08-13 23:44:13:
Some progress. I disabled the various panicky checks just to explore "what if" without spending the time of the optimized new "thread prepared to exit" test. 

Then I observed two things:

 - there is a race condition due to threadRep just waiting for the higher-level [thread names] to deplete instead of joining. When the thread is near the end and has removed itself from the list, the main thread quickly goes to [exit] and doesn't leave a chance for the other thread to finish its cleanups.

 - after adding a [threadtest join] in threadReap's loop (and a -joinable at thread creation in socket.test), the leak bill is much reduced:

==4495==    definitely lost: 84 bytes in 4 blocks
==4495==    indirectly lost: 3,012 bytes in 77 blocks
==4495==      possibly lost: 136 bytes in 1 blocks
==4495==    still reachable: 418 bytes in 15 blocks
==4495==         suppressed: 0 bytes in 0 blocks

ferrieux added on 2011-08-13 23:18:26:
Answer to last question: this is done by checking the global TclInExit flag. So we want a similar per-thread one, though one that is set earlier than TclInThreadExit. That's additionally tricky because the test is deep inside hi-speed code in tclExecute.c, some without even an interp around...

ferrieux added on 2011-08-13 23:11:03:
Anyway, single-stepping through the thread's end shows that testthread::exit does *not* do the same as falling thru.
Notably, the Tcl_DeleteInterp() is omitted, which would account for the leak (the Interp object itself !).
However, when trying to give back symmetry (with the attached patch), on encounters a secondary problem. In DeleteInterpProc, the test

    /*
     * Punt if there is an error in the Tcl_Release/Tcl_Preserve matchup,
 * unless we are exiting.
     */
    if ((iPtr->numLevels > 0) && !TclInExit()) {
Tcl_Panic("DeleteInterpProc called with active evals");
    }

panics in that case. Of course numLevels is >0 (it is 2) because "exit" is called from within an event handler... as is most often the case.

I suspect that is the reason why the original author (stanton, <=1999) removed DeleteInterp from the sequence, but  that's pulling it under the rug.

Still digging to understand how the equivalent situation is handled in [exit].

Q1: why such a stringent test (panicking if it fails) ?

ferrieux added on 2011-08-13 21:11:02:
Hmm, looking again, threadReap does wait, though in a strange manner. Essentially:

while {[llength [testthread names]] > 1} {
    foreach tid [testthread names] {
        testthread send -async $tid {testthread exit}
    }
            after 1
}

This implies that if one of the threads blocks (possibly forever), tons of such scriptlets will accumulate in its input interthread queue...

ferrieux added on 2011-08-13 18:35:39:
Yes. Currently, threadReap essentially does testthread::send -async exit, and waits very little.
What about doing a testthread::join to resynchronize with their proper termination ?

dgp added on 2011-08-13 03:19:12:
Leak trouble may be lurking in [tcltest::threadReap]
which is problematic in oh so many ways....

dgp added on 2011-08-13 02:08:07:
Similar results for socket_inet-13.1

Attachments: