Tcl Source Code: View Ticket

Ticket UUID:	3390699
Title:	socket_*-13.1 leaks
Type:	Bug	Version:	obsolete: 8.6b2
Submitter:	dgp	Created on:	2011-08-12 19:01:01
Subsystem:	27. Channel Types	Assigned To:	dgp
Priority:	5 Medium	Severity:
Status:	Closed	Last Modified:	2011-09-12 03:02:33
Resolution:	Fixed	Closed By:	dgp
		Closed on:	2011-09-11 20:02:33
Description:	$ make valgrind TESTFLAGS="-file socket.test -match socket_any-13.1" ... ==13528== 24 bytes in 1 blocks are definitely lost in loss record 367 of 932 ==13528== at 0x4005903: malloc (vg_replace_malloc.c:195) ==13528== by 0x80DDD7C: TclpAlloc (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80EB794: Tcl_Alloc (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x814010D: TEBCresume (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80E0BDD: TclNRRunCallbacks (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80E64BA: Tcl_EvalObjv (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80E6E24: TclEvalEx (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80E70D9: Tcl_EvalEx (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== 162,248 (24 direct, 162,224 indirect) bytes in 1 blocks are definitely lost in loss record 932 of 932 ==13528== at 0x4005903: malloc (vg_replace_malloc.c:195) ==13528== by 0x80DDD7C: TclpAlloc (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80EB794: Tcl_Alloc (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x80890EB: SetCmdNameFromAny (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x808875A: Tcl_GetCommandFromObj (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x8132FC3: TclCompileEnsemble (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x8124880: TclCompileScript (in /home/dgp/fossil/tcl/unix/tcltest) ==13528== by 0x812566B: TclCompileTokens (in /home/dgp/fossil/tcl/unix/tcltest)
User Comments:	dgp added on 2011-09-12 03:02:33: allow_comments - 1 dgp added on 2011-09-12 03:02:32: Leak is in the test, not the tested code. Fixed via conversion from [testthread] to Thread package. ferrieux added on 2011-08-17 05:48:52: Deferring until the more general 3392735 is solved (incomplete freeing of active levels). ferrieux added on 2011-08-13 23:48:05: File Added - 421045: thr2.patch ferrieux added on 2011-08-13 23:44:13: Some progress. I disabled the various panicky checks just to explore "what if" without spending the time of the optimized new "thread prepared to exit" test. Then I observed two things: - there is a race condition due to threadRep just waiting for the higher-level [thread names] to deplete instead of joining. When the thread is near the end and has removed itself from the list, the main thread quickly goes to [exit] and doesn't leave a chance for the other thread to finish its cleanups. - after adding a [threadtest join] in threadReap's loop (and a -joinable at thread creation in socket.test), the leak bill is much reduced: ==4495== definitely lost: 84 bytes in 4 blocks ==4495== indirectly lost: 3,012 bytes in 77 blocks ==4495== possibly lost: 136 bytes in 1 blocks ==4495== still reachable: 418 bytes in 15 blocks ==4495== suppressed: 0 bytes in 0 blocks ferrieux added on 2011-08-13 23:18:26: Answer to last question: this is done by checking the global TclInExit flag. So we want a similar per-thread one, though one that is set earlier than TclInThreadExit. That's additionally tricky because the test is deep inside hi-speed code in tclExecute.c, some without even an interp around... ferrieux added on 2011-08-13 23:11:03: Anyway, single-stepping through the thread's end shows that testthread::exit does not do the same as falling thru. Notably, the Tcl_DeleteInterp() is omitted, which would account for the leak (the Interp object itself !). However, when trying to give back symmetry (with the attached patch), on encounters a secondary problem. In DeleteInterpProc, the test /* * Punt if there is an error in the Tcl_Release/Tcl_Preserve matchup, * unless we are exiting. */ if ((iPtr->numLevels > 0) && !TclInExit()) { Tcl_Panic("DeleteInterpProc called with active evals"); } panics in that case. Of course numLevels is >0 (it is 2) because "exit" is called from within an event handler... as is most often the case. I suspect that is the reason why the original author (stanton, <=1999) removed DeleteInterp from the sequence, but that's pulling it under the rug. Still digging to understand how the equivalent situation is handled in [exit]. Q1: why such a stringent test (panicking if it fails) ? ferrieux added on 2011-08-13 21:11:02: Hmm, looking again, threadReap does wait, though in a strange manner. Essentially: while {[llength [testthread names]] > 1} { foreach tid [testthread names] { testthread send -async $tid {testthread exit} } after 1 } This implies that if one of the threads blocks (possibly forever), tons of such scriptlets will accumulate in its input interthread queue... ferrieux added on 2011-08-13 18:35:39: Yes. Currently, threadReap essentially does testthread::send -async exit, and waits very little. What about doing a testthread::join to resynchronize with their proper termination ? dgp added on 2011-08-13 03:19:12: Leak trouble may be lurking in [tcltest::threadReap] which is problematic in oh so many ways.... dgp added on 2011-08-13 02:08:07: Similar results for socket_inet-13.1

Attachments:

thr2.patch [download] added by ferrieux on 2011-08-13 23:48:05. [details]