Tcl Source Code

View Ticket
Login
Ticket UUID: 2919042
Title: Exit reform lost valgrindability
Type: Bug Version: obsolete: 8.6b1
Submitter: ferrieux Created on: 2009-12-22 01:06:25
Subsystem: 42. Memory Preservation Assigned To: dgp
Priority: 8 Severity:
Status: Closed Last Modified: 2011-09-14 01:01:29
Resolution: Fixed Closed By: dgp
    Closed on: 2011-09-13 18:01:29
Description:
The quick-exit reform of Bug 2001201 neglected the "valgrindability" of Tcl executables.
Indeed, it moved Tcl_Finalize out of the default Tcl_Exit path.
The attached patch corrects this by letting the TCL_FINALIZE_ON_EXIT environment variable, when set and not "0", make Tcl_Exit call Tcl_Finalize in the exact same sequence as before the reform.
It should be noted that while useful for leak hunting, this tool shouldn't be used too creatively in threaded builds. The fundamental reason behind the reform (deadlocks and crashes) is here to stay.
User Comments: dgp added on 2011-09-14 01:01:29:

allow_comments - 1

ferrieux added on 2011-08-30 04:02:11:
Yes, the work in progress you've just committed seems the way to go.
Still, two remarks:
(1) maybe you want a cachedInThreadExit alongside cachedInExit since the tsd lookup might be costly
(2) once you've brought thread exit in line with process exit, I guess there will remain the leaks in 3392735, which are the one that the panics were here to highlight.

dgp added on 2011-08-29 23:42:13:
It seems that the [testthread] command has similar
issues.  When [testthread exit] runs, it calls Tcl_ExitThread()
which prevents unwinding after that point, so that the interp
in which [testthread exit] is eval'd doesn't get all its Tcl_Release()
calls, and leaks.

Looking for the right way to adapt the solution here to apply there
as well.  See anything obvious?

ferrieux added on 2011-08-10 00:21:07:
Committed to HEAD.

ferrieux added on 2011-08-07 23:02:29:

File Deleted - 409373:

ferrieux added on 2011-08-07 23:02:18:

File Added - 420499: val5.patch

ferrieux added on 2011-08-07 23:01:23:
Found it. That side-effect was itself a consequence of the patch forcing a Tcl_DeleteInterp of the main interp even when full finalization was not requested. Attached corrected patch. Committable ?

ferrieux added on 2011-08-07 22:24:40:
'k, that offending *single* quote is \x27 produced by chan-4.6 as an output eofchar.
So I guess the patch unwillingly revives some end-of-channel-life code. Hm.

ferrieux added on 2011-08-07 21:00:40:
Cause of a strange side-effect on the test suite (a single " character) I have not found the time to chase...

msofer added on 2011-08-07 19:39:50:
Why wasn't this patch committed?

ferrieux added on 2011-04-23 00:06:04:
Will commit after cleaning up a strange single quote I observe in the testsuite.

juddgilbert added on 2011-04-21 23:19:10:
Finally success! I'm not sure why the patch file val4.patch didn't work on tclMain.c, but I manually patched it by editing the file using the appropriate section from val4.patch and that worked. I no longer have 14400 bytes left on the stack!

Thanks again for your help. 

Now it is going to bug me to no end why that patch file doesn't work for me on just that one file.

ferrieux added on 2011-04-21 21:39:53:
Well, just tried the same on a different system, no problems. Please seek local help on how to use 'patch'.


lat6466.rd.francetelecom.fr:tunzip.12712:{2}> cd Tcl_Source_Code-2e77c2b14bdfb544/
lat6466.rd.francetelecom.fr:Tcl_Source_Code-2e77c2b14bdfb544:{3}> ls
ChangeLog       ChangeLog.2002  ChangeLog.2007  compat/   libtommath/    tests/
ChangeLog.1999  ChangeLog.2003  ChangeLog.2008  doc/      license.terms  tools/
ChangeLog.2000  ChangeLog.2004  README          generic/  macosx/        unix/
ChangeLog.2001  ChangeLog.2005  changes         library/  pkgs/          win/
lat6466.rd.francetelecom.fr:Tcl_Source_Code-2e77c2b14bdfb544:{4}> patch -p0 < /tmp/val4.patch 
patching file generic/tclBasic.c
patching file generic/tclEvent.c
patching file generic/tclExecute.c
patching file generic/tclMain.c

juddgilbert added on 2011-04-21 20:56:15:
I'm really sorry if I'm doing something dumb, but val4.patch successfully patched tclBasic.c, tclEvent.c, tclExecute.c, and then failed on tclMain.c. All using the head tarball linked.

Sorry to be such a pain :( I do appreciate your help.

tclMain.c.ref

***************
*** 876,889 ****
      chan = Tcl_GetStdChannel(TCL_STDOUT);
      if (chan != NULL) {
  Tcl_Flush(chan);
      }
      isPtr->prompt = PROMPT_NONE;
  }
  ^L
  /*
   * Local Variables:
   * mode: c
   * c-basic-offset: 4
   * fill-column: 78
   * End:
   */
--- 870,909 ----
      chan = Tcl_GetStdChannel(TCL_STDOUT);
      if (chan != NULL) {
  Tcl_Flush(chan);
      }
      isPtr->prompt = PROMPT_NONE;
+ }
+ ^L
+ /*
+  *----------------------------------------------------------------------
+  *
+  * FreeMainInterp --
+  *
+  *Exit handler used to cleanup the main interpreter and ancillary startup
+  *script storage at exit.
+  *
+  *----------------------------------------------------------------------
+  */
+ 
+ static void
+ FreeMainInterp(
+     ClientData clientData)
+ {
+ Tcl_Interp *interp = (Tcl_Interp *) clientData;
+ 
+ //if (TclInExit()) return;
+ 
+ if (!Tcl_InterpDeleted(interp)) {
+     Tcl_DeleteInterp(interp);
+ }
+     Tcl_SetStartupScript(NULL, NULL);
+     Tcl_Release(interp);
  }
  ^L
  /*
   * Local Variables:
   * mode: c
   * c-basic-offset: 4
   * fill-column: 78
   * End:
   */

ferrieux added on 2011-04-21 13:52:59:
Now fossil even has tarballs, yay :)

For example, today's HEAD:

http://core.tcl.tk/tcl/tarball/Tcl%20Source%20Code-2e77c2b14bdfb544.tar.gz?uuid=2e77c2b14bdfb54416ad95991aad51a1a51a38ba

ferrieux added on 2011-04-21 13:28:13:
Val4.pach is against 8.6 HEAD. Please pull it  (possibly as a zip, not necessarily as fossil clone) from core.tcl.tk.

juddgilbert added on 2011-04-21 05:39:38:
strike that, it won't let me attach the files so I'll paste them in.

tlcBasic.rej

***************
*** 1353,1366 ****
      Tcl_HashTable *hTablePtr;
      ResolverScheme *resPtr, *nextResPtr;
      int i;
  
      /*
-      * Punt if there is an error in the Tcl_Release/Tcl_Preserve matchup.
       */
  
-     if (iPtr->numLevels > 0) {
  Tcl_Panic("DeleteInterpProc called with active evals");
      }
  
      /*
       * The interpreter should already be marked deleted; otherwise how did we
--- 1353,1367 ----
      Tcl_HashTable *hTablePtr;
      ResolverScheme *resPtr, *nextResPtr;
      int i;
  
      /*
+      * Punt if there is an error in the Tcl_Release/Tcl_Preserve matchup,
+  * unless we are exiting.
       */
  
+     if ((iPtr->numLevels > 0) && !TclInExit()) {
  Tcl_Panic("DeleteInterpProc called with active evals");
      }
  
      /*
       * The interpreter should already be marked deleted; otherwise how did we
***************
*** 1480,1490 ****
      /*
       * Pop the root frame pointer and finish deleting the global
       * namespace. The order is important [Bug 1658572].
       */
  
-     if (iPtr->framePtr != iPtr->rootFramePtr) {
  Tcl_Panic("DeleteInterpProc: popping rootCallFrame with other frames on top");
      }
      Tcl_PopCallFrame(interp);
      ckfree(iPtr->rootFramePtr);
      iPtr->rootFramePtr = NULL;
--- 1481,1491 ----
      /*
       * Pop the root frame pointer and finish deleting the global
       * namespace. The order is important [Bug 1658572].
       */
  
+     if ((iPtr->framePtr != iPtr->rootFramePtr) && !TclInExit()) {
  Tcl_Panic("DeleteInterpProc: popping rootCallFrame with other frames on top");
      }
      Tcl_PopCallFrame(interp);
      ckfree(iPtr->rootFramePtr);
      iPtr->rootFramePtr = NULL;
***************
*** 1601,1624 ****
       * Location stack for uplevel/eval/... scripts which were passed through
       * proc arguments. Actually we track all arguments as we do not and cannot
       * know which arguments will be used as scripts and which will not.
       */
  
-     if (iPtr->lineLAPtr->numEntries) {
  /*
   * When the interp goes away we have nothing on the stack, so there
   * are no arguments, so this table has to be empty.
   */
  
  Tcl_Panic("Argument location tracking table not empty");
      }
  
      Tcl_DeleteHashTable(iPtr->lineLAPtr);
-     ckfree(iPtr->lineLAPtr);
      iPtr->lineLAPtr = NULL;
  
-     if (iPtr->lineLABCPtr->numEntries) {
  /*
   * When the interp goes away we have nothing on the stack, so there
   * are no arguments, so this table has to be empty.
   */
  
--- 1602,1625 ----
       * Location stack for uplevel/eval/... scripts which were passed through
       * proc arguments. Actually we track all arguments as we do not and cannot
       * know which arguments will be used as scripts and which will not.
       */
  
+     if (iPtr->lineLAPtr->numEntries && !TclInExit()) {
  /*
   * When the interp goes away we have nothing on the stack, so there
   * are no arguments, so this table has to be empty.
   */
  
  Tcl_Panic("Argument location tracking table not empty");
      }
  
      Tcl_DeleteHashTable(iPtr->lineLAPtr);
+     ckfree((char *) iPtr->lineLAPtr);
      iPtr->lineLAPtr = NULL;
  
+     if (iPtr->lineLABCPtr->numEntries && !TclInExit()) {
  /*
   * When the interp goes away we have nothing on the stack, so there
   * are no arguments, so this table has to be empty.
   */
  

tclEvent.rej

***************
*** 951,981 ****
   */
  
  currentAppExitPtr(INT2PTR(status));
  Tcl_Panic("AppExitProc returned unexpectedly");
      } else {
- /*
-  * Use default handling.
-  */
- 
- InvokeExitHandlers();
- 
- /*
-  * Ensure the thread-specific data is initialised as it is used in
-  * Tcl_FinalizeThread()
-  */
- 
- (void) TCL_TSD_INIT(&dataKey);
- 
- /*
-  * Now finalize the calling thread only (others are not safely
-  * reachable).  Among other things, this triggers a flush of the
-  * Tcl_Channels that may have data enqueued.
-  */
- 
- Tcl_FinalizeThread();
- 
  TclpExit(status);
  Tcl_Panic("OS exit failed!");
      }
  }
  
--- 951,1004 ----
   */
  
  currentAppExitPtr(INT2PTR(status));
  Tcl_Panic("AppExitProc returned unexpectedly");
      } else {
+ const char *fin;
+ Tcl_DString ds;
+ int finalize = 0;
+ 
+ fin = TclGetEnv("TCL_FINALIZE_ON_EXIT", &ds);
+ finalize = ((fin != NULL) && strcmp(fin, "0"));
+ if (fin != NULL) {
+ Tcl_DStringFree(&ds);
+ }
+ 
+ #ifdef PURIFY
+ finalize = 1;
+ #endif
+ if (finalize) {
+ 
+     /*
+      * Thorough finalization for Valgrind et al.
+      */
+ 
+     Tcl_Finalize();
+ 
+ } else {
+ 
+     /*
+      * Fast and deterministic exit (default behavior)
+      */
+     
+     InvokeExitHandlers();
+     
+     /*
+      * Ensure the thread-specific data is initialised as it is used in
+      * Tcl_FinalizeThread()
+      */
+     
+     (void) TCL_TSD_INIT(&dataKey);
+     
+     /*
+      * Now finalize the calling thread only (others are not safely
+      * reachable).  Among other things, this triggers a flush of the
+      * Tcl_Channels that may have data enqueued.
+      */
+     
+     Tcl_FinalizeThread();
+ }
  TclpExit(status);
  Tcl_Panic("OS exit failed!");
      }
  }
  
tclExecute.rej

***************
*** 932,945 ****
  DeleteExecStack(tmpPtr);
      }
  
      TclDecrRefCount(eePtr->constants[0]);
      TclDecrRefCount(eePtr->constants[1]);
-     if (eePtr->callbackPtr) {
- Tcl_Panic("Deleting execEnv with pending NRE callbacks!");
      }
-     if (eePtr->corPtr) {
  Tcl_Panic("Deleting execEnv with existing coroutine");
      }
      ckfree(eePtr);
  }
  
--- 936,949 ----
  DeleteExecStack(tmpPtr);
      }
  
      TclDecrRefCount(eePtr->constants[0]);
      TclDecrRefCount(eePtr->constants[1]);
+     if (eePtr->callbackPtr && !cachedInExit) {
+ Tcl_Panic("Deleting execEnv with pending TEOV callbacks!");
      }
+     if (eePtr->corPtr && !cachedInExit) {
  Tcl_Panic("Deleting execEnv with existing coroutine");
      }
      ckfree(eePtr);
  }
  

tclMain.rej


***************
*** 123,132 ****
   */
  
  MODULE_SCOPE Tcl_MainLoopProc *TclGetMainLoop(void);
  static voidPrompt(Tcl_Interp *interp, InteractiveState *isPtr);
  static voidStdinProc(ClientData clientData, int mask);
  
  #ifndef TCL_ASCII_MAIN
  static Tcl_ThreadDataKey dataKey;
  /*
   *----------------------------------------------------------------------
--- 123,133 ----
   */
  
  MODULE_SCOPE Tcl_MainLoopProc *TclGetMainLoop(void);
  static voidPrompt(Tcl_Interp *interp, InteractiveState *isPtr);
  static voidStdinProc(ClientData clientData, int mask);
+ static void     FreeMainInterp(ClientData clientData);
  
  #ifndef TCL_ASCII_MAIN
  static Tcl_ThreadDataKey dataKey;
  /*
   *----------------------------------------------------------------------
***************
*** 386,395 ****
      }
      if (Tcl_LimitExceeded(interp)) {
  goto done;
      }
  
      /*
       * Invoke the script specified on the command line, if any. Must fetch it
       * again, as the appInitProc might have reset it.
       */
  
--- 387,402 ----
      }
      if (Tcl_LimitExceeded(interp)) {
  goto done;
      }
  
+ /*
+  * Arrange for final deletion of the main interp
+  */
+ // ARGH Munchhausen effect 
+ Tcl_CreateExitHandler(FreeMainInterp, (ClientData)interp);
+ 
      /*
       * Invoke the script specified on the command line, if any. Must fetch it
       * again, as the appInitProc might have reset it.
       */
  
***************
*** 600,629 ****
  
      Tcl_IncrRefCount(cmd);
      Tcl_EvalObjEx(interp, cmd, TCL_EVAL_GLOBAL);
      Tcl_DecrRefCount(cmd);
  }
- 
  /*
   * If Tcl_EvalObjEx returns, trying to eval [exit], something unusual
   * is happening. Maybe interp has been deleted; maybe [exit] was
   * redefined, maybe we've blown up because of an exceeded limit. We
   * still want to cleanup and exit.
   */
- 
- if (!Tcl_InterpDeleted(interp)) {
-     Tcl_DeleteInterp(interp);
- }
-     }
-     Tcl_SetStartupScript(NULL, NULL);
- 
-     /*
-      * If we get here, the master interp has been deleted. Allow its
-      * destruction with the last matching Tcl_Release.
-      */
- 
-     Tcl_Release(interp);
      Tcl_Exit(exitCode);
  }
  
  #ifndef UNICODE
  void
--- 607,623 ----
  
      Tcl_IncrRefCount(cmd);
      Tcl_EvalObjEx(interp, cmd, TCL_EVAL_GLOBAL);
      Tcl_DecrRefCount(cmd);
  }
+ }
  /*
   * If Tcl_EvalObjEx returns, trying to eval [exit], something unusual
   * is happening. Maybe interp has been deleted; maybe [exit] was
   * redefined, maybe we've blown up because of an exceeded limit. We
   * still want to cleanup and exit.
   */
      Tcl_Exit(exitCode);
  }
  
  #ifndef UNICODE
  void
***************
*** 876,889 ****
      chan = Tcl_GetStdChannel(TCL_STDOUT);
      if (chan != NULL) {
  Tcl_Flush(chan);
      }
      isPtr->prompt = PROMPT_NONE;
  }
  
  /*
   * Local Variables:
   * mode: c
   * c-basic-offset: 4
   * fill-column: 78
   * End:
   */
--- 870,909 ----
      chan = Tcl_GetStdChannel(TCL_STDOUT);
      if (chan != NULL) {
  Tcl_Flush(chan);
      }
      isPtr->prompt = PROMPT_NONE;
+ }
+ 
+ /*
+  *----------------------------------------------------------------------
+  *
+  * FreeMainInterp --
+  *
+  *Exit handler used to cleanup the main interpreter and ancillary startup
+  *script storage at exit.
+  *
+  *----------------------------------------------------------------------
+  */
+ 
+ static void
+ FreeMainInterp(
+     ClientData clientData)
+ {
+ Tcl_Interp *interp = (Tcl_Interp *) clientData;
+ 
+ //if (TclInExit()) return;
+ 
+ if (!Tcl_InterpDeleted(interp)) {
+     Tcl_DeleteInterp(interp);
+ }
+     Tcl_SetStartupScript(NULL, NULL);
+     Tcl_Release(interp);
  }
  
  /*
   * Local Variables:
   * mode: c
   * c-basic-offset: 4
   * fill-column: 78
   * End:
   */

juddgilbert added on 2011-04-21 05:36:03:
Patch val4.patch doesn't work for some reason. I'm not an expert with using these, but this is the output:

c]$ patch < val4.patch 
patching file tclBasic.c
Hunk #1 FAILED at 1353.
Hunk #2 FAILED at 1481.
Hunk #3 FAILED at 1602.
3 out of 3 hunks FAILED -- saving rejects to file tclBasic.c.rej
patching file tclEvent.c
Hunk #1 FAILED at 951.
1 out of 1 hunk FAILED -- saving rejects to file tclEvent.c.rej
patching file tclExecute.c
Hunk #1 succeeded at 55 (offset 4 lines).
Hunk #2 succeeded at 850 (offset -46 lines).
Hunk #3 succeeded at 919 (offset 4 lines).
Hunk #4 FAILED at 936.
1 out of 4 hunks FAILED -- saving rejects to file tclExecute.c.rej
patching file tclMain.c
Hunk #1 FAILED at 123.
Hunk #2 FAILED at 387.
Hunk #3 FAILED at 607.
Hunk #4 FAILED at 870.
4 out of 4 hunks FAILED -- saving rejects to file tclMain.c.rej

I started with a clean unpack of tcl8.6b1-src.tar.gz

I've going to attach the .rej files.

ferrieux added on 2011-04-21 03:00:13:

File Added - 409373: val4.patch

ferrieux added on 2011-04-21 02:59:36:

File Deleted - 401498:

ferrieux added on 2011-04-21 02:59:20:
Updating patch to HEAD, removing old ones.

ferrieux added on 2011-02-13 03:42:23:

File Added - 401498: val3.patch

ferrieux added on 2011-02-13 03:41:15:
Attaching val3.patch, which additionally also forces finalization when -DPURIFY.

ferrieux added on 2011-02-13 03:33:14:

File Added - 401497: val2.patch

ferrieux added on 2011-02-13 03:31:47:

File Deleted - 356126:

ferrieux added on 2011-02-13 03:31:05:
First updating the patch to HEAD (comment reformatting in the context made it fail, grrr)

ferrieux added on 2009-12-22 21:25:40:

File Added - 356126: val2.patch

ferrieux added on 2009-12-22 21:24:52:
OK, proceeding anyway.
The attached 'val2' patch is a superset of the previous one; in addition, it establishes an exit handler to call FreeMainInterp, a new routine doing the final DeleteInterp+Release+SetStartupScript(NULL,NULL).
The existing TclInExit() MODULE_SCOPE-visible predicate is used to disable several panics from the running interpreter (solving the Muenchhausen effect).

With this patch, and the TCL_FINALIZE_ON_EXIT variable set, valgrind shows two interesting things (valgring --leak-check==full ./tclsh /dev/null):

   (1) a few bytes are leaked due to the active bytecode and stack frames being left over (associated with a suppressed panic complaining about them).

   (2) several big blocks of TclAllocateFreeObjects's  (2400 bytes) are leaked.

Of course, (1) is expected, pending dedicated work for cleaning up the running interp. No big deal.

However, (2) is funny in the light of the following comment in TclAllocateFreeObjects:

     * This has been noted by Purify to be a potential leak. The problem is
     * that Tcl, when not TCL_MEM_DEBUG compiled, keeps around all allocated
     * Tcl_Obj's, pointed to by tclFreeObjList, when freed instead of actually
     * freeing the memory. TclFinalizeObjects() does not ckfree() this memory,
     * but leaves it to Tcl's memory subsystem finalization to release it.
     * Purify apparently can't figure that out, and fires a false alarm.

It is funny, because there is no mystery. Purify is not at fault. It just turns out that in the default -DTCL_ALLOC=0 -UTCL_MEM_DEBUG used in unix, those blocks are simply malloc'ed with no linking or external referencing. Another proof of that is that in that case, TclFinalizeMemorySubsystem is an empty function !

Of course this merits another bugreport; in the meantime please validate the approach of the current patch.

ferrieux added on 2009-12-22 19:05:54:
Just tried that. Doesn't work, Tcl_Panic("DeleteInterpProc called with active evals");
Indeed that's a natural effect of Tcl_Eval("exit")...
What's the preferred way out of that "Muenchhausen effect" ?

ferrieux added on 2009-12-22 18:49:03:
Ah, valgrindability may be restored, but we still don't delete the main interp during the finalizing exit sequence. Maybe an exit handler doing DeleteInterp+Release is due ?

ferrieux added on 2009-12-22 18:28:05:

File Added - 356116: val.patch

ferrieux added on 2009-12-22 18:27:20:

File Deleted - 356076:

ferrieux added on 2009-12-22 08:06:26:

File Added - 356076: val.patch

Attachments: