Tcl Source Code

View Ticket
Login
Ticket UUID: d4e7780ca1681cd095dbd81fe264feff75c988f7
Title: "global" cmd literal sharing vs. per-interp resolvers
Type: Bug Version: 8.6.6
Submitter: gustafn2 Created on: 2016-07-21 10:03:33
Subsystem: 45. Parsing and Eval Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2016-09-07 13:36:46
Resolution: Fixed Closed By: jan.nijtmans
    Closed on: 2016-09-07 13:36:46
Description:
Cmd literals are shared in the global (i.e. per-interpreter) literal pool per namespace. However, it is not guaranteed, that the cmdObj resolved for a literal via a cmd resolver is the always same for a namespace. Therefore one sees different results for a cmd depending on whether or not the a cmd literal was earlier added to the global cmd literal pool. The global shared cmd literal is taken, the result from the command resolver is ignored. 

The problem comes from a large real world application during the tcl8.5 -> tcl8.6 transition. The provided patch (see attachment) determines such situations and turns shared global literals into unshared private ones. The provided test cases are trying to reproduce the behavior in a minimalistic setup, which tries to minimalize interactions with other test cases by restricting to certain names.

An alternative (but not so local) approach to address these kind of bugs would be to flag during cmd resolution whether a resolved command was derived from a custom cmd resolver, and to prevent the namespace based "global" sharing in such cases based on that flag.
User Comments: jan.nijtmans added on 2016-09-07 13:36:46:
Patch applied to core-8-6-branch and trunk. Gustaf, Thanks for all your work and feedback!

gustafn2 added on 2016-09-07 11:55:48:
All the changes are fine, good work! With these all of the nsf tests + OpenACS work fine as well. Since you have reduced the usage of TclRegisterNewCmdLiteral() to only one case, one could consider to remove this single occurrence as well (generic/tclCompExpr.c) and to drop the definition of the macro (generic/tclCompile.h).

In my patch were a few stylistic modifications, unrelated to the issue. For example, c-style checkers recommend to write expressions like "if (flags & MASK) ..." as "if ((flags & MASK)) ..." to avoid potential confusions of a human reader between a "bitwise and" and a "Boolean and" operation. I'll attach a patch for tclLiteral.c for consideration.

sorry for the delay, i was side-tracked by work...

gustafn2 added on 2016-09-05 22:03:23:

Deleted TclPanic: this was intentionally. When the code is compiled with TCL_COMPILE_DEBUG the function TclVerifyLocalLiteralTable() is called, which tries to make sure that all literals are in the global (sharing) literal pool. It is the intention of the patch to avoid sharing for literals handled by the resolvers, therefore these are not in the global pool, but there is no need to panic in such cases. Actually, the full if-clause should be removed. The change is only visible when compiled with TCL_COMPILE_DEBUG in the resolver.test.

Crash: The code had the assumption that all Tcl_Objs of type "cmdName" have a valid intrep. However, SetCmdNameFromAny() can produce a Tcl_Obj of type "cmdName" when the command is not found (last clause of the function). in the tcloo.test, there is some shimmering during compilation, where the literal is found locally with a different type and then SetCmdNameFromAny() changes it. The patch fixes the incorrect assumption.

==================================================================
--- generic/tclObj.c
+++ generic/tclObj.c
@@ -4218,11 +4218,11 @@
     register Namespace *currNsPtr;
     const char *name;
 
     if (objPtr->typePtr == &tclCmdNameType) {
 	resPtr = objPtr->internalRep.twoPtrValue.ptr1;
-	if (resPtr->cmdPtr == cmdPtr) {
+	if (resPtr != NULL && resPtr->cmdPtr == cmdPtr) {
 	    return;
 	}
     }
 
     cmdPtr->refCount++;
i'll check tomorrow, whether the cleaned-up version is still the same (just noticed changes in flag extra-flag passing).


jan.nijtmans added on 2016-09-05 15:08:11:

Well, I did some cleaning up, but when running test-cases there is a crash in oo.test (after all test-cases are run, noticed on Windows):

oo.test Test file error: child killed: segmentation violation

Another thing noticed: This panic: http://core.tcl.tk/tcl/artifact/c8a15e89b5c0f8656ab0c22db7168ddef374ee2f?ln=1169-1171 was removed by the patch, which is suspicious. I put it back, but this doesn't seem to make any difference, it looks like it isn't triggered by any test-case, so what's the reason for removing it?

Gustaf, any comments?


dkf added on 2016-09-03 08:08:51:

And many thanks to Gustaf for his contribution and Jan for putting it in an easy-to-review branch.

The real parts of the change look good. Just needs cleaning up.


dkf added on 2016-09-03 08:07:12:

The changes aren't minimal; there are places in there with commented out testing code or tinkering with code style.


jan.nijtmans added on 2016-09-02 12:47:56:
Proposed patch committed to branch bug-d4e7780ca1 for further testing. If I can't find any problem and no-one else complains, I plan to commit this to core-8-6-branch and trunk in a few days. The patch looks good to me, and it comes from a trustworthy source.

gustafn2 added on 2016-09-02 09:55:50:
We are using the second patch "resolver866-2" with tcl 8.6.6 on various servers since august 1. So far, we have no problem with this, it has apparently leaks [1] My recommendation is to include the second patch in your considerations, it fixes also the issue in the nsf regression test (somewhat a blocker for using nsf with tcl 8.6, and also for recommending tcl 8.6 for openacs, which uses nsf/xotcl2).

[1] http://openacs.org/munin/localdomain/localhost.localdomain/naviserver_openacs_memsize.html

gustafn2 added on 2016-07-25 09:34:45:
The attached patch resolver866-2 implements the alternative approach pointed out in the last paragraph of the bug-description. This alternative patch addresses the global literal sharing problem by introducing unshared literals. The first approach inserted a cmd literal to the global literal pool and used TclHideLiteral() to turn a shared literal into a private literal when necessary.

In this 2nd approach, cmd literals which are resolved via custom name
resolver are NOT shared at all (i.e. not inserted to the global per-interp literal pool) by allowing to create unshared literals by the literal infrastructure. To ease the handling of cmd resolvers, the patch introduces two new flags: CMD_VIA_RESOLVER and LITERAL_UNSHARED to make such cases more explicit and easier to maintain.

- New flag CMD_VIA_RESOLVER for "struct Command":  When this flag is
  set, the command was resolved via a custom resolver.
  
- New flag LITERAL_UNSHARED for TclRegisterNewLiteral(): When the flag
  is set the literal should not be shared. As a consequence, there
  might exist local literals, which are not part of the global
  (per-interpreter) literal pool (which entails a modification in
  TclVerifyLocalLiteralTable())

This patch addresses as well part of 3418547 (testcase resolver-1.4).
The testcases of 3418547 are a little problematic: By trying
to be as little invasive as possible the cmd resolver
change the literal "z" only when called from a proc named
"x". However, during byte-code compilations of a proc body, the
calling proc is not available, therefore Tcl_FindCommand() returns
for the same "z" different as result as in other invocation contexts. This unstable result contributes may be as well to the need for TclInvalidateCmdLiteral() in the test cases.

Results for resolver.test for cmd literal test cases when TclInvalidateCmdLiteral() is deactivated:
  
     OK: resolver-1.4, resolver-3.1a, resolver-3.1b, resolver-3.1c
     NOT OK: resolver-1.1, 1.2, 1.3, 1.5, 1.6

With this patch, all test-cases from the head version of nsf work fine, also when TclInvalidateCmdLiteral() is deactivated.

Attachments: