Tcl Source Code: View Ticket

Ticket UUID:	3418547
Title:	cmd literal sharing vs. cmd resolvers
Type:	Bug	Version:	obsolete: 8.6b2
Submitter:	foxcruiser	Created on:	2011-10-04 16:15:17
Subsystem:	45. Parsing and Eval	Assigned To:	dgp
Priority:	8	Severity:	Minor
Status:	Open	Last Modified:	2016-07-21 16:01:39
Resolution:	None	Closed By:	nobody
		Closed on:	2016-07-15 15:47:27
Description:	When testing our Tcl's per-interp resolvers, in particular for resolving commands, against 8.6b2 (and the current trunk, as of 20111004), we found an unwanted interaction between "global" CmdName obj sharing (i.e. cmd literals) and the cmd resolver infrastructure. In short: Using command resolvers can easily lead to situations in which CmdName objs are effectively shared between BC-compiled scripts (procs, [namespace eval]) for the scope of a given namespace (global and non-global NS alike) while their command references turn out obsolete. An invalidation of the CmdName objs intrep (ResolvedCmdName) is not performed. The omission of the invalidation is due to several reasons, depending on the BC compilation/execution context. For instance, for instantly compiled scripts ([namespace eval]) we hit TclSetCmdNameObj() which simply returns without touching (existing) CmdName obj intreps. A condensed example: Consider a cmd resolver which binds "z" to ::y when "z" is being resolved in the context of the proc ::x (and in this scope only): proc ::y {} { return Y }; # the resolver-specific resolution target proc ::x {} { z; # the tcl word being turned into a CmdName obj and shared as cmd literal after having been BC-compiled } This cmd resolution is performed during BC compilation of ::x. The cmd (::y) return by the cmd resolver is referenced by the intrep of a CmdName obj "z" for the NS scope of the [namespace current] during compilation ("::"). Next time, a script is compiled in this sharing context, this will work (unexpectedly?): namespace eval :: { z; # returns "Y"?! } However, when requesting a different cmd to become available under the token "z", e.g., ::ns1::z: namespace eval ::ns1 { proc z {} { return Z } namespace export z } ... by e.g. [namespace import]'ing, this won't show any effect on the shared cmd literal: namespace eval :: { namespace import ::ns1::z z; # still returns "Y", though expecting "Z" } One can alter this (mis-)behaviour at various spots, for instance, by causing the compiled script to omit cmd literal sharing for the given token "z": namespace eval :: { proc z {} { return Z } z; # yields "Z", as expected } This "works" simply because there is no cmdname obj emitted. Changing the script back to using a qualified token "::z" gives us the unwanted behavior again: namespace eval :: { proc ::z {} { return Z } z; # yields "Y", not "Z" } (This example is contained as a tcltest (plus comments) lit-1.1 by the patch attached) In our reading, cmd resolvers are capable of causing yet another form of "command shadowing" at the level of cmd literals, given the aggressive cmd literal sharing in Tcl 8.6. Command resolution through cmd resolvers in Tcl_FindCommand & friends gives birth to cmd literals which silently "import" commands from alien contexts (resolver specific) into a namespace context without "registering" the commands in a manner to make them visible to the various invalidation routines. Command epochs do not come into play either because the "resolver-imported" command might simply not be touched itself or intrep invalidation is never triggered (TclSetCmdNameObj()). NS-specific invalidation (cmdRefEpoch) simply does not apply because literals are not part of any invalidation scheme, e.g., TclResetShadowedCmdRefs, NS path invalidation etc. From within the cmd resolvers, there is no way of interfering (in our understanding). The patch attached to this bug report features the following: a) A minimal infrastructure for testing resolvers (tclTest.c): [testinterpresolver up\|down] b) A collection (resolvers.test) of characteristic tcltests which show the issue in various settings ([namespace import], [interp alias], [interp expose], [proc]). See tcltest lit-1.1 for a step-by-step explanation of a prototypical scenario c) Finally, a suggested patch around a newly introduced TclInvalidateCmdLiteral() helper. While this helper addresses the issue sufficiently as indicated by the tcltest collection, there might certainly be alternative strategies. The primary purpose of this patch is to point you into the right direction. TclInvalidateCmdLiteral() simply queries the literal tables for candidates using the Tcl_CreateLiteral API and, if found, resets the intrep after critical operations such as e.g. [namespace import]. d) Included is also a tcltest covering (the unrelated) bug #3383616 Stefan Sobernig Gustaf Neumann
User Comments:	anonymous added on 2016-07-21 16:01:39: Hi Don, Thx for looking into this (again)! a) You are right, the original failure in the NSF test suite disappeared, because we removed the root cause for NSF (namespace imports leading to the CmdLiteral vs. Resolver behavior). I re-introduced an updated variant of the original test at http://fisheye.openacs.org/browse/nsf/tests/tcl86.test?hb=true#to252 When disabling TclInvalidateCmdLiteral() in core-8-6-branch tip , this case fails and demonstrates the behavior for our resolver setup. b) ... and so do the critical tests in resolver.test for the synthetic resolver setup as part of the original patch. "I think imposing some limits on custom resolvers is preferable to that." We fully agree. By coincidence: See also today's bug report at http://core.tcl.tk/tcl/tktview?name=d4e7780ca1 However, the two bugs point two different problems: a) 3418547: Caching in the presence of (unrestricted) resolvers b) d4e7780ca1: CmdLiteral sharing in the presence of (unrestricted) resolvers How can we help? dgp added on 2016-07-16 19:40:08: That said, if I disable the routine TclInvalidateCmdLiteral in core-8-6-branch tip and then build & test nsf2.0.0 against it, I get no test failures. Perhaps it's not a tested failure mode? dgp added on 2016-07-16 19:36:25: The only difficult example I have seen appears to be the InterpColonCmdResolver in nsf 2.0.0. It seems to be resolving commands that match :* to a special "colon command" in a number of contexts that do not map neatly on the namespace concepts that the Tcl core knows how to bump epochs on. Well, the other difficult example is the one that's built into the [testinterpresolver] testing command since it change behavior on the context of being in the body of a proc with a particular name. Also not a condition the Tcl core is equipped to detect with its epochs. dgp added on 2016-07-16 16:26:27: The key to making progress here, I believe, is to establish what limitations can be imposed on custom command name resolvers. If a custom resolver is totally free to return anything at all, then it is impossible to cache the result of any resolution with the expectation that a later resolution can be skipped on the theory that we already know the answer. At a minimum a custom resolver has to be predictable and has to have its result be a function of a known set of input state. If we cannot cache cmd name lookup results, it quickly follows that we cannot do any bytecode compiling either in any context where a custom resolver has any chance to play a role. That's quite a consequence! I think imposing some limits on custom resolvers is preferable to that. dgp added on 2016-07-15 15:55:33: Started new branch bug-3418547 to try tackling this again. dgp added on 2016-07-15 15:47:27: Fate brings me here again. This bug report was about a real and difficult problem with a still under-specified extension mechanism, and I believe the remedy we've been living with was offered up in good faith and has been effective enough for people to move forward. But it's just wrong. My suspicions are raised because the TclInvalidateCmdLiteral() routine that implements the corrective only patches up literals that have invalid intreps. What about other Tcl_Obj values that are holding "cmdName" intreps? How are they to be corrected? Looking at test resolver-1.1 I see more trouble. The testing resolver that is enabled by [testinterpresolver up] has a very specific function. If it sees the command name "z" within the proc "::x" and nowhere else it will redirect that to the implementation of command "::y". So as described in the test comments, set r0 [x] produces result "Y" because of the redirection. In step 2, set r1 [z] The redirection has been cached as the intrep of [z] so it is taken again, even though it is not in the context recognized by the resolver. The comments question, but accept this. They should not. This is just wrong. The resolver conditions are not met. The correct result should be "unknown command 'z'". The a [namespace import] is done to give [z] a more correct target. Before the bug fix, this failed. Now it succeeds because each thing like [namespace import] that could offer a new target also seeks out and clears any "cmdName" intreps in literals, and in this case the improper behavior is in a literal. We now get set r2 [z] which returns "Z" as we expect. At this point the test stops before taking the next step to see that things are still broken: set r3 [x] This takes us back to the first case, where unquestionably the resolver ought to be in power, but if you try it you will see that the returned value is "Z". Here the caching is working against us again, but in the opposite way. The context distinctions that the interp resolver is making use of are not known and not tested in validating the intrep. In general, it's hard to see how they could be. dkf added on 2011-10-20 21:42:19: allow_comments - 1 OK, now applied to trunk. dgp added on 2011-10-20 03:21:46: fix for segfault committed to branch dgp added on 2011-10-20 03:11:17: Trimmed down demo script: proc x {} {set T1 100} testinterpresolver up x testinterpresolver down valgrind reports: ==15338== Invalid read of size 4 ==15338== at 0x421266: HashVarFree (tclTest.c:7193) ==15338== by 0x4212F0: MyCompiledVarFree (tclTest.c:7208) ==15338== by 0x51E31B: TclProcCleanupProc (tclProc.c:2216) ==15338== by 0x51E273: TclProcDeleteProc (tclProc.c:2174) ==15338== by 0x42DACA: Tcl_DeleteCommandFromToken (tclBasic.c:3088) ==15338== by 0x502476: TclTeardownNamespace (tclNamesp.c:1101) ==15338== by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433) ==15338== by 0x51B048: Tcl_Release (tclPreserve.c:229) ==15338== by 0x5013E6: FreeMainInterp (tclMain.c:938) ==15338== by 0x4B9A55: InvokeExitHandlers (tclEvent.c:910) ==15338== by 0x4B9BA3: Tcl_Finalize (tclEvent.c:1097) ==15338== by 0x4B9AF9: Tcl_Exit (tclEvent.c:963) ==15338== Address 0x4d41ca0 is 16 bytes inside a block of size 64 free'd ==15338== at 0x4A05D21: free (vg_replace_malloc.c:325) ==15338== by 0x584C25: TclpFree (tclAlloc.c:728) ==15338== by 0x545646: FreeVarEntry (tclVar.c:6386) ==15338== by 0x4D5039: Tcl_DeleteHashEntry (tclHash.c:463) ==15338== by 0x543B1F: TclDeleteNamespaceVars (tclVar.c:5277) ==15338== by 0x50242F: TclTeardownNamespace (tclNamesp.c:1086) ==15338== by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433) ==15338== by 0x51B048: Tcl_Release (tclPreserve.c:229) ==15338== by 0x5013E6: FreeMainInterp (tclMain.c:938) ==15338== by 0x4B9A55: InvokeExitHandlers (tclEvent.c:910) ==15338== by 0x4B9BA3: Tcl_Finalize (tclEvent.c:1097) ==15338== by 0x4B9AF9: Tcl_Exit (tclEvent.c:963) ==15338== ==15338== Invalid free() / delete / delete[] ==15338== at 0x4A05D21: free (vg_replace_malloc.c:325) ==15338== by 0x584C25: TclpFree (tclAlloc.c:728) ==15338== by 0x43B4C8: Tcl_Free (tclCkalloc.c:1209) ==15338== by 0x421281: HashVarFree (tclTest.c:7194) ==15338== by 0x4212F0: MyCompiledVarFree (tclTest.c:7208) ==15338== by 0x51E31B: TclProcCleanupProc (tclProc.c:2216) ==15338== by 0x51E273: TclProcDeleteProc (tclProc.c:2174) ==15338== by 0x42DACA: Tcl_DeleteCommandFromToken (tclBasic.c:3088) ==15338== by 0x502476: TclTeardownNamespace (tclNamesp.c:1101) ==15338== by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433) ==15338== by 0x51B048: Tcl_Release (tclPreserve.c:229) ==15338== by 0x5013E6: FreeMainInterp (tclMain.c:938) ==15338== Address 0x4d41c90 is 0 bytes inside a block of size 64 free'd ==15338== at 0x4A05D21: free (vg_replace_malloc.c:325) ==15338== by 0x584C25: TclpFree (tclAlloc.c:728) ==15338== by 0x545646: FreeVarEntry (tclVar.c:6386) ==15338== by 0x4D5039: Tcl_DeleteHashEntry (tclHash.c:463) ==15338== by 0x543B1F: TclDeleteNamespaceVars (tclVar.c:5277) ==15338== by 0x50242F: TclTeardownNamespace (tclNamesp.c:1086) ==15338== by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433) ==15338== by 0x51B048: Tcl_Release (tclPreserve.c:229) ==15338== by 0x5013E6: FreeMainInterp (tclMain.c:938) ==15338== by 0x4B9A55: InvokeExitHandlers (tclEvent.c:910) ==15338== by 0x4B9BA3: Tcl_Finalize (tclEvent.c:1097) ==15338== by 0x4B9AF9: Tcl_Exit (tclEvent.c:963) dgp added on 2011-10-20 02:49:56: test resolver-2.1 is the source. dgp added on 2011-10-20 02:41:30: The corrupted hash table is the varTable of a Namespace. dgp added on 2011-10-20 02:39:07: At the time of the crash, (gdb) print tablePtr->buckets[5]->nextPtr $8 = (Tcl_HashEntry ) 0x200000013 which isn't a pointer to valid memory. Something's gone wrong by that point. dgp added on 2011-10-20 02:22:19: Program received signal SIGSEGV, Segmentation fault. 0x00000000004e0067 in CreateHashEntry (tablePtr=0x7ea360, key=0x8db7f8 "\001", newPtr=0x0) at /home/dgp/fossil/tcl-only/generic/tclHash.c:323 323 if (hash != PTR2UINT(hPtr->hash)) { (gdb) bt #0 0x00000000004e0067 in CreateHashEntry (tablePtr=0x7ea360, key=0x8db7f8 "\001", newPtr=0x0) at /home/dgp/fossil/tcl-only/generic/tclHash.c:323 #1 0x0000000000549185 in VarHashCreateVar (tablePtr=0x7ea360, key=0x8db7f8, newPtr=0x0) at /home/dgp/fossil/tcl-only/generic/tclVar.c:55 #2 0x000000000055108e in ObjFindNamespaceVar (interp=0x7e7ea8, namePtr=0x7e9ec8, contextNsPtr=0x7ea268, flags=262145) at /home/dgp/fossil/tcl-only/generic/tclVar.c:5889 #3 0x0000000000549fae in TclLookupSimpleVar (interp=0x7e7ea8, varNamePtr=0x7e9ec8, flags=1, create=1, errMsgPtr=0x7fffffffc4a8, indexPtr=0x7fffffffc4b4) at /home/dgp/fossil/tcl-only/generic/tclVar.c:966 #4 0x0000000000549a23 in TclObjLookupVarEx (interp=0x7e7ea8, part1Ptr=0x7e9ec8, part2Ptr=0x0, flags=1, msg=0x5b080c "set", createPart1=1, createPart2=1, arrayPtrPtr=0x7fffffffc560) at /home/dgp/fossil/tcl-only/generic/tclVar.c:707 #5 0x000000000054b153 in Tcl_ObjSetVar2 (interp=0x7e7ea8, part1Ptr=0x7e9ec8, part2Ptr=0x0, newValuePtr=0x9401f8, flags=1) at /home/dgp/fossil/tcl-only/generic/tclVar.c:1768 #6 0x000000000052eaea in Tcl_ResetResult (interp=0x7e7ea8) at /home/dgp/fossil/tcl-only/generic/tclResult.c:928 #7 0x00000000004ceb69 in TEBCresume (data=0x948a20, interp=0x7e7ea8, result=1) at /home/dgp/fossil/tcl-only/generic/tclExecute.c:5574 dkf added on 2011-10-15 23:45:29: Committed your patch (with small tidy up) to the bug-3418547 branch but not to the trunk because of a crash when I add ::tcltest::cleanupTests to the end of the script (as part of the standard pattern of tests). Can't merge until that's sorted out. dkf added on 2011-10-14 20:50:28: I don't feel very wise today, but I'm tempted to start with adding in the machinery for testing resolvers. That's no problem. I'll probably also take the tests without much more thought; after all, tests are usually a good thing. As for the actual fixes, it'll take a bit for me to review them. As I said, I don't feel very wise today. (Too much work earlier on!) msofer added on 2011-10-12 18:31:02: On the chat with Stephan: [08:22]migueljust by the comments in the code: " not being registered with the namespace's command table" [08:22]<mr_calvin>(and played around with some xotcl BC injections) [08:23]miguelinitial reaction: would that* be a better solution? I mean, just making it so that a command resolver works like an import? [08:23]<mr_calvin>the comment is meant to be an instructive comparison: i know they shouldn't be registered there, but I just wanted to make a comparative point [08:24]<mr_calvin>well [08:27]<mr_calvin>from the top of my mind, treating cmd resolvers as [namespace import] would fit the intention of our use cases (actually, like itcl, we provide some commands via resolvers to avoid explicit [namespace imports] into the global ns) [08:27]miguelknow what? I'll reassign to dkf (but still take a good look). To make sure that we do not get in anybody else's way, TclOO and itcl are the only other users I know of foxcruiser added on 2011-10-04 23:15:23: File Added - 425230: cmdResVsCmdLiterals.patch

Attachments:

cmdResVsCmdLiterals.patch [download] added by foxcruiser on 2011-10-04 23:15:23. [details]