Tcl Source Code

View Ticket
Login
Ticket UUID: 3418547
Title: cmd literal sharing vs. cmd resolvers
Type: Bug Version: obsolete: 8.6b2
Submitter: foxcruiser Created on: 2011-10-04 16:15:17
Subsystem: 45. Parsing and Eval Assigned To: dgp
Priority: 8 Severity: Minor
Status: Open Last Modified: 2016-07-21 16:01:39
Resolution: None Closed By: nobody
    Closed on: 2016-07-15 15:47:27
Description:
When testing our Tcl's per-interp resolvers, in particular for
resolving commands, against 8.6b2 (and the current
trunk, as of 20111004), we found an unwanted interaction between
"global" CmdName obj sharing (i.e. cmd literals) and the cmd resolver
infrastructure.

In short: Using command resolvers can easily lead to situations in
which CmdName objs are effectively shared between BC-compiled scripts
(procs, [namespace eval]) for the scope of a given namespace (global
and non-global NS alike) while their command references turn out
obsolete. An invalidation of the CmdName objs intrep (ResolvedCmdName)
is not performed. The omission of the invalidation is due to several
reasons, depending on the BC compilation/execution context. For
instance, for instantly compiled scripts ([namespace eval]) we hit
TclSetCmdNameObj() which simply returns without touching (existing) CmdName
obj intreps.

A condensed example:

Consider a cmd resolver which binds "z" to ::y when "z" is being
resolved in the context of the proc ::x (and in this scope only):
 
proc ::y {} { return Y }; # the resolver-specific resolution target
  
proc ::x {} {
    z; # the tcl word being turned into a CmdName obj and shared as cmd literal after having been BC-compiled
  }

This cmd resolution is performed during BC compilation of ::x. The cmd
(::y) return by the cmd resolver is referenced by the intrep of a
CmdName obj "z" for the NS scope of the [namespace current] during
compilation ("::"). Next time, a script is compiled in this sharing
context, this will work (unexpectedly?):

namespace eval :: {
    z; # returns "Y"?!
}

However, when requesting a different cmd to become available under the
token "z", e.g., ::ns1::z:
  
namespace eval ::ns1 {
    proc z {} { return Z }
    namespace export z
  }
  
... by e.g. [namespace import]'ing, this won't show any effect on the shared cmd literal:

 namespace eval :: {
    namespace import ::ns1::z
    z; # still returns "Y", though expecting "Z"
  }
 
One can alter this (mis-)behaviour at various spots, for instance, by
causing the compiled script to omit cmd literal sharing for the given
token "z":

 namespace eval :: {
    proc z {} { return Z }
    z; # yields "Z", as expected
  }

This "works" simply because there is no cmdname obj emitted. Changing
the script back to using a qualified token "::z" gives us the unwanted
behavior again:

 namespace eval :: {
    proc ::z {} { return Z }
    z; # yields "Y", not "Z"
  }

(This example is contained as a tcltest (plus comments) lit-1.1 by the patch attached)

In our reading, cmd resolvers are capable of causing yet another form
of "command shadowing" at the level of cmd literals, given the
aggressive cmd literal sharing in Tcl 8.6. Command resolution through
cmd resolvers in Tcl_FindCommand & friends gives birth to cmd literals
which silently "import" commands from alien contexts (resolver
specific) into a namespace context without "registering" the commands
in a manner to make them visible to the various invalidation
routines. Command epochs do not come into play either because the
"resolver-imported" command might simply not be touched itself or
intrep invalidation is never triggered
(TclSetCmdNameObj()). NS-specific invalidation (cmdRefEpoch) simply
does not apply because literals are not part of any invalidation
scheme, e.g., TclResetShadowedCmdRefs, NS path invalidation etc. From
within the cmd resolvers, there is no way of interfering (in our
understanding).

The patch attached to this bug report features the following:

a) A minimal infrastructure for testing resolvers (tclTest.c):
[testinterpresolver up|down]

b) A collection (resolvers.test) of characteristic tcltests which show
the issue in various settings ([namespace import], [interp alias],
[interp expose], [proc]). See tcltest lit-1.1 for a step-by-step
explanation of a prototypical scenario

c) Finally, a suggested patch around a newly introduced
TclInvalidateCmdLiteral() helper. While this helper addresses the
issue sufficiently as indicated by the tcltest collection, there might certainly
be alternative strategies. The primary purpose of this patch is to
point you into the right direction. TclInvalidateCmdLiteral() simply
queries the literal tables for candidates using the Tcl_CreateLiteral
API and, if found, resets the intrep after critical operations such
as e.g. [namespace import].

d) Included is also a tcltest covering (the unrelated) bug #3383616

Stefan Sobernig
Gustaf Neumann
User Comments: anonymous added on 2016-07-21 16:01:39:
Hi Don,

Thx for looking into this (again)!

a) You are right, the original failure in the NSF test suite disappeared, because we removed the root cause for NSF (namespace imports leading to the CmdLiteral vs. Resolver behavior). I re-introduced an updated variant of the original test at http://fisheye.openacs.org/browse/nsf/tests/tcl86.test?hb=true#to252

When disabling TclInvalidateCmdLiteral() in core-8-6-branch tip , this case fails and demonstrates the behavior for our resolver setup.

b) ... and so do the critical tests in resolver.test for the synthetic resolver setup as part of the original patch.

"I think imposing some limits on custom resolvers is preferable to that."

We fully agree. By coincidence: See also today's bug report at http://core.tcl.tk/tcl/tktview?name=d4e7780ca1

However, the two bugs point two different problems:

a) 3418547: Caching in the presence of (unrestricted) resolvers
b) d4e7780ca1: CmdLiteral sharing in the presence of (unrestricted) resolvers

How can we help?

dgp added on 2016-07-16 19:40:08:
That said, if I disable the routine TclInvalidateCmdLiteral
in core-8-6-branch tip and then build & test nsf2.0.0
against it, I get no test failures.   Perhaps it's not a tested
failure mode?

dgp added on 2016-07-16 19:36:25:
The only difficult example I have seen appears to be the
InterpColonCmdResolver in nsf 2.0.0.  It seems to be
resolving commands that match :* to a special
"colon command" in a number of contexts that do
not map neatly on the namespace concepts that the
Tcl core knows how to bump epochs on.

Well, the other difficult example is the one that's
built into the [testinterpresolver] testing command
since it change behavior on the context of being
in the body of a proc with a particular name.
Also not a condition the Tcl core is equipped to
detect with its epochs.

dgp added on 2016-07-16 16:26:27:
The key to making progress here, I believe, is to establish
what limitations can be imposed on custom command name
resolvers.  If a custom resolver is totally free to return anything
at all, then it is impossible to cache the result of any resolution
with the expectation that a later resolution can be skipped on
the theory that we already know the answer.  At a minimum
a custom resolver has to be predictable and has to have its
result be a function of a known set of input state.

If we cannot cache cmd name lookup results, it quickly follows
that we cannot do any bytecode compiling either in any context
where a custom resolver has any chance to play a role.  That's
quite a consequence!  I think imposing some limits on custom
resolvers is preferable to that.

dgp added on 2016-07-15 15:55:33:
Started new branch bug-3418547 to try tackling this
again.

dgp added on 2016-07-15 15:47:27:
Fate brings me here again.

This bug report was about a real and difficult problem
with a still under-specified extension mechanism, and
I believe the remedy we've been living with was offered
up in good faith and has been effective enough for
people to move forward.

But it's just wrong.

My suspicions are raised because the TclInvalidateCmdLiteral()
routine that implements the corrective *only* patches up
literals that have invalid intreps.  What about other
Tcl_Obj values that are holding "cmdName" intreps?  How
are they to be corrected?

Looking at test resolver-1.1 I see more trouble.  The
testing resolver that is enabled by [testinterpresolver up]
has a very specific function.  If it sees the command name
"z" within the proc "::x" and nowhere else it will redirect
that to the implementation of command "::y".  So as described
in the test comments,

    set r0 [x]

produces result "Y" because of the redirection.  In step 2,

    set r1 [z]

The redirection has been cached as the intrep of [z] so it
is taken again, even though it is not in the context recognized
by the resolver.  The comments question, but accept this.
They should not.  This is just wrong.  The resolver conditions
are not met.  The correct result should be "unknown command 'z'".

The a [namespace import] is done to give [z] a more correct target.
Before the bug fix, this failed.  Now it succeeds because each thing
like [namespace import] that could offer a new target also seeks out
and clears any "cmdName" intreps in literals, and in this case the
improper behavior is in a literal.  We now get

    set r2 [z]

which returns "Z" as we expect.  At this point the test stops before
taking the next step to see that things are still broken:

    set r3 [x]

This takes us back to the first case, where unquestionably the
resolver ought to be in power, but if you try it you will see
that the returned value is "Z".  Here the caching is working against
us again, but in the opposite way.  The context distinctions that
the interp resolver is making use of are not known and not tested
in validating the intrep.  In general, it's hard to see how they
could be.

dkf added on 2011-10-20 21:42:19:

allow_comments - 1

OK, now applied to trunk.

dgp added on 2011-10-20 03:21:46:
fix for segfault committed to branch

dgp added on 2011-10-20 03:11:17:
Trimmed down demo script:

proc x {} {set T1 100}
testinterpresolver up
x
testinterpresolver down

valgrind reports:

==15338== Invalid read of size 4
==15338==    at 0x421266: HashVarFree (tclTest.c:7193)
==15338==    by 0x4212F0: MyCompiledVarFree (tclTest.c:7208)
==15338==    by 0x51E31B: TclProcCleanupProc (tclProc.c:2216)
==15338==    by 0x51E273: TclProcDeleteProc (tclProc.c:2174)
==15338==    by 0x42DACA: Tcl_DeleteCommandFromToken (tclBasic.c:3088)
==15338==    by 0x502476: TclTeardownNamespace (tclNamesp.c:1101)
==15338==    by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433)
==15338==    by 0x51B048: Tcl_Release (tclPreserve.c:229)
==15338==    by 0x5013E6: FreeMainInterp (tclMain.c:938)
==15338==    by 0x4B9A55: InvokeExitHandlers (tclEvent.c:910)
==15338==    by 0x4B9BA3: Tcl_Finalize (tclEvent.c:1097)
==15338==    by 0x4B9AF9: Tcl_Exit (tclEvent.c:963)
==15338==  Address 0x4d41ca0 is 16 bytes inside a block of size 64 free'd
==15338==    at 0x4A05D21: free (vg_replace_malloc.c:325)
==15338==    by 0x584C25: TclpFree (tclAlloc.c:728)
==15338==    by 0x545646: FreeVarEntry (tclVar.c:6386)
==15338==    by 0x4D5039: Tcl_DeleteHashEntry (tclHash.c:463)
==15338==    by 0x543B1F: TclDeleteNamespaceVars (tclVar.c:5277)
==15338==    by 0x50242F: TclTeardownNamespace (tclNamesp.c:1086)
==15338==    by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433)
==15338==    by 0x51B048: Tcl_Release (tclPreserve.c:229)
==15338==    by 0x5013E6: FreeMainInterp (tclMain.c:938)
==15338==    by 0x4B9A55: InvokeExitHandlers (tclEvent.c:910)
==15338==    by 0x4B9BA3: Tcl_Finalize (tclEvent.c:1097)
==15338==    by 0x4B9AF9: Tcl_Exit (tclEvent.c:963)
==15338==
==15338== Invalid free() / delete / delete[]
==15338==    at 0x4A05D21: free (vg_replace_malloc.c:325)
==15338==    by 0x584C25: TclpFree (tclAlloc.c:728)
==15338==    by 0x43B4C8: Tcl_Free (tclCkalloc.c:1209)
==15338==    by 0x421281: HashVarFree (tclTest.c:7194)
==15338==    by 0x4212F0: MyCompiledVarFree (tclTest.c:7208)
==15338==    by 0x51E31B: TclProcCleanupProc (tclProc.c:2216)
==15338==    by 0x51E273: TclProcDeleteProc (tclProc.c:2174)
==15338==    by 0x42DACA: Tcl_DeleteCommandFromToken (tclBasic.c:3088)
==15338==    by 0x502476: TclTeardownNamespace (tclNamesp.c:1101)
==15338==    by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433)
==15338==    by 0x51B048: Tcl_Release (tclPreserve.c:229)
==15338==    by 0x5013E6: FreeMainInterp (tclMain.c:938)
==15338==  Address 0x4d41c90 is 0 bytes inside a block of size 64 free'd
==15338==    at 0x4A05D21: free (vg_replace_malloc.c:325)
==15338==    by 0x584C25: TclpFree (tclAlloc.c:728)
==15338==    by 0x545646: FreeVarEntry (tclVar.c:6386)
==15338==    by 0x4D5039: Tcl_DeleteHashEntry (tclHash.c:463)
==15338==    by 0x543B1F: TclDeleteNamespaceVars (tclVar.c:5277)
==15338==    by 0x50242F: TclTeardownNamespace (tclNamesp.c:1086)
==15338==    by 0x42B8D8: DeleteInterpProc (tclBasic.c:1433)
==15338==    by 0x51B048: Tcl_Release (tclPreserve.c:229)
==15338==    by 0x5013E6: FreeMainInterp (tclMain.c:938)
==15338==    by 0x4B9A55: InvokeExitHandlers (tclEvent.c:910)
==15338==    by 0x4B9BA3: Tcl_Finalize (tclEvent.c:1097)
==15338==    by 0x4B9AF9: Tcl_Exit (tclEvent.c:963)

dgp added on 2011-10-20 02:49:56:
test resolver-2.1 is the source.

dgp added on 2011-10-20 02:41:30:
The corrupted hash table is the
varTable of a Namespace.

dgp added on 2011-10-20 02:39:07:
At the time of the crash,

(gdb) print tablePtr->buckets[5]->nextPtr
$8 = (Tcl_HashEntry *) 0x200000013

which isn't a pointer to valid memory.  Something's
gone wrong by that point.

dgp added on 2011-10-20 02:22:19:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004e0067 in CreateHashEntry (tablePtr=0x7ea360, key=0x8db7f8 "\001",
    newPtr=0x0) at /home/dgp/fossil/tcl-only/generic/tclHash.c:323
323                 if (hash != PTR2UINT(hPtr->hash)) {
(gdb) bt
#0  0x00000000004e0067 in CreateHashEntry (tablePtr=0x7ea360,
    key=0x8db7f8 "\001", newPtr=0x0)
    at /home/dgp/fossil/tcl-only/generic/tclHash.c:323
#1  0x0000000000549185 in VarHashCreateVar (tablePtr=0x7ea360, key=0x8db7f8,
    newPtr=0x0) at /home/dgp/fossil/tcl-only/generic/tclVar.c:55
#2  0x000000000055108e in ObjFindNamespaceVar (interp=0x7e7ea8,
    namePtr=0x7e9ec8, contextNsPtr=0x7ea268, flags=262145)
    at /home/dgp/fossil/tcl-only/generic/tclVar.c:5889
#3  0x0000000000549fae in TclLookupSimpleVar (interp=0x7e7ea8,
    varNamePtr=0x7e9ec8, flags=1, create=1, errMsgPtr=0x7fffffffc4a8,
    indexPtr=0x7fffffffc4b4) at /home/dgp/fossil/tcl-only/generic/tclVar.c:966
#4  0x0000000000549a23 in TclObjLookupVarEx (interp=0x7e7ea8,
    part1Ptr=0x7e9ec8, part2Ptr=0x0, flags=1, msg=0x5b080c "set",
    createPart1=1, createPart2=1, arrayPtrPtr=0x7fffffffc560)
    at /home/dgp/fossil/tcl-only/generic/tclVar.c:707
#5  0x000000000054b153 in Tcl_ObjSetVar2 (interp=0x7e7ea8, part1Ptr=0x7e9ec8,
    part2Ptr=0x0, newValuePtr=0x9401f8, flags=1)
    at /home/dgp/fossil/tcl-only/generic/tclVar.c:1768
#6  0x000000000052eaea in Tcl_ResetResult (interp=0x7e7ea8)
    at /home/dgp/fossil/tcl-only/generic/tclResult.c:928
#7  0x00000000004ceb69 in TEBCresume (data=0x948a20, interp=0x7e7ea8, result=1)
    at /home/dgp/fossil/tcl-only/generic/tclExecute.c:5574

dkf added on 2011-10-15 23:45:29:
Committed your patch (with small tidy up) to the bug-3418547 branch but not to the trunk because of a crash when I add ::tcltest::cleanupTests to the end of the script (as part of the standard pattern of tests). Can't merge until that's sorted out.

dkf added on 2011-10-14 20:50:28:
I don't feel very wise today, but I'm tempted to start with adding in the machinery for testing resolvers. That's no problem. I'll probably also take the tests without much more thought; after all, tests are usually a good thing.

As for the actual fixes, it'll take a bit for me to review them. As I said, I don't feel very wise today. (Too much work earlier on!)

msofer added on 2011-10-12 18:31:02:
On the chat with Stephan:
[08:22]migueljust by the comments in the code: " not being registered with the namespace's command table"
[08:22]<mr_calvin>(and played around with some xotcl BC injections)
[08:23]miguelinitial reaction: would *that* be a better solution? I mean, just making it so that a command resolver works like an import?
[08:23]<mr_calvin>the comment is meant to be an instructive comparison: i know they shouldn't be registered there, but I just wanted to make a comparative point
[08:24]<mr_calvin>well 
[08:27]<mr_calvin>from the top of my mind, treating cmd resolvers as [namespace import] would fit the intention of our use cases (actually, like itcl, we provide some commands via resolvers to avoid explicit [namespace imports] into the global ns)
[08:27]miguelknow what? I'll reassign to dkf (but still take a good look). To make sure that we do not get in anybody else's way, TclOO and itcl are the only other users I know of

foxcruiser added on 2011-10-04 23:15:23:

File Added - 425230: cmdResVsCmdLiterals.patch

Attachments: