Ticket UUID: | 1119369 | |||
Title: | segfault in Tcl_EvalObjv (from Tcl_EvalObjEx call) | |||
Type: | Bug | Version: | obsolete: 8.4.9 | |
Submitter: | pcmacdon | Created on: | 2005-02-09 15:52:04 | |
Subsystem: | 10. Objects | Assigned To: | msofer | |
Priority: | 4 | Severity: | ||
Status: | Closed | Last Modified: | 2005-04-16 23:53:14 | |
Resolution: | Fixed | Closed By: | msofer | |
Closed on: | 2005-04-16 16:53:14 | |||
Description: |
There is a problem with the internal handling of Tcl command objects that exposes it to a segmentation fault. The root of the problem seems to be that descendants of Tcl_EvalObjEx() makes an implicit assumption that the list elements of objPtr will not be freed. But TclCompEvalObj() can and does free them to convert objPtr into bytecode. The following script should reproduce the problem (under Linux at least). ######################################## # START OF FILE # Save as a file and execute "tclsh FILE" to see crash if {![info exists SRC]} { set SRC [list source $argv0] uplevel #0 $SRC } else { namespace eval x { if {[info exists ::DONE]} { error "DUPLICATE SOURCE" } set ::DONE 1 namespace eval :: $SRC } } # END OF FILE ##################################### I've found two specific changes that eliminate the segfault problem for me (although this is by no means a comprehensive solution). The first is in Tcl_FSEvalFile() where the following block is moved after the error processing (to just before the "end:" label). if (iPtr->scriptFile != NULL) { Tcl_DecrRefCount(iPtr->scriptFile); } iPtr->scriptFile = oldScriptFile; The second is a change to the Tcl_EvalObjv call in Tcl_EvalObjEx(). There are two possibilities here. Option 1 is to just add TCL_EVAL_INVOKE to the flag in the call. ie: result = Tcl_EvalObjv(interp, listRepPtr->elemCount, listRepPtr->elements, flags | TCL_EVAL_INVOKE); If that is undesirable, then option 2 is a new TclEvalObjvList() function which takes an additional argument (the listRepPtr) which can be checked to see if the representation is no longer a list, before error processing. Tcl_EvalObjEx() can use this new function. eg. static int TclEvalObjvList(interp, objc, objv, flags, objPtr) // Old body of Tcl_EvalObjv here... if ((code == TCL_ERROR) && !(flags & TCL_EVAL_INVOKE)) { /* * If there was an error, a command string will be needed for the * error log: generate it now if it was not done previously. */ if (cmdLen == 0 && ((!objPtr) || (objPtr->typePtr == &tclListType))) { //... } int Tcl_EvalObjv(interp, objc, objv, flags) ... { return TclEvalObjvList(interp, objc, objv, flags, NULL); } | |||
User Comments: |
msofer added on 2005-04-16 23:53:14:
Logged In: YES user_id=148712 Refcounting the list intRep is now in head (patch 1158008) msofer added on 2005-02-25 01:57:06: Logged In: YES user_id=148712 Oops - I forgot to update this ticket. The bug is fixed, the patch has been committed (and a backport to core-8-4-branch) on 2004-02-10, including tests. The ticket remains open as a reminder to explore the (IMO better) solution of refcounting the List internal rep. pcmacdon added on 2005-02-21 06:54:39: Logged In: YES user_id=190660 One final observation. Dumping the objc in all calls to Tcl_EvalObjv in a largish Tcl app, I note that < 5% have an objc==2 and so use the list optimization, and are therefore susceptible to the problem. Autoload calls I think account for the bulk of these (leading to source IO which stalls the app anyways). The other 95% of calls are the more typical usage of eval, ie. objc>=3 eval $cmd $arg which are unoptimized and involve the overhead of Tcl_ConcatObj. pcmacdon added on 2005-02-13 01:45:04: Logged In: YES user_id=190660 What I'm proposing here covers the case in Tcl_EvalObjCmd where objc>2 or objPtr=Tcl_Concat(...), etc. In this case, the internal representation can not change because Tcl_EvalObjEx does not pass objPtr on to any other function. It merely grabs the objv from objPtr->twoPtrValue.ptr1 And the objPtr will be destroyed automatically by the Tcl_DecrRefCount before returning. ie. Scenario: On entering Tcl_EvalObjEx (flag=TCL_EVAL_DIRECT), objPtr->refCount == 0 Assumption: objPtr is a private object with no other users except the caller. msofer added on 2005-02-13 00:17:26: Logged In: YES user_id=148712 No, that will not do. In principle, objPtr cannot become invalid under the feet of Tcl_EvalObjEx - that is Tcl_EvalObjEx's caller responsibility. What we were witnessing here is that the Tcl_Obj *objPtr loses its list internal rep: objPtr is still pointing to the same Tcl_Obj, but that Tcl_Obj is not of tclListType anymore. IOW, objPtr is ok but objPtr->twoPtrValue.ptr1 does not point to a List struct anymore. The segfault is not from an objPtr deref, it comes from the listRepPtr deref. pcmacdon added on 2005-02-12 23:30:39: Logged In: YES user_id=190660 The following is small optimization for the tclStackAlloc patch to exclude valid calls. if (objPtr->refCount<=1) { result = Tcl_EvalObjv(interp, listRepPtr->elemCount, listRepPtr->elements, flags); } else { /* Do tclStackAlloc of objv */ } pcmacdon added on 2005-02-12 00:43:57: Logged In: YES user_id=190660 One question that leaves me with is: Shouldn't namespace and interp eval do the same thing? If not, I would have thought that uplevel and eval would do something like: objPtr = Tcl_DuplicateObj(objv[N]); result = Tcl_EvalObjEx(interp, objPtr, TCL_EVAL_DIRECT); msofer added on 2005-02-11 23:40:43: Logged In: YES user_id=148712 This is the "pure list optimisation": if a script is a "pure list" (list internal rep and no string rep => guaranteed to parse to the same list rep), by executing it directly you avoid: 1. building a string rep of the script 2. parsing the script into a sequence of commands (there's only one!) 3. parsing each command into words (we had them already); note that each word would then be a fresh Tcl_Obj - ie, no internal rep. The pure list opt avoids these steps, and can also profit from reusing cached internal reps. % proc foo x {set x} % set scr [list foo 1]; time {eval $scr} 100000 9 microseconds per iteration % string length $scr; time {eval $scr} 100000 17 microseconds per iteration pcmacdon added on 2005-02-11 23:24:54: Logged In: YES user_id=190660 I had it backwards. The problem triggers because of the form: Tcl_EvalObjEx(interp, objv[N], TCL_EVAL_DIRECT); Which only Tcl_UplevelObjCmd and Tcl_EvalObjCmd are using. In general, shouldn't the only valid use of TCL_EVAL_DIRECT be: objPtr = Tcl_ConcatObj(objc-N, objv+N); result = Tcl_EvalObjEx(interp, objPtr, TCL_EVAL_DIRECT); ie. a temporary object that will never be available to the script user. Why would eval and uplevel not want to shimmer the object into bytecode, anyways? msofer added on 2005-02-11 18:18:14: Logged In: YES user_id=148712 The root of the problem is: TEOE is using (in the pure-list branch) the list internal rep of objPtr as the objv argument to TEOV; if objPtr happens to shimmer away to another internal rep before TEOV returns, objv is freed. If TEOV (or one of its callees!) tries to access objv after that, we get a smash. The correct fix has to be one of: do not use the pure-list branch, or else insure that objv remains valid even if the list shimmers away (as the patch does). Setting TCL_EVAL_DIRECT in the [ns eval] code just avoids the shimmering in this particular case, but does not fix the root problem. In particular, the version included in the new test would still bomb: set SRC [list foo 1] ;# pure-list command proc foo str { # Shimmer pure-list to cmdName and error proc $::SRC {} {}; $::SRC error "BAD CALL" } eval $SRC In you scenario: 1. the shimmering was caused by compilation in [namespace eval ::], which you can indeed avoid by making it use TCL_EVAL_DIRECT 2. the objv access was happening in error handling in TEOV (in trace handlers, for your second example) pcmacdon added on 2005-02-11 10:12:48: Logged In: YES user_id=190660 Maybe there's a simpler solution. As I mentioned, namespace eval seems to be the trigger for the crash. Comparing Tcl_UplevelObjCmd and NamespaceEvalCmd, I note that uplevel uses the TCL_EVAL_DIRECT flag in calls to Tcl_EvalObjEx for all values of objc, but namespace eval uses it only for objc>4. Otherwise, the code is identical. Changing namespace eval to use the TCL_EVAL_DIRECT flag resolves my crash. Is there a good reason for this seeming inconsistency? I've run a very large Tcl application with it and notice no appreciable difference in execution time, either way. Anyways, good luck with it, whatever you decide. BTW: This bug was particularly vexing because it was being triggered, not by some exotic code, but rather by simply using an autoloaded command where a recursive error occurs in 8.4.+. msofer added on 2005-02-10 08:21:24: Logged In: YES user_id=148712 Yes, it is related. The solution implemented in the current 730244 fix is similar to the TclStackAlloc patch (with difft mem management). IMO, the better fix for both involves new refcounts: the list intrep here, the Alias struct there (as proposed by dkf). dgp added on 2005-02-10 07:56:23: Logged In: YES user_id=80530 Happy to see others are taking this on. I've just skimmed the original report, and I wonder if this is related to Bug 730244. pcmacdon added on 2005-02-10 07:12:47: Logged In: YES user_id=190660 I understand that both the refcount and the TclStackAlloc solutions have advantages. But both I think impose more overhead than the valid check approach. Plus, I started implementing it before either of those were suggested. So I just finished it off. msofer added on 2005-02-10 06:56:50: Logged In: YES user_id=148712 Peter: you are trying to recover from something that should not happen (objv becoming invalid). It is very difficult to insure that it will not have other effects somewhere different from what has already been identified. I think it is preferable to prevent it from happening: whoever passes an objv to a Tcl function (in this case, to TOEV) should insure that it remains valid until its callee returns. My patch does that, a possible refCounting of the list internal rep does too. pcmacdon added on 2005-02-10 06:42:53: File Added - 119386: dif2 Logged In: YES user_id=190660 Attached is another attempt at a patch. This one actually works. It also falls back to getting the error message from listPtr if objv is no longer available, so even tracebacks work now. One remaining issue: an "execution trace" could potentially invalidate the objv, so listPtr should probably be passed into TclCheckExecutionTraces (etc) and similar checks done there. pcmacdon added on 2005-02-10 05:54:25: Logged In: YES user_id=190660 A refcount, just for Tcl_EvalObjEx()? That's just crazy enough, it just might work! So if Tcl_EvalObjEx() sees a non-zero refcount on a objPtr, it could fall back to calling Tcl_EvalEx(). dkf added on 2005-02-10 05:35:35: Logged In: YES user_id=79902 The analysis makes it sound very much like some of the problems I faced when doing the [dict for] implementation. I fixed them by refcounting the internal rep of the object separate to the object itself and such a fix sounds viable to me here too; the List structure is entirely internal to Tcl after all... pcmacdon added on 2005-02-10 05:34:04: File Added - 119376: dif pcmacdon added on 2005-02-10 05:34:03: Logged In: YES user_id=190660 2nd try on dif. The TclStackAlloc approach is more elegant, though. msofer added on 2005-02-10 05:08:52: Logged In: YES user_id=148712 Peter: your patch is missing ... pcmacdon added on 2005-02-10 04:49:01: Logged In: YES user_id=190660 Find attached a patch that fixes the segfault problems. This executes both exploit scripts without segfaults, but as is, I think the error traceback is sacrified. msofer added on 2005-02-10 04:44:47: File Added - 119358: 1119369 msofer added on 2005-02-10 04:44:46: Logged In: YES user_id=148712 Patch against HEAD attached. Note that a backport requires different memory management (TclStackAlloc not available in 8.4) msofer added on 2005-02-10 03:48:58: Logged In: YES user_id=148712 Not the refcount of objPtr - the refcount of listRepPtr->elements[i]. But you're right: this is necessary, but not enough ... Still thinking of a solution that preserves the objv array throughout the TEOV call. At worst, we could copy it; or add some sort of refCount to the list internal rep. pcmacdon added on 2005-02-10 03:48:42: Logged In: YES user_id=190660 I've verified the proposed patch is insufficient. It is still exposed to execution leave traces accessing an invalid objv. Below is another exploit script based on Tcl8.4 execution leave trace. I interpret this to mean that TclEvalObjvInternal also needs the objPtr arg. Moreover, the check probably needs to be expanded to something like: (((List *) objPtr->internalRep.twoPtrValue.ptr1)->elements == objv) && /* same check for objc here. */ Ironically, objv could theoretically be freed and reallocatted to the same block and size (ie. back to a list again) but although not correct, the execution path would at least be safeguarded from segfaulting. ###### SCRIPT BEGIN set SRC [list Foo 1] set N 0 proc ::Foo str { puts "FOO: $str" if {[incr ::N]<3} { namespace eval :: $::SRC } else { error "BAD CALL" } } proc ::Trc args { puts "Trc: $args" } trace add execution Foo leave Trc uplevel #0 $SRC ###### SCRIPT END pcmacdon added on 2005-02-10 03:24:26: Logged In: YES user_id=190660 I don't think you can fix this by incrementing the refcount. In the debugger I see the refcount for objPtr is already 5. The problem is rather the representation of objPtr->objv, ie. ((List *) objPtr->internalRep.twoPtrValue.ptr1)->elements gets freed. The objPtr and objv[0] et al, are all ok, but the array objv is freed. But Tcl_EvalObjv can have active references to objv. The assumption seems to be that multiplie users of objPtr will all be happy to use the bytecode representation from now on. msofer added on 2005-02-10 02:19:17: Logged In: YES user_id=148712 IIUC, what is happening is: (1) Tcl_EvalObjEx (TEOE) is called with TCL_EVAL_DIRECT and a pure list objPtr (2) TEOE takes the pure-list branch and calls Tcl_EvalObjv (TEOV) with objv pointing to the elements of the list rep of objPtr (3) objPtr shimmers away from its listRep before TEOV returns an error (4) TEOE error handling tries to read the list elements -> crash Note that his could well happen even before the actual command is called by TEOVI - if the shimmering is caused by a trace. At first sight, the clean solution is to let Tcl_EvalObjEx (pure-list branch) do an incrRefCount on the list elements before calling TEOV. We do assume that the objv pointers remain valid throughout a call. pcmacdon added on 2005-02-10 01:40:19: Logged In: YES user_id=190660 The above patch may not go quite far enough. In TclEvalObjvInternal() for the code may try to process execution traces with a defunct objv[]. That means that TclEvalObjvInternal() maybe should add and do a similar check on the objPtr argument as well. msofer added on 2005-02-09 23:09:10: Logged In: YES user_id=148712 I verified that the test script segfaults in linux (8.4.6 and HEAD) |