Tcl Source Code

View Ticket
Login
Ticket UUID: 1119369
Title: segfault in Tcl_EvalObjv (from Tcl_EvalObjEx call)
Type: Bug Version: obsolete: 8.4.9
Submitter: pcmacdon Created on: 2005-02-09 15:52:04
Subsystem: 10. Objects Assigned To: msofer
Priority: 4 Severity:
Status: Closed Last Modified: 2005-04-16 23:53:14
Resolution: Fixed Closed By: msofer
    Closed on: 2005-04-16 16:53:14
Description:
There is a problem with the internal handling of Tcl command 
objects that exposes it to a segmentation fault. 
The root of the problem seems to be that descendants of 
Tcl_EvalObjEx() makes an implicit assumption that the list 
elements of objPtr will not be freed.  But TclCompEvalObj()  
can and does free them to convert objPtr into bytecode.  
The following script should reproduce the problem 
(under Linux at least). 
 
######################################## 
# START OF FILE 
# Save as a file and execute "tclsh FILE" to see crash 
 if {![info exists SRC]} { 
  set SRC [list source $argv0] 
  uplevel #0 $SRC 
} else { 
  namespace eval x { 
    if {[info exists ::DONE]} { error "DUPLICATE SOURCE" } 
    set ::DONE 1 
    namespace eval :: $SRC 
  } 
} 
# END OF FILE 
##################################### 
 
 
I've found two specific changes that eliminate the segfault problem 
for me (although this is by no means a comprehensive solution). 
 
The first is in Tcl_FSEvalFile() where the following block is moved after 
the error processing (to just before the "end:" label). 
 
 
    if (iPtr->scriptFile != NULL) { 
        Tcl_DecrRefCount(iPtr->scriptFile); 
    } 
    iPtr->scriptFile = oldScriptFile; 
 
 
The second is a change to the Tcl_EvalObjv call in Tcl_EvalObjEx(). 
There are two possibilities here.  Option 1 is to just add 
TCL_EVAL_INVOKE to  
the flag in the call.  ie: 
 
 
   result = Tcl_EvalObjv(interp, listRepPtr->elemCount, 
         listRepPtr->elements, flags | TCL_EVAL_INVOKE); 
 
 
If that is undesirable, then option 2 is a new TclEvalObjvList() function 
which takes an additional argument (the listRepPtr) which can be 
checked 
to see if the representation is no longer a list, before error processing. 
Tcl_EvalObjEx() can use this new function. 
eg. 
 
 
static int 
TclEvalObjvList(interp, objc, objv, flags, objPtr) 
//  Old body of Tcl_EvalObjv here... 
    if ((code == TCL_ERROR) && !(flags & TCL_EVAL_INVOKE)) { 
 
 
        /*  
         * If there was an error, a command string will be needed for the  
         * error log: generate it now if it was not done previously. 
         */ 
 
 
        if (cmdLen == 0 && ((!objPtr) || (objPtr->typePtr == 
&tclListType))) { 
//... 
} 
 
 
int 
Tcl_EvalObjv(interp, objc, objv, flags) 
... 
{ 
   return TclEvalObjvList(interp, objc, objv, flags, NULL); 
}
User Comments: msofer added on 2005-04-16 23:53:14:
Logged In: YES 
user_id=148712

Refcounting the list intRep is now in head (patch 1158008)

msofer added on 2005-02-25 01:57:06:
Logged In: YES 
user_id=148712

Oops - I forgot to update this ticket. The bug is fixed, the
patch has been committed (and a backport to core-8-4-branch)
on 2004-02-10, including tests.

The ticket remains open as a reminder to explore the (IMO
better) solution of refcounting the List internal rep.

pcmacdon added on 2005-02-21 06:54:39:
Logged In: YES 
user_id=190660

One final observation.  Dumping the objc in all calls to Tcl_EvalObjv in a largish Tcl app,  
I note that < 5% have an objc==2 and so use the list optimization, and are therefore  
susceptible to the problem.   Autoload calls I think account for the bulk of these 
(leading to source IO which stalls the app anyways). 
  
The other 95% of calls are the more typical usage of eval, ie. objc>=3  
  
  eval $cmd $arg  
  
which are unoptimized and involve the overhead of Tcl_ConcatObj.

pcmacdon added on 2005-02-13 01:45:04:
Logged In: YES 
user_id=190660

What I'm proposing here covers the case in Tcl_EvalObjCmd
where objc>2  or objPtr=Tcl_Concat(...), etc. 

In this case, the internal representation can not change
because Tcl_EvalObjEx does not pass objPtr on to any other
function.
It merely grabs the objv from objPtr->twoPtrValue.ptr1
And the objPtr will be destroyed automatically by the
Tcl_DecrRefCount
before returning. ie.

Scenario: 
  On entering Tcl_EvalObjEx (flag=TCL_EVAL_DIRECT), 
objPtr->refCount == 0

Assumption:
  objPtr is a private object with no other users except the
caller.

msofer added on 2005-02-13 00:17:26:
Logged In: YES 
user_id=148712

No, that will not do. In principle, objPtr cannot become
invalid under the feet of Tcl_EvalObjEx - that is
Tcl_EvalObjEx's caller responsibility.

What we were witnessing here is that the Tcl_Obj *objPtr
loses its list internal rep: objPtr is still pointing to the
same Tcl_Obj, but that Tcl_Obj is not of tclListType
anymore. IOW, objPtr is ok but objPtr->twoPtrValue.ptr1 does
not point to a List struct anymore. The segfault is not from
an objPtr deref, it comes from the listRepPtr deref.

pcmacdon added on 2005-02-12 23:30:39:
Logged In: YES 
user_id=190660

The following is small optimization for the tclStackAlloc patch
to exclude valid calls.

if (objPtr->refCount<=1) {
     result = Tcl_EvalObjv(interp, listRepPtr->elemCount,
                    listRepPtr->elements, flags);
} else {
 /* Do tclStackAlloc of objv */
}

pcmacdon added on 2005-02-12 00:43:57:
Logged In: YES 
user_id=190660

One question that leaves me with is: Shouldn't namespace and
interp eval do the same thing?

If not, I would have thought that uplevel and eval would do
something like:

        objPtr = Tcl_DuplicateObj(objv[N]);
        result = Tcl_EvalObjEx(interp, objPtr, TCL_EVAL_DIRECT);

msofer added on 2005-02-11 23:40:43:
Logged In: YES 
user_id=148712

This is the "pure list optimisation": if a script is a "pure
list" (list internal rep and no string rep => guaranteed to
parse to the same list rep), by executing it directly you avoid:
 1. building a string rep of the script
 2. parsing the script into a sequence of commands (there's
only one!)
 3. parsing each command into words (we had them already);
note that each word would then be a fresh Tcl_Obj - ie, no
internal rep.
The pure list opt avoids these steps, and can also profit
from reusing cached internal reps.

% proc foo x {set x}
% set scr [list foo 1];  time {eval $scr} 100000
9 microseconds per iteration
% string length $scr; time {eval $scr} 100000
17 microseconds per iteration

pcmacdon added on 2005-02-11 23:24:54:
Logged In: YES 
user_id=190660

I had it backwards.    The problem triggers because of the form:

        Tcl_EvalObjEx(interp, objv[N], TCL_EVAL_DIRECT);

Which only Tcl_UplevelObjCmd and Tcl_EvalObjCmd are using.
In general, shouldn't the only valid use of TCL_EVAL_DIRECT be:

        objPtr = Tcl_ConcatObj(objc-N, objv+N);
        result = Tcl_EvalObjEx(interp, objPtr, TCL_EVAL_DIRECT);

ie.  a temporary object that will never be available to the
script user.  Why would eval and uplevel not want to shimmer
the object into bytecode, anyways?

msofer added on 2005-02-11 18:18:14:
Logged In: YES 
user_id=148712

The root of the problem is: TEOE is using (in the pure-list
branch) the list internal rep of objPtr as the objv argument
to TEOV; if objPtr happens to shimmer away to another
internal rep before TEOV returns, objv is freed. If TEOV (or
one of its callees!) tries to access objv after that, we get
a smash.

The correct fix has to be one of: do not use the pure-list
branch, or else insure that objv remains valid even if the
list shimmers away (as the patch does). 

Setting TCL_EVAL_DIRECT in the [ns eval] code just avoids
the shimmering in this particular case, but does not fix the
root problem. In particular, the version included in the new
test would still bomb:

    set SRC [list foo 1] ;# pure-list command 
    proc foo str {
# Shimmer pure-list to cmdName and error
proc $::SRC {} {}; $::SRC
error "BAD CALL"
    }
    eval $SRC

In you scenario:
1. the shimmering was caused by compilation in [namespace
eval ::], which you can indeed avoid by making it use
TCL_EVAL_DIRECT
2. the objv access was happening in error handling in TEOV
(in trace handlers, for your second example)

pcmacdon added on 2005-02-11 10:12:48:
Logged In: YES 
user_id=190660

Maybe there's a simpler solution.   As I mentioned,
namespace eval seems to be the trigger for the crash. 
Comparing Tcl_UplevelObjCmd and NamespaceEvalCmd, I note
that uplevel uses the TCL_EVAL_DIRECT flag in calls to
Tcl_EvalObjEx for all values of objc, but namespace eval
uses it only for objc>4.  Otherwise, the code is identical.

Changing namespace eval to use the TCL_EVAL_DIRECT flag
resolves my crash.   Is there a good reason for this seeming
inconsistency?  I've run a very large Tcl application with
it and
notice no appreciable difference in execution time, either way.
Anyways, good luck with it, whatever you decide.

BTW: This bug was particularly vexing because it was being
triggered, not by some exotic code, but rather by simply
using an autoloaded command where a recursive error occurs
in 8.4.+.

msofer added on 2005-02-10 08:21:24:
Logged In: YES 
user_id=148712

Yes, it is related. The solution implemented in the current
730244 fix is similar to the TclStackAlloc patch (with difft
mem management). IMO, the better fix for both  involves new
refcounts: the list intrep here, the Alias struct there (as
proposed by dkf).

dgp added on 2005-02-10 07:56:23:
Logged In: YES 
user_id=80530

Happy to see others are taking this on.

I've just skimmed the original
report, and I wonder if this
is related to Bug 730244.

pcmacdon added on 2005-02-10 07:12:47:
Logged In: YES 
user_id=190660

I understand that both the refcount and the TclStackAlloc
solutions have advantages.   But both I think
impose more overhead than the valid check approach.
Plus, I started implementing it before either of those
were suggested.  So I just finished it off.

msofer added on 2005-02-10 06:56:50:
Logged In: YES 
user_id=148712

Peter: you are trying to recover from something that should
not happen (objv becoming invalid). It is very difficult to
insure that it will not have other effects somewhere
different from what has already been identified.

I think it is preferable to prevent it from happening:
whoever passes an objv to a Tcl function (in this case, to
TOEV) should insure that it remains valid until its callee
returns. 

My patch does that, a possible refCounting of the list
internal rep does too.

pcmacdon added on 2005-02-10 06:42:53:

File Added - 119386: dif2

Logged In: YES 
user_id=190660

Attached is another attempt at a patch.
This one actually works.
It also falls back to getting the error message from listPtr
if objv is no longer available, so even tracebacks work now.

One remaining issue: an "execution trace" could potentially
invalidate
the objv, so listPtr should probably be passed into
TclCheckExecutionTraces (etc) and similar checks done there.

pcmacdon added on 2005-02-10 05:54:25:
Logged In: YES 
user_id=190660

A refcount, just for Tcl_EvalObjEx()?
That's just crazy enough, it just might work!
So if Tcl_EvalObjEx() sees a non-zero refcount on a objPtr,
it could fall back to calling Tcl_EvalEx().

dkf added on 2005-02-10 05:35:35:
Logged In: YES 
user_id=79902

The analysis makes it sound very much like some of the
problems I faced when doing the [dict for] implementation. I
fixed them by refcounting the internal rep of the object
separate to the object itself and such a fix sounds viable
to me here too; the List structure is entirely internal to
Tcl after all...

pcmacdon added on 2005-02-10 05:34:04:

File Added - 119376: dif

pcmacdon added on 2005-02-10 05:34:03:
Logged In: YES 
user_id=190660

2nd try on dif.

The TclStackAlloc approach is more elegant, though.

msofer added on 2005-02-10 05:08:52:
Logged In: YES 
user_id=148712

Peter: your patch is missing ...

pcmacdon added on 2005-02-10 04:49:01:
Logged In: YES 
user_id=190660

Find attached a patch that fixes the segfault problems.
This executes both exploit scripts without segfaults, but
as is, I think the error traceback is sacrified.

msofer added on 2005-02-10 04:44:47:

File Added - 119358: 1119369

msofer added on 2005-02-10 04:44:46:
Logged In: YES 
user_id=148712

Patch against HEAD attached. Note that a backport requires
different memory management (TclStackAlloc not available in 8.4)

msofer added on 2005-02-10 03:48:58:
Logged In: YES 
user_id=148712

Not the refcount of objPtr - the refcount of
listRepPtr->elements[i]. But you're right: this is
necessary, but not enough ...

Still thinking of a solution that preserves the objv array
throughout the TEOV call. At worst, we could copy it; or add
some sort of refCount to the list internal rep.

pcmacdon added on 2005-02-10 03:48:42:
Logged In: YES 
user_id=190660

I've verified the proposed patch is insufficient.  It is
still exposed to execution leave traces accessing an invalid
objv.
Below is another exploit script based on Tcl8.4 execution
leave trace.
I interpret this to mean that TclEvalObjvInternal also needs
the objPtr arg.
Moreover, the check probably needs to be expanded to
something like:

  (((List *) objPtr->internalRep.twoPtrValue.ptr1)->elements
== objv) &&
  /* same check for objc here.  */

Ironically, objv could theoretically be freed and
reallocatted to the same block and size (ie. back to a list
again) but although not correct, the
execution path would at least be safeguarded from segfaulting.




  
###### SCRIPT BEGIN

set SRC [list Foo 1]
set N 0
proc ::Foo str {
  puts "FOO: $str"
  if {[incr ::N]<3} {
    namespace eval :: $::SRC
  } else {
    error "BAD CALL"
  }
}
proc ::Trc args { puts "Trc: $args" }
trace add execution Foo leave Trc
uplevel #0 $SRC

###### SCRIPT END

pcmacdon added on 2005-02-10 03:24:26:
Logged In: YES 
user_id=190660

I don't think you can fix this by incrementing the refcount.
In the debugger I see the refcount for objPtr is already 5.
The problem is rather the representation of objPtr->objv, ie.

   ((List *) objPtr->internalRep.twoPtrValue.ptr1)->elements

gets freed.  The objPtr and objv[0] et al, are all ok,
but the array objv is freed.  But Tcl_EvalObjv can have active
references to objv.  The assumption seems to be that 
multiplie users of objPtr will all be happy to use the bytecode
representation from now on.

msofer added on 2005-02-10 02:19:17:
Logged In: YES 
user_id=148712

IIUC, what is happening is:

(1) Tcl_EvalObjEx (TEOE) is called with TCL_EVAL_DIRECT and
a pure list objPtr

(2) TEOE takes the pure-list branch and calls Tcl_EvalObjv
(TEOV) with objv pointing to the elements of the list rep of
objPtr

(3) objPtr shimmers away from its listRep before TEOV
returns an error

(4) TEOE error handling tries to read the list elements -> crash

Note that his could well happen even before the actual
command is called by TEOVI - if the shimmering is caused by
a trace. 

At first sight, the clean solution is to let Tcl_EvalObjEx
(pure-list branch) do an incrRefCount on the list elements
before calling TEOV. We do assume that the objv pointers
remain valid throughout a call.

pcmacdon added on 2005-02-10 01:40:19:
Logged In: YES 
user_id=190660

The above patch may not go quite far enough.
In TclEvalObjvInternal() for the code may try to process
execution traces with a defunct objv[].
That means that TclEvalObjvInternal() maybe should add and 
do a similar check on the objPtr argument as well.

msofer added on 2005-02-09 23:09:10:
Logged In: YES 
user_id=148712

 I verified that the test script segfaults in linux (8.4.6
and HEAD)

Attachments: