Tcl Source Code

View Ticket
Login
Ticket UUID: cafef7c5856ec8b1cbc0ea13c6c41d96d9982d75
Title: Unexplained code in StringRplcCmd function (tclCmdMZ.c)
Type: Bug Version: 8.6.3
Submitter: FireTcl Created on: 2015-02-17 02:12:33
Subsystem: 18. Commands M-Z Assigned To: nobody
Priority: 3 Low Severity: Cosmetic
Status: Open Last Modified: 2015-05-16 06:15:51
Resolution: None Closed By: nobody
    Closed on:
Description:
In the implementation of StringRplcCmd (in the file tclCmdMZ.c) there is 2 sentences repeated 2 times (line 2350 aprox.):


    ustring = Tcl_GetUnicodeFromObj(objv[1], &length);
    length--;

The first occurrence of this group of two sentences should be deleted.
User Comments: dkf added on 2015-05-16 06:15:51:

To test the interpreted version, you need to do this:

set start end-2
set s string
$s replace $start $start end
You can use ::tcl::unsupported::disassemble script {$s replace $start $start end} to verify that this is the right thing to do.


FireTcl added on 2015-03-27 08:06:37:
I replaced the body of the function StringRplcCmd with only this code "return TCL_OK;" and the test
    set start end-2
    string replace $start $start end
continues working.

This means that the Tcl Interpreter doesn't call StringRplcCmd. I suppose that it calls "TclCompileStringReplaceCmd" instead.

anonymous added on 2015-03-27 06:22:48:
I deleted the second group of two sentences in the Tcl interpreter:
     ustring = Tcl_GetUnicodeFromObj(objv[1], &length);
     length--;

I compiled and the test proposed by dkf works:
    set start end-2
    string replace $start $start end

It returns "en". The same value returned by the interpreter without any modification to its code.

Also the condition:
      objv[1]== objv[2] || objv[1] == objv[3]
doesn't happens inside the function StringRplcCmd() in this test.

dkf added on 2015-03-17 11:47:17:

Do we test the case:

set start end-2
string replace $start $start end
If not, we should! (We also need to test the case with compilation disabled; string replace is one of the bytecoded commands in 8.6.4.)


FireTcl added on 2015-02-18 22:46:17:
Thanks for your very good explanation. I didn't imagine this situation.

What about changing the second group of two sentences for this other code:

if (objv[1]== objv[2] || objv[1] == objv[3]){
     ustring = Tcl_GetUnicodeFromObj(objv[1], &length);
     length--;
}

Perhaps this makes the code a little more efficient and clear.

Awaiting your opinion,
Miguel.

aku added on 2015-02-18 21:55:51:
First, Tcl_Obj are our structures to hold any value.
We are using a model where each value has two representations
(string, and internal), and at least one of them is valid at all times, sometimes both. The internal rep (intrep) is a cache, to help with performance.

Let us assume that 

   objv[1] == objv[2] == foo

when going into the function.

The Tcl_GetUnicodeFromObj() [TGUFO] will look at this foo and return a unicode intrep.
If the object had a different intrep, or none, this will be computed from the string rep.

Then we reach Tcl_GetIntGorIndex() [TGIGI]. While that nominally operates on objv[2] see our assumption above, that it is the same foo the TGUFO operated on.

The TGIGI will now throw away the unicode intrep of foo and compute the index intrep.

The "ustring" pointer now refers to invalid memory.
Trying to use it will crash the interpreter.

So the information has to be recomputed with a second TGUFO call.
If the assumption was not true then this second calls sees the unicode intrep generated by the first and simply returns it, no calculations done, therefore fast.

Is it possible to make the assumption true ?
Yes:
   string replace $start $start $end

I.e. use the index in $start and operate on the string for that same number.
While contrived it can happen, even if only through a mistake.
So we have to guard ourselves against the situation.

FireTcl added on 2015-02-18 21:21:09:
Perhaps I am doing a mistake but I don't understand why there is a possibility of corruption due to shimmering.

I have been reading the TclGetIntForIndexM macro in tclInt.h (line 2444 aprox.) and TclGetIntForIndex in tclUtil.c (line 3517 aprox.), and I don't see the relation.

dgp added on 2015-02-18 20:42:59:
See this:

http://core.tcl.tk/tcl/tktview/3608714

and also see the history of TIP 323
where handling of empty strings by
[string replace] was implemented, then
abandoned because too much existing code
depended on the old way.

Door is open to raise the specific topic
again in a fresh TIP where the compat issues
can be explicitly considered.

FireTcl added on 2015-02-18 20:37:02:
First of all, I am a newbie. I am learning Tcl.

It would be nice a command like this: "string insert".

Perhaps it's possible to change a little the implementation of  StringRplcCmd to allow this behaviour: inserting text without deleting nothing.

For example when last < first or when last = first-1, no deletion is produced.

The problem with this approach is that can break code, because when last < first, nothing happens: the provided string is returned.

Another choice is perhaps another function implementing this new command, similar to this one:



static int
StringInsCmd(
    ClientData dummy,        /* Not used. */
    Tcl_Interp *interp,        /* Current interpreter. */
    int objc,            /* Number of arguments. */
    Tcl_Obj *const objv[])    /* Argument objects. */
{
    Tcl_UniChar *ustring;
    int index, length;

    if (objc != 3 ) {
        Tcl_WrongNumArgs(interp, 1, objv, "string index insertion");
        return TCL_ERROR;
    }

    ustring = Tcl_GetUnicodeFromObj(objv[1], &length);
    length--;

    if (TclGetIntForIndexM(interp, objv[2], length, &index) != TCL_OK ){
        return TCL_ERROR;
    }

    Tcl_Obj *resultPtr;

    if (index < 0) {
        index = 0;
    } else if (index > length) {
        index = length
    }

    resultPtr = Tcl_NewUnicodeObj(ustring, index);
    
    Tcl_AppendObjToObj(resultPtr, objv[3]);
    
    Tcl_AppendUnicodeToObj(resultPtr, ustring + index, length - index);
    
    Tcl_SetObjResult(interp, resultPtr);

    return TCL_OK;
}

___________________________

It's necessary also to code tclCompileStringInsertCmd.

The problem with this other approach is code duplication, because the same effect could be achieved changing a little "string replace".

aku added on 2015-02-18 18:49:14:
Thanks for the explanation.

I have to admit, I do not see where it could shimmer in this block of code.

Ah! Promptly I manage to see, actually.

If the arguments 1 and (2 or 3) point to the same Tcl_Obj, then the intervening GetIntForIndexM() calls can shimmer the ustring away from us. Right.

Ok.

I agree, this should have test cases.
Possibly some comments in the code as well.
I agree with the tweaks as well.

Dropped severity.
Tweaked ticket description.

dgp added on 2015-02-18 18:18:50:
Both calls to Tcl_GetUnicodeFromObj() are necessary.

The first one gets the value for length.

The second gets a valid value for ustring without
any fear of corruption due to shimmering.

A few tweaks can be applied.  There's no need to
store the result of the first call in ustring,
and removing that, the whole ustring variable
can move into the nested scope where it is used.
I don't mind that change.  The one called for in
the original report will introduce bugs.

I also favor adding the tests to the test suite that
will demonstrate that fact.

aku added on 2015-02-18 17:36:45:

Don, see

http://core.tcl.tk/tcl/artifact/d6cf5f9c8c788ff2?ln=2350-2364

top and bottom of the marked block.

It looks like the second (in-block) is indeed superfluous. While 'ustring' is only used within the guarded block the 'length' information is needed before so it is the first use which is important.

Removal should not introduce any bugs as far as I can see.

The change could be classed as optimization.

Actual perf impact is likely low.


dgp added on 2015-02-18 13:14:08:
Are you just seeking to optimize?  Or are you seeing
invalid results from the routine?

dgp added on 2015-02-18 13:13:53:
Are you just seeking to optimize?  Or are you seeing
invalid results from the routine?