Tcl Source Code

View Ticket
Login
Ticket UUID: 686782
Title: Changes to Tcl_SetObjLength() break "expect -re"
Type: Bug Version: obsolete: 8.4.1.1
Submitter: rmax Created on: 2003-02-14 21:09:40
Subsystem: 44. UTF-8 Strings Assigned To: das
Priority: 8 Severity:
Status: Closed Last Modified: 2003-02-19 23:58:01
Resolution: Fixed Closed By: das
    Closed on: 2003-02-19 16:58:01
Description:
The following expect script shows unexpected (sic)
behaviour when run on Tcl as of 2003/01/17 or later:

 spawn /bin/echo "test\nfoo\nbar\nbla"
 while 1 {
    expect {
        -re "\n" { lappend lines $expect_out(buffer) }
        eof break
    }
 }
 foreach line $lines { puts "»$line«" }

It should print:
---------------
»test
«
»foo
«
»bar
«
»bla
«
---------------
but it does print something like:
---------------
»test
«
»foo
b«
»ar
bl«
---------------

On some machines it fails almost every time, on others
only once in a while. If expect is being run on a i386
under valgrind, the misbehaviour occurs every time. If
Tcl_SetObjLength() get's reverted to r1.26 the problem
disappears.
User Comments: das added on 2003-02-19 23:58:01:
Logged In: YES 
user_id=90580

checked in attached tclStringObj.diff

das added on 2003-02-18 18:52:03:
Logged In: YES 
user_id=90580

also c.f. bug 635200

das added on 2003-02-18 18:44:26:
Logged In: YES 
user_id=90580

ack, of course 8.4.2 is what I meant, sorry...

rmax added on 2003-02-18 18:35:15:
Logged In: YES 
user_id=124643

Thanks, Daniel.

Your patch fixes it without needing further changes to
expect or the Tcl core. I think it should even still go into
8.4.2.

das added on 2003-02-18 17:50:26:
Logged In: YES 
user_id=90580

vince, thanks for attaching the patch, sorry for being slow.

reinhardt, does this fix the problem you see? I don't have any 
problem with my expect -re scripts anymore with this without 
having to change expect. (the patch restores the 
abovementioned Tcl_SetObjLength side effect)

if yes, this should definitely be applied before 8.4.3

for efficiency, we might want to consider new API in the future to 
invalidate the unicode rep explicitly without freeing its associated 
memory (to avoid deallocating and reallocating the unicode 
buffer repeatedly, thus defeating the memory optimizations in 
tclStringObj.c)

I've proposed Tcl_InvalidateUnicode() or/and 
Tcl_SetUnicodeLength() (where a length param of -1 would 
invalidate the unicode rep)

vincentdarley added on 2003-02-18 17:40:19:
Logged In: YES 
user_id=32170

As far as whether we should make this change now, or 
not, I say fix the problem now.  Expect should really not 
be manipulating those bytes:

"Tcl_GetStringFromObj and Tcl_GetString return an 
object's string representation..(snip)...  The storage 
referenced by the returned byte pointer is owned by the 
object manager and should not be modified by the 
caller."

quote from the man pages...

vincentdarley added on 2003-02-18 17:33:18:
Logged In: YES 
user_id=32170

You're right -- no new API needed.  The stringUtf.diff 
can be used to isolate those core places (just 3 or 4) 
where my analysis shows a freeing of internal rep is 
required to avoid potential bugs/crashes.

rmax added on 2003-02-18 17:26:26:
Logged In: YES 
user_id=124643

There were several places in expect where it relied on the
undocumented side effect of the old Tcl_SetObjLength() to
remove the internal rep. For the time being I have added a
expInvalidateInternalRep() function to our expect package
that just calls the freeIntRepProc of the object type and
then sets the typePtr to NULL, as suggested on yesterday's
chat by Jeff and Kevin, and by dkf below, while I was typing
these lines ;).

But I imagine, that there are other extensions and
applications which also rely on the old behaviour of
Tcl_SetObjLength(). So it might be better to not change that
before the next major release.

Mabe Tcl_SetObjLength() could be split in two parts, a
private one that keeps the internal rep and can be used by
the core, and a public wrapper which maintains the current
behaviour.

dkf added on 2003-02-18 17:16:28:
Logged In: YES 
user_id=79902

Removing the Unicode rep of an object is trivial.  Just nuke
the internal rep (free it and set the type to NULL.)

vincentdarley added on 2003-02-18 17:16:14:

File Added - 42758: tclStringObj1.diff

Logged In: YES 
user_id=32170

Adding Daniel's patch.

vincentdarley added on 2003-02-18 17:04:16:

File Added - 42755: stringUtf.diff

Logged In: YES 
user_id=32170

You've hit the nail on the head there.  objPtr->bytes 
should be considered read-only.  What we need is a new 
API which removes the unicode rep entirely.  There are 
actually a few places in Tcl's core which do this too.

I've attached a patch which fixes the core areas.  
Daniel's patch should also be applied.  However, it 
looks as if expect.c needs fixing too.

rmax added on 2003-02-18 04:34:23:
Logged In: YES 
user_id=124643

I think I gou it:
expect.c around line 2220 modifies the contents of the
pointer that comes back from Tcl_GetStringFromObj():

---- snip ----
str = Tcl_GetStringFromObj(esPtr->buffer, &length);
...
if (length != 0) {
    memmove(str,str+match,length-match);
}
Tcl_SetObjLength(esPtr->buffer, length-match);
--- snap ----

This modifies the string rep, but doesn't update the unicode
string.

rmax added on 2003-02-18 02:04:27:

File Added - 42689: test2

rmax added on 2003-02-18 02:04:26:
Logged In: YES 
user_id=124643

Further investigation has shown, that at a point where
expect's match buffer starts growing rapidly, it's string
and unicode rep get out of sync. This means, the string rep
changes, but the unicode rep, that is used for regexp
matching stays the same. I hve instrumented
Tcl_RegExpExecObj() to show the difference:

    Tcl_DString ds;
...
    Tcl_DStringInit(&ds);
    Tcl_UniCharToUtfDString(udata, length, &ds);
    fprintf(stderr, ">> string: %s\n", Tcl_GetString(objPtr));
    fprintf(stderr, ">>unicode: %s\n", Tcl_DStringValue(&ds));
    Tcl_DStringFree(&ds);

Normally the two strings printed should always be the same,
but after some time, the unicode rep seems to "freeze" while
the string rep keeps walking trough the input with an offset
according to the first match withing the unicode rep. Here
is an example of another test script (attached) at the point
where it happens:

--- here it is still ok ---
>> string: abc
defg
>>unicode: abc
defg

--- and here it happens... ---
>> string: defg
hijkl
>>unicode: abc
defg

From there on, the unicode rep will remain the same while
the string rep keeps changing.

I suppose, that either expect doesn't correctly invalidate
the unicode rep when it changes the string rep, or that
Tcl_SetObjLength() does something wrong with the
representations under certain conditions that are triggered
by expect.

nobody added on 2003-02-17 16:38:43:
Logged In: NO 

Where can I find that patch?

vincentdarley added on 2003-02-15 18:57:51:
Logged In: YES 
user_id=32170

I imagine Daniel's patch fixes this problem?

hobbs added on 2003-02-15 07:29:13:
Logged In: YES 
user_id=72656

This appears to be a regression from 8.4.1 in trying to fix 
tclStringObj.c issues.

Attachments: