Ticket UUID: | 686782 | |||
Title: | Changes to Tcl_SetObjLength() break "expect -re" | |||
Type: | Bug | Version: | obsolete: 8.4.1.1 | |
Submitter: | rmax | Created on: | 2003-02-14 21:09:40 | |
Subsystem: | 44. UTF-8 Strings | Assigned To: | das | |
Priority: | 8 | Severity: | ||
Status: | Closed | Last Modified: | 2003-02-19 23:58:01 | |
Resolution: | Fixed | Closed By: | das | |
Closed on: | 2003-02-19 16:58:01 | |||
Description: |
The following expect script shows unexpected (sic) behaviour when run on Tcl as of 2003/01/17 or later: spawn /bin/echo "test\nfoo\nbar\nbla" while 1 { expect { -re "\n" { lappend lines $expect_out(buffer) } eof break } } foreach line $lines { puts "»$line«" } It should print: --------------- »test « »foo « »bar « »bla « --------------- but it does print something like: --------------- »test « »foo b« »ar bl« --------------- On some machines it fails almost every time, on others only once in a while. If expect is being run on a i386 under valgrind, the misbehaviour occurs every time. If Tcl_SetObjLength() get's reverted to r1.26 the problem disappears. | |||
User Comments: |
das added on 2003-02-19 23:58:01:
Logged In: YES user_id=90580 checked in attached tclStringObj.diff das added on 2003-02-18 18:52:03: Logged In: YES user_id=90580 also c.f. bug 635200 das added on 2003-02-18 18:44:26: Logged In: YES user_id=90580 ack, of course 8.4.2 is what I meant, sorry... rmax added on 2003-02-18 18:35:15: Logged In: YES user_id=124643 Thanks, Daniel. Your patch fixes it without needing further changes to expect or the Tcl core. I think it should even still go into 8.4.2. das added on 2003-02-18 17:50:26: Logged In: YES user_id=90580 vince, thanks for attaching the patch, sorry for being slow. reinhardt, does this fix the problem you see? I don't have any problem with my expect -re scripts anymore with this without having to change expect. (the patch restores the abovementioned Tcl_SetObjLength side effect) if yes, this should definitely be applied before 8.4.3 for efficiency, we might want to consider new API in the future to invalidate the unicode rep explicitly without freeing its associated memory (to avoid deallocating and reallocating the unicode buffer repeatedly, thus defeating the memory optimizations in tclStringObj.c) I've proposed Tcl_InvalidateUnicode() or/and Tcl_SetUnicodeLength() (where a length param of -1 would invalidate the unicode rep) vincentdarley added on 2003-02-18 17:40:19: Logged In: YES user_id=32170 As far as whether we should make this change now, or not, I say fix the problem now. Expect should really not be manipulating those bytes: "Tcl_GetStringFromObj and Tcl_GetString return an object's string representation..(snip)... The storage referenced by the returned byte pointer is owned by the object manager and should not be modified by the caller." quote from the man pages... vincentdarley added on 2003-02-18 17:33:18: Logged In: YES user_id=32170 You're right -- no new API needed. The stringUtf.diff can be used to isolate those core places (just 3 or 4) where my analysis shows a freeing of internal rep is required to avoid potential bugs/crashes. rmax added on 2003-02-18 17:26:26: Logged In: YES user_id=124643 There were several places in expect where it relied on the undocumented side effect of the old Tcl_SetObjLength() to remove the internal rep. For the time being I have added a expInvalidateInternalRep() function to our expect package that just calls the freeIntRepProc of the object type and then sets the typePtr to NULL, as suggested on yesterday's chat by Jeff and Kevin, and by dkf below, while I was typing these lines ;). But I imagine, that there are other extensions and applications which also rely on the old behaviour of Tcl_SetObjLength(). So it might be better to not change that before the next major release. Mabe Tcl_SetObjLength() could be split in two parts, a private one that keeps the internal rep and can be used by the core, and a public wrapper which maintains the current behaviour. dkf added on 2003-02-18 17:16:28: Logged In: YES user_id=79902 Removing the Unicode rep of an object is trivial. Just nuke the internal rep (free it and set the type to NULL.) vincentdarley added on 2003-02-18 17:16:14: File Added - 42758: tclStringObj1.diff Logged In: YES user_id=32170 Adding Daniel's patch. vincentdarley added on 2003-02-18 17:04:16: File Added - 42755: stringUtf.diff Logged In: YES user_id=32170 You've hit the nail on the head there. objPtr->bytes should be considered read-only. What we need is a new API which removes the unicode rep entirely. There are actually a few places in Tcl's core which do this too. I've attached a patch which fixes the core areas. Daniel's patch should also be applied. However, it looks as if expect.c needs fixing too. rmax added on 2003-02-18 04:34:23: Logged In: YES user_id=124643 I think I gou it: expect.c around line 2220 modifies the contents of the pointer that comes back from Tcl_GetStringFromObj(): ---- snip ---- str = Tcl_GetStringFromObj(esPtr->buffer, &length); ... if (length != 0) { memmove(str,str+match,length-match); } Tcl_SetObjLength(esPtr->buffer, length-match); --- snap ---- This modifies the string rep, but doesn't update the unicode string. rmax added on 2003-02-18 02:04:27: File Added - 42689: test2 rmax added on 2003-02-18 02:04:26: Logged In: YES user_id=124643 Further investigation has shown, that at a point where expect's match buffer starts growing rapidly, it's string and unicode rep get out of sync. This means, the string rep changes, but the unicode rep, that is used for regexp matching stays the same. I hve instrumented Tcl_RegExpExecObj() to show the difference: Tcl_DString ds; ... Tcl_DStringInit(&ds); Tcl_UniCharToUtfDString(udata, length, &ds); fprintf(stderr, ">> string: %s\n", Tcl_GetString(objPtr)); fprintf(stderr, ">>unicode: %s\n", Tcl_DStringValue(&ds)); Tcl_DStringFree(&ds); Normally the two strings printed should always be the same, but after some time, the unicode rep seems to "freeze" while the string rep keeps walking trough the input with an offset according to the first match withing the unicode rep. Here is an example of another test script (attached) at the point where it happens: --- here it is still ok --- >> string: abc defg >>unicode: abc defg --- and here it happens... --- >> string: defg hijkl >>unicode: abc defg From there on, the unicode rep will remain the same while the string rep keeps changing. I suppose, that either expect doesn't correctly invalidate the unicode rep when it changes the string rep, or that Tcl_SetObjLength() does something wrong with the representations under certain conditions that are triggered by expect. nobody added on 2003-02-17 16:38:43: Logged In: NO Where can I find that patch? vincentdarley added on 2003-02-15 18:57:51: Logged In: YES user_id=32170 I imagine Daniel's patch fixes this problem? hobbs added on 2003-02-15 07:29:13: Logged In: YES user_id=72656 This appears to be a regression from 8.4.1 in trying to fix tclStringObj.c issues. |