Tcl Source Code

View Ticket
Login
2016-11-26
01:57
Fold create argument to ArrayVar() into TCL_VAR_CREATE flags bit which only ArrayVar() understands. ... check-in: 2590837335 user: andy tags: amg-array-enum-c-api
2010-02-03
20:42 Closed ticket [2939073fff]: Unset PATTERN unsafe with trace-originated secondary unsets plus 6 other changes artifact: 8b33caf852 user: ferrieux
20:29 Ticket [2939073fff]: 4 changes artifact: 7bcc34831d user: dkf
20:26 Ticket [2939073fff]: 4 changes artifact: 4d30a1fa8e user: dkf
17:02 Ticket [2939073fff]: 4 changes artifact: af222e3559 user: ferrieux
16:46 Ticket [2939073fff]: 4 changes artifact: c9f541ebb8 user: dkf
00:34 Ticket [2939073fff]: 4 changes artifact: e672a02e67 user: ferrieux
00:32 Add attachment arrayunset3.patch to ticket [2939073fff] artifact: 0173c8ae1c user: ferrieux
00:32 Ticket [2939073fff] Unset PATTERN unsafe with trace-originated secondary unsets status still Open with 4 other changes artifact: f50d5c7590 user: ferrieux
00:31 Ticket [2939073fff]: 4 changes artifact: 3e8176e7db user: ferrieux
00:26 Ticket [2939073fff]: 4 changes artifact: c92b4e5e8b user: ferrieux
2010-02-02
23:47 Open ticket [2939073fff]. artifact: 8cd012f72b user: ferrieux
23:47 Ticket [2939073fff]: 4 changes artifact: f5c03d2762 user: ferrieux
23:14 Closed ticket [2939073fff]. artifact: ebd2bde68a user: dkf
23:09 Add attachment arrayunset2.patch to ticket [2939073fff] artifact: 1cbe8a6985 user: ferrieux
23:09 Ticket [2939073fff] Unset PATTERN unsafe with trace-originated secondary unsets status still Open with 4 other changes artifact: fb155ee3c3 user: ferrieux
23:08 Ticket [2939073fff]: 4 changes artifact: 2ba4ec6f58 user: ferrieux
20:13 Ticket [2939073fff]: 4 changes artifact: f4742ea70f user: ferrieux
20:12 Ticket [2939073fff]: 4 changes artifact: c85b552dcf user: ferrieux
20:00 Open ticket [2939073fff]. artifact: f2c9debfc0 user: ferrieux
07:44 Closed ticket [2939073fff]. artifact: a7f2194727 user: dkf
07:09 Ticket [2939073fff]: 4 changes artifact: 0e96552d1f user: dkf
07:09 Add attachment arrayunset.patch to ticket [2939073fff] artifact: c4e9d43973 user: dkf
07:08 Ticket [2939073fff] Unset PATTERN unsafe with trace-originated secondary unsets status still Open with 4 other changes artifact: 54fb52c896 user: dkf
07:08 Ticket [2939073fff]: 4 changes artifact: da3979b6db user: dkf
2010-02-01
20:16 Ticket [2939073fff]: 1 change artifact: bbb381d9c9 user: ferrieux
20:16 Ticket [2939073fff]: 4 changes artifact: a76f140183 user: ferrieux
20:11 Ticket [2939073fff]: 4 changes artifact: 6525c85dc5 user: ferrieux
09:06 Ticket [2939073fff]: 5 changes artifact: 826961a4ec user: ferrieux
08:35 Ticket [2939073fff]: 1 change artifact: 711e66990e user: ferrieux
08:33 Ticket [2939073fff]: 4 changes artifact: 61b15ff00c user: ferrieux
06:27 Ticket [2939073fff]: 4 changes artifact: c11ea49847 user: ferrieux
2010-01-31
02:16 Ticket [2939073fff]: 4 changes artifact: 9ea1fd84c5 user: ferrieux
01:34 Ticket [2939073fff]: 4 changes artifact: b922d95c0f user: ferrieux
00:52 Ticket [2939073fff]: 4 changes artifact: bb96bfd468 user: ferrieux
2010-01-30
19:53 Ticket [2939073fff]: 4 changes artifact: a5b01f5dcb user: ferrieux
2010-01-26
07:53 Ticket [2939073fff]: 4 changes artifact: a55261adad user: ferrieux
07:51 Add attachment crash1.txt to ticket [2939073fff] artifact: cc579eeda7 user: ferrieux
07:51 Ticket [2939073fff] IPC via sockets and traces causes segfault status still Open with 4 other changes artifact: fc408ad433 user: ferrieux
06:50 Ticket [2939073fff]: 4 changes artifact: bc3c53c272 user: stevel
2010-01-25
15:36 Ticket [2939073fff]: 4 changes artifact: 3541e52cde user: ferrieux
11:54 Add attachment echoevent.kit to ticket [2939073fff] artifact: f9065d7ff5 user: blacksqr
11:54 Ticket [2939073fff] IPC via sockets and traces causes segfault status still Open with 4 other changes artifact: f52dc4f56e user: blacksqr
04:54 New ticket [2939073fff]. artifact: f3c7fa973f user: blacksqr

Ticket UUID: 2939073
Title: Unset PATTERN unsafe with trace-originated secondary unsets
Type: Bug Version: obsolete: 8.5.8
Submitter: blacksqr Created on: 2010-01-25 04:54:03
Subsystem: 07. Variables Assigned To: dkf
Priority: 8 Severity:
Status: Closed Last Modified: 2010-02-03 20:42:43
Resolution: Fixed Closed By: ferrieux
    Closed on: 2010-02-03 13:42:43
Description:
I wrote a simple application to gather Tk events and pass them from one interpreter to another via tequila (http://wiki.tcl.tk/1243), which uses traces and socket communication to keep array values in sync between Tcl interpreters.

My application displays a canvas, to which a <Motion> event is bound.  Thus as the cursor is moved over the canvas a steady stream of Tk events is generated; the scripts associated with these events are captured and written to an array, which tequila echoes to an interpreter in a separate process.  

When I start this application and move the cursor continually over the canvas, sometime between five and sixty seconds later one of the interpreters segfaults and crashes.

Platform is Ubuntu Hardy, with ActiveTcl 8.5.5 installed.  I duplicated the bug using Pat Thoyt's tclkits versions 8.5.8 (http://www.patthoyts.tk/tclkit/linux-ix86/8.5.8/tclkit%2d8.5.8.gz) and 8.6b1.1 (http://www.patthoyts.tk/tclkit/linux-ix86/8.6%2dbeta/tclkit%2d8.6b1.1.gz).

For ease of duplication I created a starkit containing all relevant code called echoevent.kit (attached).  To duplicate:

1. in a terminal console, execute: tclkit echoevent.kit server
This will start a tequila server.

2. in a separate terminal console, execute: tclkit echoevent.kit master
This will start a simple Tk "master" application that displays a canvas.  Moving the cursor across this "master" canvas will cause the command associated with the <Motion> event to echo repeatedly to the terminal console.

3. in a third terminal console, execute: tclkit echoevent.kit slave
This will cause an identical "slave" Tk application to appear, except event processing for the "slave" canvas is disabled.  Moving the cursor across the "master" canvas will cause the event command to echo to both the "master" console and the "slave" console.  The tequila server is passing event information from the master interpreter to the slave interpreter.

4. move the cursor continually over the master canvas to generate a constant stream of <Motion> events to be echoed to the consoles.  Either the master or the slave process will crash with a segmentation fault within 60 seconds.
            
report cross-posted at: http://bugs.activestate.com/show_bug.cgi?id=85927
User Comments: ferrieux added on 2010-02-03 20:42:43:

allow_comments - 1

At last !!!
Thanks for fixing that "--" typo.

dkf added on 2010-02-03 20:29:12:
Applied the corrected version of your last patch.

dkf added on 2010-02-03 20:26:16:
We can't detect a trace in O(1) as that information is not collected (it can't be collected because we can't go cheaply from the state of knowing that there is a trace to knowing that there isn't one) and just because the semantics have been screwed up for a long time doesn't mean that they should not be fixed.

Your patch is *still* wrong though, independent of that. Really. (Look at the line where you're dealing with the reference count of protectedVarPtr...)

ferrieux added on 2010-02-03 17:02:09:
Donal, you're re-discovering hot water ;-)

The fact that [array unset *] + [trace add ... set ar($newkey) 1] leaves the array with an apparently random subset has been true since Day One of trace !!!

To try to keep the discussion on tracks, two things :

 (1) please admit that HEAD currently has a leak that my latest patch fixes, at iso-semantics with Day One.

 (2) if we are to iron out the semantics at the cost of *** POTENTIAL INCOMPATIBILITY ***, indeed I've discussed it with Miguel when evaluating "thermonuclear solutions". I was suggesting the snapshot idea, and Miguel proposed to just rewind the iterator. (I still prefer the snapshot, because the rewind leads to quadratic behavior and may fire some unset traces more than once). But the snapshot also has a performance impact wrt current code, and would need to kick in only when unset traces are present (can we detect a single-key trace in O(1) ?).

dkf added on 2010-02-03 16:46:16:
Your patch is still wrong. ;-)

More importantly, the whole semantics for how you're going about it is wrong too, and no single-iteration strategy can fix it. To show what I mean, here's a sample session (using the HEAD, but apply any fixes you want...)

% trace add variable foo unset {apply {{a b c} {if {$b eq "b"} {global foo; for {set i 0} {$i<100} {incr i} {set foo($i) 1}}}}}
% array set foo {b 1 f 1 j 1}
% parray foo
foo(b) = 1
foo(f) = 1
foo(j) = 1
% array unset foo *
% parray foo
foo(35) = 1
foo(36) = 1
foo(37) = 1

The fundamental issue is that when an unset trace tinkers with the contents of the array, [array unset]'s semantics go haywire. The only thing I can think of to fix this is to use a multi-step deletion process which first calculates which elements to kill and then nukes them while calling traces. (The confusion about what happens when a trace updates an element that is about to be deleted vs one that has already been deleted is probably an indication that we'll just have to be careful.) I suggest snapshotting (into a C array) at the start is sanest, and lets us avoid problems with the iterator's internal pointer entirely.

ferrieux added on 2010-02-03 00:34:27:
Attaching the fix. Several things to note:
  - removed the refcount-- == 1 logic, since it is wrong in that case, and CleanupVar does it too, and does it right
 - moved CleanupVar back to the bottom of the loop in the "else" of !IsVarUndefined
 - this fixes the leak, avoids using the var after freeing, and makes your comments accurate again ;-)

ferrieux added on 2010-02-03 00:32:01:

File Added - 361372: arrayunset3.patch

ferrieux added on 2010-02-03 00:31:28:

File Deleted - 361371:

ferrieux added on 2010-02-03 00:26:17:

File Added - 361371: arrayunset3.patch

ferrieux added on 2010-02-02 23:47:17:

allow_comments - 0

ferrieux added on 2010-02-02 23:47:16:
Donal, what you've just committed is not correct yet. The leak is still there.
I notice that your first CleanupVar is dangerous; if it succeeds, varPtr2 is now a freed structure, and yet is used in making decision.
Why not use the CleanupVar logic I posted ?

dkf added on 2010-02-02 23:14:04:

allow_comments - 1

ferrieux added on 2010-02-02 23:09:49:

File Added - 361360: arrayunset2.patch

ferrieux added on 2010-02-02 23:08:32:

File Deleted - 361330:

ferrieux added on 2010-02-02 20:13:57:
Attached the small delta-patch reintroducing CleanupVar and eliminating the leak.

ferrieux added on 2010-02-02 20:12:53:

File Added - 361330: arrayunset2.patch

ferrieux added on 2010-02-02 20:00:15:

allow_comments - 0

Beware, there is a leak in the non-error case now. This is due to only decrementing refcounts without ever calling CleanupVar. Not sure why you removed this call from my initial patch.

Proof of leak:

array set x {e 1 i 1}
trace add variable x unset rem
proc rem args {
    puts REM:$args
    if {![info exists ::already]} {
set ::already 1
unset ::x(e)
    }
}
array unset x i*
array stat x

=>
 number of buckets with 0 entries: 3
 number of buckets with 1 entries: 1

array size x
=> 0

What happens is that the secondary unset is delayed because "e" is in the protected state, but since the overall pattern "i*" doesn't include "e", it doesn't get directly deleted from the main iteration either.

Having noticed that VarHashRefCount(x)-- doesn't take the actual final step, we must call CleanupVar independently from the match outcome; this is exactly what my patch was doing.

dkf added on 2010-02-02 07:44:50:

allow_comments - 1

Backported to 8.5. I think it's probably not practical to backport to 8.4 since this is thoroughly dependent on miguel's VarReform work.

dkf added on 2010-02-02 07:09:39:

File Added - 361257: arrayunset.patch

dkf added on 2010-02-02 07:08:21:

File Deleted - 361174:

dkf added on 2010-02-02 07:08:02:
That patch doesn't quite work. It has excess refcount problems when there's a failure to unset midway. Evil case!

I'll attach the patch I've applied to the HEAD, which addresses the issue (it's closely based on yours and includes a test). Note that it doesn't include any code to invalidate unused memory; the --enable-symbols=mem configuration works fine for that already.

ferrieux added on 2010-02-01 20:16:54:
The attached patch fixes the issue by properly protecting the next-in-current-bucket.
This is done modularly (without touching tclHash.c) thanks to the property that the search object has a NULL nextEntryPtr at each bucket change. So the protection is done by upping the refcount of nextEntryPtr, and decreasing it again one loop cycle later. 

Since the patch includes the "sanitizing" line, a deterministic test case will be provided shortly.

Please review.

ferrieux added on 2010-02-01 20:11:32:

File Added - 361174: arrayunset.patch

ferrieux added on 2010-02-01 09:06:42:
Upping prio. Patch under way.

ferrieux added on 2010-02-01 08:33:40:
Ha ! Got it now.

In truth the secondary unset _does_ kill the hash entry in that case, it's just that the reuse of the freed block doesn't cause immediate harm normally. Hence I had to add the following 'sanitizing' line to FreeVarEntry:

         if (TclIsVarUndefined(varPtr) && !TclIsVarTraced(varPtr)
     && (VarHashRefCount(varPtr) == 1)) {
+varPtr->value.objPtr=(void *)0xdeadbeef;
 ckfree((char *) varPtr);

It is simply a matter of resetting a field to a visibly invalid value just before freeing.
The effect is that, as soon as somebody reuses that freed structure (and typically by indirections off this objPtr field), a SIGSEGV or SIGBUS will burst on 0xdeadbeef.

And it does: given this sanitizing patch, the following ultrashort example crashes tclsh:

array set x {e 1 i 1}
trace add variable x unset rem
proc rem args {
    puts REM:$args
    if {![info exists ::already]} {
set ::already 1
unset ::x(e)
    }
}
array unset x *

Explanation: a nascent array with a small size has only 4 buckets, allowing a quick search of collisions.
It turns out 'e' and 'i' fall in the same bucket. Since the entries of a same bucket are stored in a LIFO linked list, the [array unset x *] iteration will do 'i', then 'e'. The rest of the code arranges for the 'e' to be removed as a side effect of the unset trace on 'i', hence removing the rug. Boom.

Renaming the bug according to this confirmation.

ferrieux added on 2010-02-01 06:27:50:
Sigh.
 Just produced a bucket of depth three: [array set x  {a 1 e 1 i 1}].
(verify with [array stat x])

Then tried this:
     trace add variable x unset rem
     proc rem args {puts REM:$args; if {![info exists ::again]} {set ::again 1;unset ::x(e)}}

Then [array unset x *] should remove, in order, i,e,a.
*But*, since the trace proc removes e while removing i, the outer loop should get into trouble....

Though it doesn't for some reason.
Just noticed the tricky game of Undefining a var instead of coldly killing its hash entry. Maybe this explains that, and my hypothesis was naive. Something more subtle must come into play...

ferrieux added on 2010-01-31 02:16:35:
Additional insight: the way the "stateful iterator" Tcl_HashSearch work, is by a {bucket index, chained intra bucket pointer} pair. So the cases where the aforementioned interaction can cause harm, is when the second-level [unset] removes the very next entry from the same bucket.

That's why the crash is so rare. Now to designing a pin-sharp case ;-)

ferrieux added on 2010-01-31 01:34:30:
Still not managed to reproduce, but a good candidate from RTFS:
At line 3199 of tclVar.c, for [array unset var pattern], a loop iterates over key with a stateful iterator:

   for (varPtr2=VarHashFirstVar(varPtr->value.tablePtr, &search);
    varPtr2!=NULL ; varPtr2=VarHashNextVar(&search)) {

then, in the body of that loop, when a match is found, the unset is done, possibly calling traces, possibly wreaking the chaining of hash keys.

Some kind of locking, or a more robust iterator, must be used I think.

ferrieux added on 2010-01-31 00:52:07:
Homing in.
Just adding [puts] around the unset, shows a reentrancy issue.

UNSETCOLLAB>: .c,queue,19,* (size:23 -- matching:10)
UNSETCOLLAB>: .c,queue,20,* (size:22 -- matching:10)
UNSETCOLLAB<
UNSETCOLLAB>: .c,queue,19,* (size:12 -- matching:9)
UNSETCOLLAB<

This in turn can be traced down to Tequila doing [vwait] in its guts. With Tk and a heavy stream of events, that [vwait] can catch a GUI event and call arbitrary code, including some doing an extra [unset] on the same var.

Knowing this, I'll try to build an atomic example.

ferrieux added on 2010-01-30 19:53:08:
Further analysis of the core shows the command

   array unset collab .c,queue,42,*

being executed at the time. So it seems to hint at an interaction between patterned unsets and traces. I'd appreciate somebody knowing the app at hand to trim that stuff down to a manageable size... TBC

ferrieux added on 2010-01-26 07:53:07:
After Steve kindly brought all the pieces beside my bed, I could get the attached gdb stack trace under Linux (Fedora 12, 32bits). Nothing about Metakit there.

ferrieux added on 2010-01-26 07:51:49:

File Added - 360295: crash1.txt

stevel added on 2010-01-26 06:50:37:
I've reproduced the problem on OSX 10.6.2 using wish 8.5.7.  If I can figure how, I'll attach the traceback file

ferrieux added on 2010-01-25 15:36:43:
Since this happens on unix, can you enable core dumps, and provide a gdb stack trace of the crash ? I'd like to dichotomize between Metakit and the vanilla Tcl core.

As a side note, a .kit is not the easiest transport for some maintainers (me at least), because it requires additional effort to crack the hull open, and reproduce with the core-under-development. I hate efforts ;-)

blacksqr added on 2010-01-25 11:54:03:

File Added - 360172: echoevent.kit

Attachments: