Tcl Source Code

View Ticket
Login
Ticket UUID: 2629338
Title: segfault on traced vars in TclCallVarTraces
Type: Bug Version: obsolete: 8.5.6
Submitter: nobody Created on: 2009-02-23 04:40:29
Subsystem: 07. Variables Assigned To: dgp
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2014-10-09 14:14:42
Resolution: Fixed Closed By: dgp
    Closed on: 2014-10-09 14:14:42
Description:
I've had a bug preventing me from upgrading my product to 8.5 for some time, but I've never been able to narrow it down to a simple example. So, In lieu of that, here's a patch that fixes the bug.

The issue is commonly triggered by tdom, which makes extensive use of variable traces (well, one per dom object.)

The core issue is essentially that TclUnsetVarStruct calls all the unset traces on the variable, then sets about deleting the traces. The problem is, the traces may have already deleted themselves when they were called.  (which tdom's always do.) In particular, the 'head' of the list may have removed itself.

In that case, TclUnsetVarStruct starts iterating over the items to be freed while looking at already-freed memory. Since it was freed so recently, it gets away with this in the normal case, but on rare occasions, the memory has already been reallocated. Hilarity ensues.

I'm not sure if this counts as the same bug or not, but TclDeleteNamespaceVars neglects to update the nextTracePtr for the activeVarTracePtr on the interp, with similar results. I've never seen that bug trigger anywhere but windows, but it shows up fairly often there in our product (on tcl 8.4.13 as well as 8.5.6.)

I also noticed that the VarTrace entries are freed with 'Tcl_EventuallyFree' without updating their next pointers. Since another call might have the current object preserved, but not the next one, the current object might stick around and continue being used while the next object has already been freed. So the patch includes a change to null out the next pointer when calling eventuallyfree. I'm not sure if there's a user-visible bug associated with that or not. I did a cursory check for similar antipatterns in other files for var traces (and found none) but didn't bother checking other link lists.

I will attach the patch.
User Comments: dgp added on 2010-10-04 22:55:56:
spoke too soon; bug 3062331 seems
to be there in all Tcl/Tk 8.5.* releases.

The commit of Patch 2629338 did nothing
to fix it, so I'm not seeing the evidence for
them being related.

dgp added on 2010-10-04 22:52:54:
so it appears that committing this patch
introduced the new bug?

pcmacdon added on 2010-09-09 09:32:09:
See  ID: 3062331 for possibly related bug.

dgp added on 2009-11-06 05:56:09:
To clarify a bit, the patch itself does not worry me.

What worries me is that the bug is still lurking in
there somewhere; this patch was merely a workaround
and not a true fix, and all we've done is make the
true bug even harder to discover.

dgp added on 2009-11-06 05:52:29:
I'm very uneasy about this patch, since I still
have no understanding about what bug it is
fixing, nor any test case to confirm that it actually
fixes anything.

The original description says "The core issue is essentially that TclUnsetVarStruct calls all the unset
traces on the variable, then sets about deleting the traces. The problem
is, the traces may have already deleted themselves when they were called.
(which tdom's always do.) In particular, the 'head' of the list may have
removed itself."

However, UnsetVarStruct moves all trace operations
over to a dummy copy that ought to be out of reach.
tdom does make calls to Tcl_UntraceVar(), true, but
those are no-ops, since they lookup the original
traced variable, which no longer has any traces
to remove since they've all been moved over to
the dummy.

dgp added on 2009-10-20 04:22:58:
 keeping open until an actual test is in place.

sorry, misread the patch.  one line in tclTrace.c; many in tclVar.c.

dgp added on 2009-10-20 04:19:38:
Only one line got committed on core-8-5-branch?

dkf added on 2009-10-18 05:36:50:
Looking at the patch (as converted by dgp) applied to HEAD (with obvious offsets) it all looks good to me, so I've applied it (and also to the 8.5 branch). The fact that it's had substantive in-service testing makes me hopeful...

My only concern is the lack of an effective test for this (and hence whether the test is minimal) but occasionally that's what one has to put up with. Leaving this in pending in case anyone thinks of a way to test. Priority lowered.

kraney added on 2009-10-02 03:46:45:
I can add at this point that we've been using TCL with this patch in our product for nearly a year now, with no recurrences of the original crash, and with no known regressions caused by the patch.

dgp added on 2009-04-10 04:19:07:
tclVar.c got a major major rewrite for 8.5.

I won't be too surprised if that introduced
some bugs, but will make only slow progress
without the help of the rewriter.

The surprising claim is that at least later
8.4.* releases suffer the same bug.  I would
have expected those releases to be quite
stable, unless we can point to some destabilizing
commit.  In particular the general problem
that running trace handler routines can change
the set of traces, I believed to have settled on
a robust solution with the activeTracePtr lists.

Anyhow, I'm not giving up on this, but it doesn't
look like a fix will get accepted in time for the
8.5.7 release.

kraney added on 2009-04-10 02:27:50:
That's correct, the bug was observed in 8.4.13 as well, though hitting it seemed mostly (but not entirely) confined to the windows port.

I believe the bug was not present in 8.4.9, but I wouldn't swear to it.

dgp added on 2009-04-10 02:19:27:
Do I understand this bug is observed 
in release Tcl 8.4.13 as well as Tcl 8.5.6 ?

If so, has this bug always been present?

If not, is there any guess as to when it
first showed up?

dgp added on 2009-04-10 01:23:21:
sadly there is very little that is "self-evident"
about the trace implementation.

kraney added on 2009-03-21 00:55:49:
No, as I mentioned in the original post, I've never been able to get a simple example to trigger the bug. My hypothesis is one's memory usage has to get interesting before it will trigger. That's why I fixed it myself before entering a ticket. Hopefully, the necessity of the changes I've made are self-evident, though naturally I'm sure you'd like to have a test case to try it with. I wish I could oblige.

The bug triggers from namespace delete, from itcl::delete, and when a tdom DOM object created using the form "tdom createDocument foo var" goes out of scope. Not on every call to any of those; just on rare occasions. It occurs most frequently on windows, when using a tclkit based shell. It occurs with threads enabled, and without threads as well.

dgp added on 2009-03-20 23:43:07:

File Added - 318776: 2629338.patch


Here's the patch in more customary
format, derived from core-8-5-branch.

Does the original reporter have a short
script to demo the bug?

File Added: 2629338.patch

dkf added on 2009-02-23 17:45:02:
Good thing you attached the patch in your initial submission; SF only allows the submitter (or a project dev) to attach files after submission. (It's an anti-maliciousness measure.)

kraney added on 2009-02-23 11:43:10:
Oops, forgot to log in. That was me.

nobody added on 2009-02-23 11:40:31:

File Added - 314724: tcl.diff

Attachments: