Tcl Source Code

View Ticket
Login
Ticket UUID: cd82cec7ce46a55af099b32b798398a78a505ef4
Title: TclX never updated to 8.6
Type: Bug Version: 8.6.1
Submitter: mi Created on: 2014-08-01 18:44:53
Subsystem: 46. Traces Assigned To: dgp
Priority: 5 Medium Severity: Severe
Status: Open Last Modified: 2014-08-04 06:26:03
Resolution: None Closed By: nobody
    Closed on:
Description:

The clientData argument supplied to Tcl_CreateTrace is being mis-interpreted by the Tcl-core. In particular, the PushProcCallFrame function in generic/tclProc.c treats it as a Proc-pointer, breaking horribly, when it is not.

I guess, not many extensions use this API, but TclX does -- and its profile.test seg-faults, when used with Tcl-8.6.

The bug is old -- a user reported TclX-crashing back in 2009, when it was dismissed (by me) as simply one more sign of 8.6 instability...

User Comments: dkf added on 2014-08-04 06:26:03:

FWIW, the last time I looked at the TclX code, back when implementing Tcl's version of lassign in 8.5, I saw that the TclX codebase then was desperately old-fashioned even then. For example, it was using Tcl's string-based API for all the list functions (and 8.5's lassign is enormously faster than TclX's version ever was).

My point is that TclX has needed major reconstruction work for quite a while even without the problems with profile; it needs an active maintainer. I've not the bandwidth to be that person. Have you?


mi added on 2014-08-03 05:49:47:

I suppose it's no worse than the defects that will be experienced by any extension using Tcl_(Get|Set)CmdInfo, but I think those losses are significant, and someone attached to TclX will have to judge which is worse to lose, a [profile] that will profile code in ways disinctly different from the way it normally runs, or a collection of new Tcl 8.6 features.

I fail to see the dilemma... The losses will only be incurred, if a developer uses the profile-command, right?

I aimed only at fixing the regression -- the loss of the profile-command, that upgrading to Tcl-8.6 would mean. That the command is unable to properly profile some of the new features of 8.6 is far less important than a regression.


dgp added on 2014-08-02 18:41:36:
I see the other patch.  That's another way to stop the crashing.

The keepers of TclX will have to determine how they want to
react to the fact that the assumptions of the TclX profile command
no longer match the implementation of Tcl.

The trouble with the patch is that I believe every profiled command
becomes NRE-disabled by virtue of being profiled.  This means
[yield] won't work within them.  This means a whole collection of
commands new in Tcl 8.6 are effectively disabled by TclX [profile]-ing.

I suppose it's no worse than the defects that will be experienced by
any extension using Tcl_(Get|Set)CmdInfo, but I think those losses
are significant, and someone attached to TclX will have to judge
which is worse to lose, a [profile] that will profile code in ways 
disinctly different from the way it normally runs, or a collection
of new Tcl 8.6 features.

msofer added on 2014-08-02 17:53:08:
According to that old user ticket http://sourceforge.net/p/tclx/bugs/54 the problem was already present with Tcl8.5.2 ... possibly even 8.5.0?  If so, it doesn't make much sense to consider updating it to 8.6 until it runs properly on 8.5 - too much to chew in just one meal.

msofer added on 2014-08-01 23:20:33:
"Why didn't you mention it on that old ticket back then?"

Because I couldn't be bothered to look into TclX sources back then, same as today. The diff is that now dgp did look and came up with that first diagnose: TclX is changing Command directly.

What dgp is hinting at now is a completely different kettle of fish: TclX assumes that Tcl evaluation is recursive. This doesn't require a fix, but rather a complete redesign. I could help with that.

As to "because it uses the structures and functions on which you are the top-expert", I fail to see the relevance. Are you saying that I should personally fix every bug incurred by anybody who is coding against Tcl internal headers? Without their assistance - just jump into their code? Life's too short ...

dgp added on 2014-08-01 22:47:29:
This will be disappointing to any happy users of
the [profile] command who wish to keep using it
as they update their Tcl, but it's truly necessary
until someone can rewrite the command to stop
making the fundamental assumption that Tcl
evaluation is a recursive operation.

Tcl 8.6 don't work that way no more.

Something like a profiler could be built up using
the execution tracing functionality of 
Tcl_TraceCommand() that was added in Tcl 8.4,
I believe.  This would allow a profiler to support
NRE without having to become NRE, if you follow.

 It would have to confront questions
about what it means to profile across a [yield]
though.  I'm signed up to stop crashes, but I'm
not interested in inventing that NRE-supporting
profiler.

dgp added on 2014-08-01 22:41:32:
I think the better fix is to re-implement the
[profile] command itself to notice when it's
in a Tcl 8.6+ interp and raise an error immediately.

"TclX profiler disabled in Tcl 8.6+"

dgp added on 2014-08-01 22:41:03:
I think the better fix is to re-implement the
[profile] command itself to notice when it's
in a Tcl 8.6+ interp and raise an error immediately.

"TclX profiler disabled in Tcl 8.6+"

dgp added on 2014-08-01 22:32:15:
Here's the first pass fix to stop the crashing:

Index: generic/tclXprofile.c
===================================================================
RCS file: /cvsroot/tclx/tclx/generic/tclXprofile.c,v
retrieving revision 1.4
diff -u -r1.4 tclXprofile.c
--- generic/tclXprofile.c       13 Oct 2009 19:28:23 -0000      1.4
+++ generic/tclXprofile.c       1 Aug 2014 22:30:29 -0000
@@ -612,6 +612,14 @@
         return;

     /*
+     * If a command is NRE-enabled, this profiler will break it.
+     */
+#if (TCL_MAJOR_VERSION == 8) && (TCL_MINOR_VERSION > 5)
+    if (cmdPtr->nreProc != NULL)
+        return;
+#endif
+
+    /*
      * Save current state information.
      */
     infoPtr->currentCmdPtr = cmdPtr;

dgp added on 2014-08-01 22:22:25:
I got patches brewing.

mi added on 2014-08-01 21:44:25:

Don (dgp) asked about someone from TclX asking us for what they need
Don's request allowed for "someone" to be a user — such as myself.
I did spend the energy required to keep those working, going against my real preference to kill them.
Why didn't you mention it on that old ticket back then?
I was (and still am) willing to help, but not to take over TclX's maintenance.
No, not of maintenance of the entire TclX — just the profile part of it, because it uses the structures and functions on which you are the top-expert in the world.

TclX is a sufficiently widely-used extension, in my opinion, that it should not be ignored by the maintainers of Tcl.

But, if you don't think so, so be it... Maybe, I can figure it out from scratch.


msofer added on 2014-08-01 21:22:01:
Btw if TclX is really "reaching into the Command struct and changing it" instead of using the Tcl_[G|S]etCommandInfo to do its dirty work, then it is ENTIRELY TclX's fault.

I did spend the energy required to keep those working, going against my real preference to kill them.

msofer added on 2014-08-01 21:15:16:
Don (dgp) asked about someone from TclX asking us for what they need, you offer someone asking us to fix TclX. Different thing. No sympathy from me. 

I was (and still am) willing to help, but not to take over TclX's maintenance. To be very clear: your question "could you be troubled to make up a patch for TclX?" effectively means "could you study TclX sources and fix them?". I don't know TclX, I don't want to know, I could be troubled to help if anybody showed any real interest.

Now: if nobody cares to look at what changed in tclInt.h that could impact TclX in 2008, AFTER noticing that it is segfaulting, my diagnose is clear: TclX is abandonware, it works as it always did with unmaintained Tcl, and nobody really seems to care enough to help me help them.

mi added on 2014-08-01 21:02:18:

Show me a single TIP where the developers of TclX or its users have spelled out what access they would need to work better and I'd find some more sympathy.
Here you go: http://core.tcl.tk/tcl/tktview?name=2017162

My nickname at SourceForge, where this ticket was filed six years ago, is "kot" :-) I followed-up on it asking for help in 2010 -- and forgot about this ticket completely myself...

Can I have some sympathy now?


mi added on 2014-08-01 20:51:53:

I had it wrong before, the real transgression is reacing into (Command) structs and overwriting their command procedures *and* clientData!

I see... Would you be able to offer a correction? I'm just trying to update the FreeBSD port of tclX — and can not speak for tclX-developers. FreeBSD switched to using tcl-8.6 by default long ago, it would be nice, if tclX didn't require the older version to function...

Thanks!


mi added on 2014-08-01 20:44:12:

Actually, TclX does not use its own Interp, which could've been out of sync with Tcl's internal type. TclX includes tclInt.h, so I doubt, that is the problem. Here is, what makes me think, the problem is in Tcl. Here is the call to Tcl_CreateTrace in tclXprofile.c:

713         infoPtr->traceHandle =
714             Tcl_CreateTrace (infoPtr->interp, MAXINT,
715                              (Tcl_CmdTraceProc *) ProfTraceRoutine,
716                              (ClientData) infoPtr);
And here is the value of the infoPtr, that's passed as the clientData:
(gdb) p infoPtr
$1 = (profInfo_t *) 0x8018fdb10
(gdb) c
Continuing.
And here is the crash, notice the value of the clientData given to PushProcCallFrame:
Program received signal SIGSEGV, Segmentation fault.
0x000000080098d69b in PushProcCallFrame (clientData=0x8018fdb10, interp=0x8018db010, objc=1, objv=0x8018df2f0, isLambda=0)
    at /home/ports/lang/tcl86/work/tcl8.6.1/generic/tclProc.c:1602
1602        Namespace *nsPtr = procPtr->cmdPtr->nsPtr;


dgp added on 2014-08-01 20:39:12:
Regarding the lack of exposure to needed internals...

Tcl development has been guided by open proposals for
nearly 14 years now.

Show me a single TIP where the developers of TclX
or its users have spelled out what access they would
need to work better and I'd find some more sympathy.

dgp added on 2014-08-01 20:37:03:
I *am* helping.  Poring over the TclX code to
see where they've broken through the barriers.

I had it wrong before, the real transgression
is reacing into (Command) structs and overwriting
their command procedures *and* clientData!

This overwriting takes no account of the new
nreProc field or the fact that most command
use that now, while still using the same clientData.

So TclX overwrites a commands clientData, but doesn't
intercept the command procedure invocation -- BOOM!

mi added on 2014-08-01 20:29:32:

TclX must adapt to that or die.
Could you help? It is not entirely TclX' fault — for many years Tcl core was overly zealous in hiding its various internals, forcing extensions to make unsafe assumptions like this...


dgp added on 2014-08-01 20:04:58:
and what have we here in tclXprofile.c ....

TurnOnProfiling (infoPtr, commandMode, evalMode)
    profInfo_t *infoPtr;
    int         commandMode;
    int         evalMode;
{
...
    InitializeProcStack (infoPtr, ((Interp *) infoPtr->interp)->framePtr);
}

The Tcl 8.6 interp, and frame and all manner of internals are
not the same as those of 8.4.  TclX must adapt to that or die.

mi added on 2014-08-01 20:00:11:
Well, I ran this under debugger. The value ClientData, that TclX' code passes to Tcl_CreateTrace() is exactly the same pointer, that the PushProcCallFrame treats as the pointer to Proc.

To reproduce, build TclX 8.4 (or 8.4.1) against Tcl-8.6 and attempt to "make test".

See crash in profile.test

dgp added on 2014-08-01 19:51:21:
...and the Q is what horrors has TclX done to make
a non-proc command like [set] Dispatch to the TclNRInterpProc ?

dgp added on 2014-08-01 19:48:10:
With the CVS HEAD of TclX from sourceforge
and the tip of core-8-6-2-rc Tcl:

% package require Tclx
8.4
% profile on
% set a 1

Program received signal SIGSEGV, Segmentation fault.
0x000000000055241f in PushProcCallFrame (clientData=0x924710, interp=0x821680,
    objc=3, objv=0x7fffffffd9e0, isLambda=0)
    at /home/dgp/fossil/tcl8.6.2/generic/tclProc.c:1602
1602        Namespace *nsPtr = procPtr->cmdPtr->nsPtr;
(gdb) bt
#0  0x000000000055241f in PushProcCallFrame (clientData=0x924710,
    interp=0x821680, objc=3, objv=0x7fffffffd9e0, isLambda=0)
    at /home/dgp/fossil/tcl8.6.2/generic/tclProc.c:1602
#1  0x0000000000552614 in TclNRInterpProc (clientData=0x924710,
    interp=0x821680, objc=3, objv=0x7fffffffd9e0)
    at /home/dgp/fossil/tcl8.6.2/generic/tclProc.c:1712
#2  0x0000000000414dd7 in Dispatch (data=0x86abf8, interp=0x821680, result=0)
    at /home/dgp/fossil/tcl8.6.2/generic/tclBasic.c:4357
#3  0x0000000000414e5e in TclNRRunCallbacks (interp=0x821680, result=0,
    rootPtr=0x0) at /home/dgp/fossil/tcl8.6.2/generic/tclBasic.c:4390
#4  0x0000000000414584 in Tcl_EvalObjv (interp=0x821680, objc=3,
    objv=0x7fffffffd9e0, flags=131072)
    at /home/dgp/fossil/tcl8.6.2/generic/tclBasic.c:4121
#5  0x00000000005c2ddd in Tcl_RecordAndEvalObj (interp=0x821680,
    cmdPtr=0x866840, flags=131072)
    at /home/dgp/fossil/tcl8.6.2/generic/tclHistory.c:172
#6  0x00000000005342ff in Tcl_MainEx (argc=-1, argv=0x7fffffffdce0,
    appInitProc=0x40f1c9 <Tcl_AppInit>, interp=0x821680)
    at /home/dgp/fossil/tcl8.6.2/generic/tclMain.c:534
#7  0x000000000040f1c2 in main (argc=1, argv=0x7fffffffdcd8)
    at /home/dgp/fossil/tcl8.6.2/unix/tclAppInit.c:84

msofer added on 2014-08-01 19:15:19:
Wow - completely misdiagnosed, I would say. PushProcCallFrame treats it's own clientData as a Proc - which is the only kind of clientData it should receive: it is a static function that is used in the dispatch of procs (and lambdas). 

The real question is: how is it that the clientData supplied to Tcl_CreateTrace is used to try and dispatch it as if it were a Proc? Who is doing that, how and why?

dgp added on 2014-08-01 19:08:30:
This is a dup of http://core.tcl.tk/tcl/tktview?name=2017162 ?

dgp added on 2014-08-01 18:58:47:
Do you have any demo code or script for this?


Using it can `fossil bisect` locate the origin
of the breakage?

Attachments: