Tcl Source Code

Ticket UUID: 1036064
Title: TCL crashes if the application runs for a long time
Type: Bug
Version: final: 8.3.5
Submitter: nobody
Created on: 2004-09-28 10:21:45
Subsystem: 01. Notifier
Assigned To: kennykb
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2004-09-28 21:08:26
Resolution: Invalid
Closed By: kennykb
Closed on: 2004-09-28 14:08:26
Description:
OS Platform and Version :
W2K 

Problem Behaviour:
We have an application built using Tcl/Tk V8.3.5. This 
application will be typically used for a long duration (say 
12 to 60 hrs). During such a prolonged usage the 
application crashes and a Dr.Watson dump is generated 
(the log is attached). On a first analysis of the log, I 
could see that the crash occurred in the 
function TclpStrftime, which is typically invoked through 
the Tcl command [clock clicks -milliseconds]. 
The crash does not happen when the tool is used for a 
short time.

Expected Behaviour:
The application should not crash

Contact email id : [email protected]
User Comments: kennykb added on 2004-09-28 21:08:26:

There are multiple sources of confusion in this bug report;
let me try to untangle a few of them, or else the
explanation will appear wholly unrelated.

First, despite the indications in DrWatson.log, the crash
did *not* occur in or around TclpStrftime. Rather,
TclpStrftime was the last exported name before the code in
question. (This fact is not surprising; it's the last
exported name in the Tcl library.)
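
To illustrate why the log points at TclpStrftime, here is a minimal
sketch of how a faulting address gets attributed to the nearest
preceding exported name when no other symbols are available. The table
contents, addresses, and the helper name nearest_export are all
invented for illustration; this is not how DrWatson is actually
implemented, only the general idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uintptr_t   address;
} ExportedSymbol;

/* Hypothetical export table, sorted by ascending address.
 * The addresses are made up; only the ordering matters here. */
static const ExportedSymbol exports[] = {
    { "Tcl_DoOneEvent", 0x10001000u },
    { "Tcl_Eval",       0x10002000u },
    { "TclpStrftime",   0x10050000u },  /* last exported name */
};
#define EXPORT_COUNT (sizeof(exports) / sizeof(exports[0]))

/* Return the name of the exported symbol with the greatest address
 * that is <= addr, or NULL if addr lies below every export.
 * Assumes the table is sorted by ascending address. */
static const char *nearest_export(const ExportedSymbol *table,
                                  size_t count, uintptr_t addr)
{
    const char *best = NULL;
    size_t i;

    assert(table != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (table[i].address > addr) {
            break;              /* every later entry is higher still */
        }
        best = table[i].name;
    }
    return best;
}
```

Any address past the last entry, however far away, is reported against
that last name, which is why unexported code that sits after it in the
image appears to belong to TclpStrftime.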

The code that faulted was, in actuality, a bit of
generated code, in another segment, that handles probing the
large activation record of TclRegExec (the 'exec' function
in generic/regexec.c).  The stack probes went below the base
of the stack segment at 0x34000 and faulted.  This is the
usual behaviour of most software when confronted with a stack
overflow.

Tcl contains logic to make stack overflows more benign, in
the function TclpCheckStackSpace in win/tclWin32Dll.c.
Unfortunately, in the release you're using, the stack
commitment that TclpCheckStackSpace imposes is not enough to
handle the demands of TclRegExec (whose activation record is
extremely large).  This problem is fixed in release 8.4.7;
see
   http://sourceforge.net/tracker/index.php?func=detail&aid=947070&group_id=10894&atid=110894
and
   http://cvs.sourceforge.net/viewcvs.py/tcl/tcl/win/tclWinInt.h?r1=1.20.2.2&r2=1.20.2.3
for the details.
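
For readers unfamiliar with the idea, the following is a rough sketch
of what "checking stack space" before going deeper can look like. It
is not the code of TclpCheckStackSpace; the names stack_guard_init,
stack_used, descend, and the 64 KB STACK_BUDGET are assumptions made
up for this illustration. It estimates stack use from the distance
between a marker near the base of the stack and a local variable in
the current frame, and returns an error instead of recursing further.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical budget: refuse to go deeper once this much stack is used. */
#define STACK_BUDGET ((size_t)64 * 1024)

static uintptr_t stack_base;    /* address of a local near the stack base */

/* Record a reference point; call once with the address of a local
 * variable in an outermost frame (for example, in main). */
static void stack_guard_init(const void *base_marker)
{
    stack_base = (uintptr_t)base_marker;
}

/* Approximate stack use as the distance from the reference point.
 * Taking the absolute difference avoids assuming a growth direction. */
static size_t stack_used(const void *here)
{
    uintptr_t p = (uintptr_t)here;
    return (size_t)(p > stack_base ? p - stack_base : stack_base - p);
}

/* Recurse until the budget is exceeded, then report an error (-1)
 * rather than running off the end of the stack. *max_depth receives
 * the deepest level that was still within budget. */
static int descend(int depth, int *max_depth)
{
    volatile char pad[512];     /* makes each frame use real stack */
    int rc;

    assert(max_depth != NULL);
    pad[0] = (char)depth;
    if (stack_used((const void *)pad) > STACK_BUDGET) {
        return -1;              /* a clean error instead of a crash */
    }
    *max_depth = depth;
    rc = descend(depth + 1, max_depth);
    pad[1] = (char)rc;          /* use the frame after the call: no tail call */
    return rc;
}
```

A caller would take the address of a local in its outermost function,
pass it to stack_guard_init, and then treat a -1 from descend as an
ordinary error. Note that a check like this only helps if the budget
leaves enough room for the largest frame that can follow it.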

The bad news is that this change will, in the log that I'm
seeing, apparently just convert a crash to a Tcl error; the
stack will still have overflowed.  Unlike the case with most
stack overflows, I'm not seeing tremendously deep recursive
invocations of Tcl code.  Rather, I'm observing that there's
an unusually large amount of stack in use prior to a call to
Tcl_DoOneEvent in or near a procedure named Q_Init (which is
not part of Tcl, so I can't comment on it).  I can think of
several possibilities here:

(1) It's possible that Q_Init (or something called from it)
    is leaking memory that is allocated with the 'alloca'
    library call; 'alloca' allocates memory by expanding the
    activation record.  Eventually, there isn't enough stack
    space left to run the event handler, and the process
    crashes.

(2) Another possibility is that Q_Init calls a deeply
    recursive nest of functions, each of which is compiled
    with frame pointers omitted.  Since the DLL in question
    has no symbol information, DrWtsn32 can't trace calls
    through it.

(3) Yet another possibility is that an event handler in C
    (again, compiled with frame pointers omitted) is
    invoking Tcl_DoOneEvent (or invoking Tcl code that calls
    [update] or [vwait]) and Tcl_DoOneEvent finds another
    event pending.  The second event in turn also does
    Tcl_DoOneEvent in its event handler, and so on.
    Eventually, there are enough unfinished event handlers
    stacked that the process crashes.  If this is the case,
    the most likely cause is that something does [after
    idle] or Tcl_DoWhenIdle from an idle handler - the
    documentation remarks that doing so is not safe.
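
Possibility (3) can be sketched with a toy event loop. This is a
simulation only: EventLoop, queue_event, do_one_event (standing in for
Tcl_DoOneEvent), and the two handlers are hypothetical names invented
here, and nothing below calls the real Tcl API. It counts how many
handlers are unfinished at once when each handler re-enters the loop
before returning.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 64

typedef struct EventLoop EventLoop;
typedef void (*Handler)(EventLoop *loop);

struct EventLoop {
    Handler queue[MAX_EVENTS];  /* pending handlers, serviced in order */
    size_t  head, tail;
    int     depth;              /* handlers currently unfinished */
    int     max_depth;          /* deepest nesting seen so far */
};

/* Append a handler; returns 0 if the toy queue is full. */
static int queue_event(EventLoop *loop, Handler h)
{
    if (loop->tail == MAX_EVENTS) {
        return 0;
    }
    loop->queue[loop->tail++] = h;
    return 1;
}

/* Service at most one pending event; returns 1 if one was handled. */
static int do_one_event(EventLoop *loop)
{
    Handler h;

    assert(loop != NULL);
    if (loop->head == loop->tail) {
        return 0;               /* nothing pending */
    }
    h = loop->queue[loop->head++];
    loop->depth++;
    if (loop->depth > loop->max_depth) {
        loop->max_depth = loop->depth;
    }
    h(loop);                    /* handler runs on top of our frame */
    loop->depth--;
    return 1;
}

/* Re-enters the loop before returning, as a handler that calls
 * [update] or [vwait] would. */
static void reentrant_handler(EventLoop *loop)
{
    do_one_event(loop);
}

/* Does its work and returns without touching the loop. */
static void plain_handler(EventLoop *loop)
{
    (void)loop;
}

/* Drain the queue from the top level; returns the deepest nesting. */
static int run_until_idle(EventLoop *loop)
{
    while (do_one_event(loop)) {
        /* keep going until nothing is pending */
    }
    return loop->max_depth;
}
```

With ten re-entrant handlers queued, the nesting depth reaches ten;
with ten plain handlers it never exceeds one. In a real process each
level of nesting holds a full chain of C and Tcl frames, so a steady
supply of pending events can exhaust the stack over a long run.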

Since the stack exhaustion appears to be the result of
Tcl_DoOneEvent being entered with inadequate stack space
remaining, rather than any inherent fault in the Tcl library
itself, I'm closing this bug.  If you need further help
tracking things down, I'd suggest visiting
    http://mini.net/cgi-bin/chat.cgi
and talking to the Tcl developers there.

nobody added on 2004-09-28 17:21:46:

File Added - 103016: DrWtsnLog_sim19.txt

Attachments: