Tcl Source Code

View Ticket
Ticket UUID: 486399
Title: Panic on exit before all threads returned
Type: Bug
Version: obsolete: 8.4a3
Submitter: chrishall
Created on: 2001-11-28 09:04:21
Subsystem: 80. Thread Package
Assigned To: vasiljevic
Priority: 9 Immediate
Severity:
Status: Closed
Last Modified: 2009-06-18 15:22:33
Resolution: Duplicate
Closed By: ferrieux
Closed on: 2009-06-15 21:52:47
Description:
Solaris 2.7, 2 CPU machine

[belinda 14]--> gcc -v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.7/2.95.3/specs
gcc version 2.95.3 20010315 (release)

Tcl 8.4a3, built thus:

./configure \
    --enable-threads \
    --enable-shared \
    --enable-gcc \
    --prefix=/opt/tcl8.4a3

Thread package built thus:

./configure \
    --enable-gcc \
    --with-tcl=/opt/tcl8.4a3/lib \
    --enable-shared \
    --enable-threads \
    --prefix=/opt/tcl8.4a3

The attached tar file contains three small scripts. If
you run:

tclsh8.4 boss.tcl

you should be able to get it to do this:
[belinda 66]--> tclsh8.4 boss.tcl
task_done 6
task_done 7
task_done 8
task_done 9
task_done 10
task_done 6
task_done 7
task_done 8
task_done 9
task_done 6
10 exiting...
BOSS: worker_exit 10
8 exiting...
7 exiting...
BOSS: worker_exit 8
BOSS: worker_exit 7
9 exiting...
BOSS: worker_exit 9
6 exiting...
BOSS: worker_exit 6
BOSS: done...
Tcl_Release couldn't find reference for 0x43938
Abort (core dumped)

My stack trace is:

(gdb) where
#0  0xff059968 in __sigprocmask () from /usr/lib/libthread.so.1
#1  0xff04f1f0 in _resetsig () from /usr/lib/libthread.so.1
#2  0xff04e93c in _sigon () from /usr/lib/libthread.so.1
#3  0xff0517b4 in _thrp_kill () from /usr/lib/libthread.so.1
#4  0xff0b9450 in abort () from /usr/lib/libc.so.1
#5  0xff2ee028 in Tcl_PanicVA () from /opt/tcl8.4a3/lib/libtcl8.4.so
#6  0xff2ee054 in Tcl_Panic () from /opt/tcl8.4a3/lib/libtcl8.4.so
#7  0xff2f57f0 in Tcl_Release () from /opt/tcl8.4a3/lib/libtcl8.4.so
#8  0xfe7c347c in ThreadEventProc () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
#9  0xff2ec1e0 in Tcl_ServiceEvent () from /opt/tcl8.4a3/lib/libtcl8.4.so
#10 0xff2ec538 in Tcl_DoOneEvent () from /opt/tcl8.4a3/lib/libtcl8.4.so
#11 0xfe7c3148 in ThreadWait () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
#12 0xfe7c1b28 in ThreadWaitObjCmd () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
#13 0xff2eebec in EvalObjv () from /opt/tcl8.4a3/lib/libtcl8.4.so
#14 0xff2ef230 in Tcl_EvalEx () from /opt/tcl8.4a3/lib/libtcl8.4.so
#15 0xff2e33dc in Tcl_FSEvalFile () from /opt/tcl8.4a3/lib/libtcl8.4.so
#16 0xff2b51ac in Tcl_SourceObjCmd () from /opt/tcl8.4a3/lib/libtcl8.4.so
#17 0xff2eebec in EvalObjv () from /opt/tcl8.4a3/lib/libtcl8.4.so
#18 0xff2ef230 in Tcl_EvalEx () from /opt/tcl8.4a3/lib/libtcl8.4.so
#19 0xff2ef48c in Tcl_Eval () from /opt/tcl8.4a3/lib/libtcl8.4.so
#20 0xfe7c2238 in NewThread () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
User Comments: dkf added on 2009-06-18 15:22:33:

allow_comments - 1

ferrieux added on 2009-06-16 04:52:47:
This is handled in 2001201.
Marking this one as dup.

ferrieux added on 2009-05-07 16:29:15:
Well, the idea, as explained below, is to let [exit] share only 10% of the full-finalization path. Hence the remaining 90% needs no work. The question is rather what to let into that 10%. I believe the early exit handlers should be let in, because they allow e.g. a DB extension to close transactions etc. For the IO subsystem, I'm starting to think it should be left out, because if a thread/interp owns a channel and is blocked on something else, we cannot safely close it from a central point. And anyway exit() closes fd's.

So, do you see a problem if we restrict Tcl_Exit's operation to just the early exit handler loop (beginning of Finalize) ?

vasiljevic added on 2009-05-07 05:01:54:
Hi!

I am afraid you will not be able to "rectify" this w/o rewriting the
complete finalization logic, introducing ref-counts and similar
tricks. Furthermore, no thread in the application is a "main" thread.
Every thread is equal. As long as ANY thread is using Tcl data structures,
they cannot/should not be freed, regardless of whether this is the "first"
thread started in the process or any other.

I tried to do that (not only myself, but also some AOLserver developers)
just to find out that it is a huge and complex task, perhaps too complex
to justify the benefits. Anyway, by adhering to one simple rule (never
exit while there are other threads in the process), one can live with
this pretty fine. If you want to make it "by the book"
you are welcome to try, but be prepared to sweat hard.

Cheers
Zoran

ferrieux added on 2009-05-07 03:37:30:
Note that GPS's patch in 2001201 applies exactly this strategy. But I would still like to get answers to the questions raised below.

ferrieux added on 2009-05-07 03:27:28:
Here's a copy of a message I sent to tclcore regarding these issues:

From: Alexandre Ferrieux <[email protected]>
To: TclCore <[email protected]>
Date: Wed, May 6, 2009 at 12:42 AM
Subject: Finalization vs. Exit
Mailed-by: gmail.com

Hi,

As exhibited e.g. in
https://sourceforge.net/tracker/?func=detail&aid=486399&group_id=10894&atid=110894
, the Tcl finalization sequence is a complex thing.

In this specific bug, the problem involves avoiding bad interaction of
second-class threads with the main (first-class) one during the very
last steps of teardown, when the ground starts to dissolve under
everybody's feet.
In other instances, it can be argued that we're doing way too much
"administrative" cleanup (like freeing memory) just before exiting.

Now a simple approach seems to be viable: make a clear distinction
between the [exit] path and full finalization (eg in embedded
scenarios, since Donal dislikes scenarii ;-).

The idea, then, is to make clearer in the code the dichotomy between
the exit handlers that are "administrative" (meaning they have no
effect outside the dying process) and can thus be skipped in the
[exit] case, and those that are really compulsory in all cases because
they have side-effects on the outside world, these side-effects being
part of a documented or implicit contract that we simply cannot break.
Then, for the [exit] path, just do the compulsory ones, and call the
OS's exit() function.

Question: could you help me draw this dichotomy ?

Here is what I have spotted so far:

  - process-wide exit handlers registered with Tcl_CreateExitHandler,
aka "early exit handlers" --> compulsory

  - per-thread exit handlers (freeing memory) --> skippable for [exit]

Still in the gray zone, is FinalizeIOSubsystem. I know of cases where
not calling it might have long-ranging effects (like RST sent on all
non-closed sockets), but since it deals with per-thread/interp
structures (like channel lists), it should either be entirely skipped
or done for all threads... which is problematic at exit time if some
threads are blocked or in an uncontrolled state.

Thanks in advance for any insight on this,

-Alex
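
The dichotomy proposed in the message above can be sketched in C. This is only a minimal illustration under stated assumptions, not Tcl's actual exit-handler code: `register_exit_handler`, `run_exit_handlers`, `note_ran`, and the `compulsory` flag are hypothetical names invented for the sketch, and the fixed-size table stands in for whatever list a real implementation would keep.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 16

/* Hypothetical handler signature, modeled loosely on a (void *) callback. */
typedef void (*exit_proc)(void *client_data);

struct exit_handler {
    exit_proc proc;
    void *client_data;
    int compulsory;   /* nonzero: has side effects outside the dying process */
};

static struct exit_handler handlers[MAX_HANDLERS];
static size_t handler_count = 0;

/* Register a handler. Returns 0 on success, -1 if the table is full. */
int register_exit_handler(exit_proc proc, void *client_data, int compulsory)
{
    assert(proc != NULL);
    if (handler_count >= MAX_HANDLERS) {
        return -1;
    }
    handlers[handler_count].proc = proc;
    handlers[handler_count].client_data = client_data;
    handlers[handler_count].compulsory = compulsory;
    handler_count++;
    return 0;
}

/*
 * Drain the table, newest handler first. On the [exit] path
 * (exit_path != 0) only compulsory handlers run and the purely
 * "administrative" ones are dropped; on the full-finalization path
 * every handler runs. Returns the number of handlers invoked.
 */
size_t run_exit_handlers(int exit_path)
{
    size_t ran = 0;
    while (handler_count > 0) {
        struct exit_handler h = handlers[--handler_count];
        if (exit_path && !h.compulsory) {
            continue;   /* skippable: no effect outside the process */
        }
        h.proc(h.client_data);
        ran++;
    }
    return ran;
}

/* Sample handler for experiments: counts invocations in an int. */
void note_ran(void *client_data)
{
    ++*(int *)client_data;
}
```

In this sketch a caller would register, say, a transaction-closing handler with `compulsory` set to 1 and a memory-freeing handler with 0, then call `run_exit_handlers(1)` right before the OS-level exit(). Where FinalizeIOSubsystem belongs is exactly the open question of the message and is not answered here.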

ferrieux added on 2009-05-05 00:33:41:
Upping the prio because it aborts on very simple scripts even on Windows (where threading is not so alien ;-). Will look at this shortly in the light of recent work on exit handlers.
The current abort message on Win32/mingw is:

 exit handlers were created during Tcl_Finalizecalled Tcl_FindHashEntry on deleted table

 This application has requested the Runtime to terminate it in an unusual way.
 Please contact the application's support team for more information.

msofer added on 2006-03-15 17:06:49:
Bug #597997 (closed as a dup of this one) has a small test
script to trigger the panic:

    package require Thread
    set i 0
    while {[incr i] <= 2} {
        thread::create "puts $i"
    }

Avoid the panic by adding the following at the end:

    # make sure to wait until all threads have returned
    while {[llength [thread::names]] > 1} {
        after 100 set x 1
        vwait x
    }

vasiljevic added on 2006-03-10 20:02:05:
Yes. This is still valid.

The problem is that by the concept of the Tcl lib, there is an
implicit distinction between the main thread (the startup
thread) doing the Tcl_Exit and other threads doing Tcl_ThreadExit.

Actually the entire cleanup/teardown is not thread-friendly and
relies on the assumption that the startup thread must exit last, which is
not always true. Rather, the thread that exits AND is the
LAST thread in the process must initiate teardown. This requires
quite a lot of plumbing here and there and it is questionable if
it is worth the effort in the 8.x branch.

I could imagine closing this bug, yet opening another RFE to
make the Tcl finalization more thread-compatible.
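
The "last thread out initiates teardown" rule described in this comment could look roughly like the following C sketch. It is an assumption-laden illustration, not code from the Tcl core: `thread_enter`, `thread_leave`, `global_teardown`, and `teardown_count` are hypothetical names, and the sketch ignores the harder problem of a new thread starting after the count has already dropped to zero.

```c
#include <assert.h>
#include <stdatomic.h>

/* Number of threads currently using the shared (process-wide) state. */
static atomic_int live_threads = 0;

/* How many times teardown ran; kept only so the behavior is observable. */
static atomic_int teardown_runs = 0;

/* Stand-in for freeing process-wide data structures. */
static void global_teardown(void)
{
    atomic_fetch_add(&teardown_runs, 1);
}

/* Each thread calls this once before touching shared state. */
void thread_enter(void)
{
    atomic_fetch_add(&live_threads, 1);
}

/*
 * Each thread calls this once when it exits. Whichever thread brings
 * the count to zero performs the global teardown, no matter whether it
 * was the startup thread. Returns 1 if this caller ran the teardown.
 */
int thread_leave(void)
{
    int before = atomic_fetch_sub(&live_threads, 1);
    assert(before > 0);          /* leave without a matching enter */
    if (before == 1) {
        global_teardown();
        return 1;
    }
    return 0;
}

/* Observation helper for experiments. */
int teardown_count(void)
{
    return atomic_load(&teardown_runs);
}
```

The atomic decrement guarantees that exactly one caller sees the transition to zero, which is the property the comment asks for. The "plumbing here and there" it mentions would be the work of routing every per-thread and process-wide finalizer through such a counter.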

dgp added on 2006-03-10 12:31:15:
Is this still valid?

Much revision of finalization
has been done since this
was reported.

andreas_kupries added on 2002-08-22 04:24:09:
See also
* [ 597997 ] async+thread panic
* [ 597575 ] [exit] in sub-thread may crash.

andreas_kupries added on 2002-01-12 04:33:08:
I am not satisfied with the proposed solution of using
TclInExit (as public API) to avoid the problem. Threads are
usually unwound (thread::unwind) from nested event loops and
can process arbitrary commands before actually exiting.
This means that this code may use any global subsystem of
the tcl-library, including the Preserve-subsystem, and not
only by calling Tcl_Release, but Tcl_Preserve as well. As
all these global subsystems are effectively already
shut down, anything may happen, most likely crashes in other
places. And TclInExit will tide us over these only so far.

The main problem I see exposed through this bug is IMHO 
that the main thread is special. Shutting down a thread 
created by the Thread package finalizes this thread and all 
data pertaining to it. But shutting down the main thread 
additionally finalizes all global information shared across 
all the existing threads, thus rendering all other threads 
unable to run.

So, what possible solutions do we have to this ?

1) Have the thread package register a global exit handler. 
As these are called before everything else this will allow 
the thread package to simply kill all threads still running 
(This would also be the approved and public method of
learning that we are exiting the
process). The downside would be that the killed threads may
not have completed their actions. In the specific case of
this bug it would most likely be the logger thread that is
killed and unable to print the log messages of the
last-closing worker thread. But this is something which can be
worked around.

2) Find a way to block the finalization of the main thread 
and global information until after all the other threads 
are finalized. The downside is that an improperly coded
application may simply hang on exit because one or more
threads do not shut down, or are not shut down.

Of the solutions above I prefer the first one as its 
downside is less troubling than the downside of the second 
to me. If someone else sees other solutions I would be 
happy to hear about them.

andreas_kupries added on 2002-01-12 04:10:11:
The main thread breaks out of its event loop only after all 
worker threads have signaled that they are done. As they 
still have to talk to the logger thread this means that the 
main thread can exit while one worker thread is in the 
later stages of cleanup. Moreover, the logger thread is
told to terminate just before the main thread truly exits.
I guess that it is the logger thread which tries to release
its interp after PreserveExitProc was called.

andreas_kupries added on 2002-01-12 03:35:10:
Zoran,

There is an internal function "TclInExit" (tclEvent.c, line 
939). I don't know if we can use this one. Looking at its 
code I see that it looks for thread-specific data first 
before returning global information.

vasiljevic added on 2001-12-07 14:12:41:
Andreas,

I've invested a couple of minutes in this issue.
It is pretty simple. The main application (tclsh) thread
exits before one of the worker threads gets the
chance to exit. The main app thread calls PreserveExitProc,
which finalizes the preserved array, and the exiting thread
bombs since its preserved data chunk is not found.

This is a Tcl core problem, not a thread extension
problem.

The whole scenario can be avoided by simply putting an
"after 1000" at the end of the application to allow worker
threads to properly clean up and terminate.

The question is: how can such a scenario be avoided?
Is there something like Tcl_InExit which could be
checked by *any* thread to just abort any processing
if the application is about to (or in the middle of) exit?

Zoran
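
The failure sequence described in this comment can be modeled with a small C sketch. It is a deliberately simplified stand-in, not the real Tcl_Preserve/Tcl_Release code: `preserve_ref`, `release_ref`, `finalize_refs`, and the fixed-size table are hypothetical, and where the real Tcl_Release panics (the "couldn't find reference" message in the report) the sketch merely returns -1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 32

/* One preserved pointer and its reference count. */
struct ref {
    const void *ptr;
    int count;
};

static struct ref refs[MAX_REFS];   /* stand-in for the preserved array */
static size_t ref_used = 0;

/* Record one more reference to ptr. Returns 0, or -1 if the table is full. */
int preserve_ref(const void *ptr)
{
    assert(ptr != NULL);
    for (size_t i = 0; i < ref_used; i++) {
        if (refs[i].ptr == ptr) {
            refs[i].count++;
            return 0;
        }
    }
    if (ref_used >= MAX_REFS) {
        return -1;
    }
    refs[ref_used].ptr = ptr;
    refs[ref_used].count = 1;
    ref_used++;
    return 0;
}

/*
 * Drop one reference to ptr. Returns 0 on success and -1 if ptr is not
 * in the table, which is the situation in which the reported panic fires.
 */
int release_ref(const void *ptr)
{
    for (size_t i = 0; i < ref_used; i++) {
        if (refs[i].ptr != ptr) {
            continue;
        }
        if (--refs[i].count == 0) {
            refs[i] = refs[--ref_used];   /* compact: move last entry down */
        }
        return 0;
    }
    return -1;
}

/* Models the exit-time finalizer: the whole table is discarded. */
void finalize_refs(void)
{
    ref_used = 0;
}
```

If one thread calls `preserve_ref(p)`, the exiting main thread then calls `finalize_refs()`, and the first thread finally calls `release_ref(p)`, the lookup fails. That ordering is the race the comment describes, and a fixed delay such as "after 1000" only makes it less likely rather than impossible.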

vasiljevic added on 2001-12-07 12:16:22:
Andreas,

thanks for offering help. At the moment I'm *really* busy
so if you can jump in on this, it would be great.
BTW, I expect to close most of the open stuff, related to
docs, etc. towards the end of the year. I can also attend to this
problem in the same go, but if you can do it earlier,
it would be better.

andreas_kupries added on 2001-12-07 04:08:47:
David is more concerned with expect I believe.
I am giving this to Zoran now. Zoran, if you
don't have time assign it back to me.

dgp added on 2001-11-29 14:16:15:
Looks like a bad call to Tcl_Release in the thread 
package.  Is davygrvy still the maintainer of that
package, or are his days full with the new Expect
port these days?

chrishall added on 2001-11-28 16:04:25:

File Added - 13886: files.tar

Attachments: