Tcl Source Code

Ticket UUID: 718045
Title: Closing transferred channel crashes app
Type: Bug Version: obsolete: 8.4.2
Submitter: kenj Created on: 2003-04-09 07:34:13
Subsystem: 49. Threading Assigned To: das
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2003-07-19 03:20:40
Resolution: Fixed Closed By: hobbs
    Closed on: 2003-07-18 20:20:40
Description:
I'm attaching a script that crashes each time I run it on 
my system. It's an echo server translated from a single-
threaded to a multi-threaded implementation.

As written, clients can connect to the server and 
successfully receive echoes of messages they send. 
(I've tested it using simple telnet connections.) However, 
when any client disconnects -- either by closing the 
client or by issuing the "quit" message -- the server 
crashes with the following message:

FlushChannel: damaged channel list

abnormal program termination

There was another problem I noticed with this script, 
that perhaps should be filed as a separate bug. You'll 
note a few lines near the bottom commented out, which 
tried to send a "welcome" message after the client 
connected. The code would send the message 
successfully, but then the readable fileevent seemed to 
never fire within the worker thread.

Unfortunately, as I'm running this on Windows 2000, I'm 
not sure what additional debugging information I can 
provide. As for the environment, the operating system 
is Microsoft Windows 2000, 5.00.2195, Service Pack 3. 
I'm running the script from a Command Prompt window 
by executing "tclsh echoserv3.tcl".

The tclsh is version 8.4.2, compiled on my system 
directly from the source, with threading enabled. The 
Threads version is 2.5.1, also compiled on my machine.

I compiled both using the Mingw tools. The beginning of 
the README.txt file says: "This is msys_mingw release 
#2, it bundles Msys 1.0.8 and Mingw 1.1." gcc --version 
returns 2.95.3-6.
User Comments: hobbs added on 2003-07-19 03:20:40:

moved to pending since we haven't heard - assuming 
functional on Mac.

andreas_kupries added on 2003-04-23 06:24:13:

Reassigning to Daniel for test of Mac changes.

andreas_kupries added on 2003-04-23 02:48:09:

See also [ 593810 ] Channel Transfer crashes.

andreas_kupries added on 2003-04-23 02:45:54:

Looking around in the socket code ...

(1) The Windows socket system maintains its own list of 
sockets per thread ... This list is not maintained by the 
cut/splice operations. BAD !! 

(2) ... David is also correct that the cut/splice operations 
never send SOCKET_SELECT messages, which change the 
WSAAsyncSelect status of the socket, and the association 
with a particular notifier HWND.

And this explains the problem. The various levels have 
inconsistent information about where the channel is. The upper 
layers have it transferred, but in the lower layers the channel is 
still seen in the main thread; in particular, notification events 
arrive at the main thread's HWND and not in the other thread. The 
event is then handled in the main thread, and from there on we 
know how that crashes.

Note: Mo DeJong's latest changes in this area are an 
approximation of this, introducing platform-specific 
cut/splice ops. This is good, but does not reach far 
enough.

We have essentially per channel-type/platform specific ops to 
manage.

I have to extend my proto TIP about channel type initialization 
during channel transfer and add channel type specific 
cut/splice ops to handle this.
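
The idea of channel-type-specific cut/splice ops could look roughly like the C sketch below. Every name in it (ChannelTypeOps, cutProc, spliceProc, notifierThread, transfer_channel) is hypothetical and is not part of Tcl's 8.4 channel driver interface. Thread ids are plain ints, and cut and splice are collapsed into one function, whereas in a real transfer the cut would run in the source thread and the splice in the target thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Channel Channel;

/* Hypothetical per-channel-type hooks; NOT Tcl's real Tcl_ChannelType. */
typedef struct ChannelTypeOps {
    const char *typeName;
    void (*cutProc)(Channel *ch);                   /* detach driver state from the old thread */
    void (*spliceProc)(Channel *ch, int newThread); /* attach driver state to the new thread */
} ChannelTypeOps;

struct Channel {
    const ChannelTypeOps *ops;
    int managingThread;  /* the generic layer's view of the owner */
    int notifierThread;  /* the driver's view: thread receiving events, 0 = none */
};

/* Socket-like driver: stop event delivery on cut, redirect it on splice. */
static void socket_cut(Channel *ch) { ch->notifierThread = 0; }
static void socket_splice(Channel *ch, int newThread) { ch->notifierThread = newThread; }

static const ChannelTypeOps socketOps = { "tcp", socket_cut, socket_splice };

/* Move ch to targetThread, giving the driver a chance to follow along. */
static void transfer_channel(Channel *ch, int targetThread) {
    assert(ch != NULL && ch->ops != NULL);
    if (ch->ops->cutProc != NULL) {
        ch->ops->cutProc(ch);
    }
    ch->managingThread = targetThread;
    if (ch->ops->spliceProc != NULL) {
        ch->ops->spliceProc(ch, targetThread);
    }
}
```

With hooks like these, the generic layer and the driver agree on the owning thread after a transfer, which is the consistency that is missing in the analysis above.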

IMHO this can no longer be done in the 8.4.3 timeframe. It 
should go into 8.5 first, and possibly be backported later.

andreas_kupries added on 2003-04-23 02:37:30:

Comments by David in mail to me:

My first guess would be SocketProc() calling Tcl_ThreadAlert() in
win/tclWinSock.c, probably because the socket wasn't removed from the
handler window with WSAAsyncSelect().

During a transfer, does the generic layer remove all watch masks and
restore them on the other side?

andreas_kupries added on 2003-04-22 08:02:59:

File Added - 48369: stacktrace-718045.txt

andreas_kupries added on 2003-04-22 08:02:58:

Attaching my stacktrace.

andreas_kupries added on 2003-04-22 08:00:40:

Passing on to DavyGrvy. I don't understand the Windows 
notifier well enough. I asked Jeff, but he doesn't either.

andreas_kupries added on 2003-04-22 07:56:12:

Further looking into the stacktrace it seems as if the fileevent 
handler script is called from the global eventloop 
(vwait::forever), and _not_ from the thread-local thread::wait.
So while the call happens we are in the main thread, and that 
causes the confusion in the lower levels.

I am quite sure that if I add asserts to the channel operations 
that check the current thread against the managing thread, I will 
get a panic much earlier.
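
A minimal sketch of such a check, in C. The ChannelState struct here is a hypothetical stand-in that carries only a managingThread field, thread ids are plain ints, and owned_by_caller is an invented helper name. In the Tcl core the comparison would presumably be made against the result of Tcl_GetCurrentThread().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the channel state; only the owner id is modeled. */
typedef struct ChannelState {
    int managingThread;
} ChannelState;

/*
 * Return 1 if the calling thread owns the channel. Otherwise report the
 * mismatch and return 0; a debug build could panic at this point instead.
 */
static int owned_by_caller(const ChannelState *st, int currentThread) {
    assert(st != NULL);
    if (st->managingThread != currentThread) {
        fprintf(stderr, "channel owned by thread 0x%x used from thread 0x%x\n",
                (unsigned)st->managingThread, (unsigned)currentThread);
        return 0;
    }
    return 1;
}
```

Calling a check like this at the top of each channel operation would flag the wrong-thread call at its source rather than later, when the channel list lookup fails.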

This goes deeper into the notifier on windows. Arrgh.

andreas_kupries added on 2003-04-22 07:40:01:

The system is confused about the threads ... The channel we 
are closing is managed by thread 0x660 (see 
statePtr->managingThread, when in Tcl_CutChannel). However, the 
function itself gets the tsdPtr for thread 0x3bc (determined by 
looking at the managingThread item in the channel list so 
found).

So, we get the channel list for a thread different from the 
thread actually managing the channel. Therefore the channel 
is not found in the list, and tcl panics. That is what we see.
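
The failure mode described here can be reproduced in a small model, sketched in C below. The structures are hypothetical simplifications, not Tcl's real ChannelState or thread-specific data layout, and thread ids are plain ints. The cut operation searches only the list of the thread it is handed, so a call made on behalf of the wrong thread cannot find the channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Tcl's internal structures. */
typedef struct Channel {
    int managingThread;     /* id of the thread that owns the channel */
    struct Channel *next;   /* next channel in the owner's list */
} Channel;

typedef struct ThreadData {
    int threadId;
    Channel *firstChannel;  /* head of this thread's channel list */
} ThreadData;

/* Link ch at the head of tsd's list and record tsd as its owner. */
static void splice_channel(ThreadData *tsd, Channel *ch) {
    assert(tsd != NULL && ch != NULL);
    ch->managingThread = tsd->threadId;
    ch->next = tsd->firstChannel;
    tsd->firstChannel = ch;
}

/*
 * Unlink ch from tsd's list. Returns 1 on success and 0 if ch is not in
 * that list, which is the situation where the real code panics.
 */
static int cut_channel(ThreadData *tsd, Channel *ch) {
    Channel **link;
    assert(tsd != NULL && ch != NULL);
    for (link = &tsd->firstChannel; *link != NULL; link = &(*link)->next) {
        if (*link == ch) {
            *link = ch->next;
            ch->next = NULL;
            return 1;
        }
    }
    return 0;
}
```

Splicing a channel into a worker's list and then cutting it with the main thread's data returns 0 in this model, while cutting it with the worker's data succeeds.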

______
Regarding help you (Ken) can provide ... Just compile Tcl and 
Thread with --enable-symbols and the crash allows you to 
directly jump into the MSVC++ debugger and inspect 
variables, stack, etc.

andreas_kupries added on 2003-04-22 07:27:35:

Confirmed on Win/2k box here, for revision Jan 24. This 
means that the changes done by Mo have nothing to do with 
the problem. This is something in the original code.

andreas_kupries added on 2003-04-22 07:19:07:

Does not happen on Linux; tested for Tcl head (8.5a0) and 
revisions at Jan 27 and Jan 24.

kenj added on 2003-04-16 06:04:42:

I'll check out these cases in a couple of days. Right now, I've 
got a tight deadline on a project preventing me from spending 
much time on this.

andreas_kupries added on 2003-04-15 02:12:09:

I notice in the ChangeLog a change by Mo DeJong on Jan 25, 
2003 in that area. Here is the entry:

+2003-01-25  Mo DeJong  <[email protected]>
+
+* generic/tclIO.c (Tcl_CutChannel, Tcl_SpliceChannel):
+Invoke TclpCutFileChannel and TclpSpliceFileChannel.
+* generic/tclInt.h: Declare TclpCutFileChannel
+and TclpSpliceFileChannel.
+* unix/tclUnixChan.c (FileCloseProc, TclpOpenFileChannel,
+Tcl_MakeFileChannel, TclpCutFileChannel,
+TclpSpliceFileChannel): Implement thread load data
+cut and splice for file channels. This avoids
+an invalid memory ref when compiled with -DDEPRECATED.
+* win/tclWinChan.c (FileCloseProc, TclpCutFileChannel,
+TclpSpliceFileChannel): Implement thread load data
+cut and splice for file channels. This avoids
+an invalid memory ref that was showing up in the
+thread extension.

Can you test whether the problem you encounter also happens with 
a Tcl from before that change?

andreas_kupries added on 2003-04-09 23:40:56:

Does this happen on a Unix system too?
Does it happen when compiled with MS VC++ too?
(That is the environment I have.)

kenj added on 2003-04-09 14:34:14:

File Added - 47207: echoserv3.tcl
