Ticket UUID: | 718045 | |||
Title: | Closing transferred channel crashes app | |||
Type: | Bug | Version: | obsolete: 8.4.2 | |
Submitter: | kenj | Created on: | 2003-04-09 07:34:13 | |
Subsystem: | 49. Threading | Assigned To: | das | |
Priority: | 9 Immediate | Severity: | ||
Status: | Closed | Last Modified: | 2003-07-19 03:20:40 | |
Resolution: | Fixed | Closed By: | hobbs | |
Closed on: | 2003-07-18 20:20:40 | |||
Description: |
I'm attaching a script that crashes each time I run it on my system. It's an echo server translated from a single-threaded to a multi-threaded implementation. As written, clients can connect to the server and successfully receive echoes of messages they send. (I've tested it using simple telnet connections.) However, when any client disconnects -- either by closing the client or by issuing the "quit" message -- the server crashes with the following message:

FlushChannel: damaged channel list
abnormal program termination

There was another problem I noticed with this script that perhaps should be filed as a separate bug. You'll note a few lines near the bottom commented out, which tried to send a "welcome" message after the client connected. The code would send the message successfully, but then the readable fileevent seemed never to fire within the worker thread.

Unfortunately, as I'm running this on Windows 2000, I'm not sure what additional debugging information I can provide.

As for the environment, the operating system is Microsoft Windows 2000, 5.00.2195, Service Pack 3. I'm running the script from a Command Prompt window by executing "tclsh echoserv3.tcl". The tclsh is version 8.4.2, compiled on my system directly from the source, with threading enabled. The Threads version is 2.5.1, also compiled on my machine. I compiled both using the Mingw tools. The beginning of the README.txt file says: "This is msys_mingw release #2, it bundles Msys 1.0.8 and Mingw 1.1." gcc --version returns 2.95.3-6. | |||
User Comments: |
hobbs added on 2003-07-19 03:20:40:
Logged In: YES user_id=72656
moved to pending since we haven't heard - assuming functional on Mac.

andreas_kupries added on 2003-04-23 06:24:13:
Logged In: YES user_id=75003
Reassigning to Daniel for test of Mac changes.

andreas_kupries added on 2003-04-23 02:48:09:
Logged In: YES user_id=75003
See also [ 593810 ] Channel Transfer crashes.

andreas_kupries added on 2003-04-23 02:45:54:
Logged In: YES user_id=75003
Looking around in the socket code ...
(1) The Windows socket system maintains its own list of sockets per thread ... This list is not maintained by the cut/splice operations. BAD !!
(2) ... David is also correct that the cut/splice operations never send SOCKET_SELECT messages, which change the WSAAsyncSelect status of the socket and its association with a particular notifier HWND.
And this explains the problem. The various levels have inconsistent information on where the channel is. The upper layers have it transferred, but in the lower layers the channel is still seen in the main thread; in particular, notification events arrive in the main thread's HWND and not in the other thread. This then causes the event to be handled in the main thread, and from then on we know how that crashes.
Note: Mo DeJong's latest changes in this area are an approximation to get this done, by introducing platform-specific cut/splice ops. This is good, but does not reach far enough. We essentially have per-channel-type, platform-specific ops to manage. I have to extend my proto-TIP about channel type initialization during channel transfer and add channel-type-specific cut/splice ops to handle this. This can't be done in the 8.4.3 timeframe anymore, IMHO. First should be 8.5, eventually a backport.

andreas_kupries added on 2003-04-23 02:37:30:
Logged In: YES user_id=75003
Comments by David in mail to me:
My first guess would be SocketProc() calling Tcl_ThreadAlert() in win/tclWinSock.c, probably because the socket wasn't removed from the handler window with WSAAsyncSelect().
During a transfer, does the generic layer remove all watch masks and restore them on the other side?

andreas_kupries added on 2003-04-22 08:02:59:
File Added - 48369: stacktrace-718045.txt

andreas_kupries added on 2003-04-22 08:02:58:
Logged In: YES user_id=75003
Attaching my stacktrace.

andreas_kupries added on 2003-04-22 08:00:40:
Logged In: YES user_id=75003
Passing on to DavyGrvy. I don't understand the Windows notifier well enough. I asked Jeff, but he doesn't either.

andreas_kupries added on 2003-04-22 07:56:12:
Logged In: YES user_id=75003
Looking further into the stacktrace, it seems as if the fileevent handler script is called from the global event loop (vwait ::forever), and _not_ from the thread-local thread::wait. So while the call happens we are in the main thread, and that causes the confusion in the lower levels. I am quite sure that if I add asserts to the channel operations, checking the current thread against the managing thread, I will get a panic much earlier. This goes deeper into the notifier on Windows. Arrgh.

andreas_kupries added on 2003-04-22 07:40:01:
Logged In: YES user_id=75003
The system is confused about the threads ...
The channel we are closing is managed by thread 0x660 (see statePtr->managingThread, when in Tcl_CutChannel). However, the function itself gets the tsdPtr for thread 0x3bc (determined by looking at the managingThread item of the channels in the list so found). So we get the channel list for a thread different from the thread actually managing the channel. Therefore the channel is not found in the list, and Tcl panics. That is what we see.
______
Regarding help you (Ken) can provide ... Just compile Tcl and Thread with --enable-symbols; the crash then allows you to jump directly into the MSVC++ debugger and inspect variables, stack, etc.

andreas_kupries added on 2003-04-22 07:27:35:
Logged In: YES user_id=75003
Confirmed on a Win/2k box here, for the Jan 24 revision. This means that the changes done by Mo have nothing to do with the problem.
This is something in the original code.

andreas_kupries added on 2003-04-22 07:19:07:
Logged In: YES user_id=75003
Does not happen on Linux; tested for Tcl head (8.5a0) and the revisions at Jan 27 and Jan 24.

kenj added on 2003-04-16 06:04:42:
Logged In: YES user_id=260635
I'll check out these cases in a couple of days. Right now, I've got a tight deadline on a project preventing me from spending much time on this.

kenj added on 2003-04-16 06:03:19:
Logged In: YES user_id=260635
I'll check out these cases in a couple of days. Right now, I've got a tight deadline on a project preventing me from spending much time on this.

andreas_kupries added on 2003-04-15 02:12:09:
Logged In: YES user_id=75003
I notice in the ChangeLog a change by Mo DeJong on Jan 25, 2003 in that area. Here is the entry:
+2003-01-25 Mo DeJong <[email protected]>
+
+* generic/tclIO.c (Tcl_CutChannel, Tcl_SpliceChannel):
+Invoke TclpCutFileChannel and TclpSpliceFileChannel.
+* generic/tclInt.h: Declare TclpCutFileChannel
+and TclpSpliceFileChannel.
+* unix/tclUnixChan.c (FileCloseProc, TclpOpenFileChannel,
+Tcl_MakeFileChannel, TclpCutFileChannel,
+TclpSpliceFileChannel): Implement thread load data
+cut and splice for file channels. This avoids
+an invalid memory ref when compiled with -DDEPRECATED.
+* win/tclWinChan.c (FileCloseProc, TclpCutFileChannel,
+TclpSpliceFileChannel): Implement thread load data
+cut and splice for file channels. This avoids
+an invalid memory ref that was showing up in the
+thread extension.
Can you test whether the problem you encounter also happens with a Tcl from before that change?

andreas_kupries added on 2003-04-09 23:40:56:
Logged In: YES user_id=75003
Does this happen on a Unix system too? Does it happen when compiled with MS VC++ too? (That is the environment I have.)

kenj added on 2003-04-09 14:34:14:
File Added - 47207: echoserv3.tcl |