Tcl Source Code

Ticket UUID: 593810
Title: Channel Transfer crashes
Type: Bug
Version: obsolete: 8.4b1
Submitter: nobody
Created on: 2002-08-11 21:30:16
Subsystem: 49. Threading
Assigned To: das
Priority: 9 Immediate
Severity:
Status: Closed
Last Modified: 2003-07-19 03:23:15
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2003-07-18 20:23:15
Description:
Hello, 

I am writing a concurrent Tcl server using threads. I have a 
thread-enabled Tcl core and Thread extension 2.4. I am 
doing the following:

1. Open a socket and create a thread.

2. Transfer the socket to the created thread.

3. Send the script "puts sockid abc; flush sockid" to the 
created thread. It crashes. 

I do not know why. Could you please look into the 
problem and write back to me?

Here is my code

% socket localhost 35000

sock460

% pwd

% load thread.dll

% ::thread::create

1304

% ::thread::transfer 1304 sock460

% ::thread::send  1304  "puts sock460 line; flush sock460"

Thanks

yasar
User Comments:

andreas_kupries added on 2003-07-19 03:21:46:
Logged In: YES 
user_id=75003

Daniel, did you test the changes?

hobbs added on 2003-07-19 03:21:10:
Logged In: YES 
user_id=72656

moved to pending since we haven't heard - assuming 
functional on Mac.

andreas_kupries added on 2003-04-23 06:24:32:
Logged In: YES 
user_id=75003

Reassigning to Daniel for test of Mac changes.

andreas_kupries added on 2003-04-23 02:48:43:
Logged In: YES 
user_id=75003

See also [ 718045 ] Closing transferred channel crashes app.

davygrvy added on 2002-11-11 03:42:21:

File Added - 35071: patch.txt

Logged In: YES 
user_id=7549

Andreas:
>The true fix however is to extend the channel driver with an
>init-function which can be used by channels during
>registration in an interp to ensure that their driver is initialized
>in the thread of said interp.

Yes, exactly.  Does this issue exist in the other channel 
types, too?  Should we generalize this by adding another entry to 
the Tcl_ChannelType struct just for this purpose?

See uploaded patch file for my idea in code.

davygrvy added on 2002-11-11 01:30:32:
Logged In: YES 
user_id=7549

*** generic/tclIO.c	30 Jul 2002 18:36:25 -0000	1.57
--- generic/tclIO.c	10 Nov 2002 10:30:52 -0000
***************
*** 771,776 ****
--- 771,785 ----
      panic("Tcl_RegisterChannel: duplicate channel names");
          }
          Tcl_SetHashValue(hPtr, (ClientData) chanPtr);
+ #ifdef __WIN32__
+ if (! strcmp(chanPtr->typePtr->typeName, "tcp")) {
+     /* 
+      * Just in case, force per-thread initialization to happen
+      * so the socket event handler thread gets created.
+      */
+     TclpHasSockets(NULL);
+ }
+ #endif
      }
      statePtr->refCount++;
  }

That seems to do it, but is rather "bad style".
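
For comparison with the strcmp() check on the type name, here is a rough sketch of the more general idea raised in this ticket: an optional per-thread init callback in the channel type, invoked at registration. Everything in it is an assumption made for illustration. The ChannelType and Channel structs are simplified stand-ins and not the real Tcl_ChannelType layout, and the threadInitProc field, TcpThreadInit and RegisterChannel are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical channel type. NOT the real Tcl_ChannelType. */
typedef struct ChannelType {
    const char *typeName;
    void (*threadInitProc)(void);  /* hypothetical optional hook, may be NULL */
} ChannelType;

typedef struct Channel {
    const ChannelType *typePtr;
    int refCount;
} Channel;

/* Counts how often the (pretend) socket driver was asked to initialize. */
static int tcpInitCalls = 0;

static void TcpThreadInit(void)
{
    /* A real driver would set up its thread-specific data here. */
    tcpInitCalls++;
}

static const ChannelType tcpType  = { "tcp",  TcpThreadInit };
static const ChannelType fileType = { "file", NULL };

/*
 * Registration asks the driver itself to initialize for the current
 * thread, so the generic code needs no knowledge of type names.
 */
void RegisterChannel(Channel *chanPtr)
{
    assert(chanPtr != NULL && chanPtr->typePtr != NULL);
    if (chanPtr->typePtr->threadInitProc != NULL) {
        chanPtr->typePtr->threadInitProc();
    }
    chanPtr->refCount++;
}
```

Registering a channel of tcpType runs the hook once per registration, while a channel of fileType is registered without any driver call. The generic layer stays free of platform #ifdefs, which is the "style" concern with the patch above.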

zoro2 added on 2002-11-09 04:49:23:
Logged In: YES 
user_id=191529

The workaround for the problem Andreas describes is unfortunately
incomplete. Below is a test script that SHOULD work. It
stops when trying to do a gets on the channel, whereas when I
use my *sockPtr=0 hack, everything seems to work OK.

package require Thread
set id [thread::create]
thread::send $id {
    close [socket -server puts 0]
}

proc d {sock args} {
    after idle [list d0 $sock]
}
proc d0 {sock} {
    global id
    thread::send $id [list set sock $sock]
    thread::send $id [list set tid [thread::id]]

    thread::transfer $id $sock

    thread::send -async $id {
        puts $sock "HI"
        flush $sock
        thread::send -async $tid [list puts SENTHI]
        puts $sock [gets $sock]
        thread::send -async $tid [list puts SENTLINE]
        flush $sock
        close $sock
        thread::send -async $tid [list puts DONE]
    }
}

socket -server d 12345

set next [thread::create]
thread::send $next [list set tid [thread::id]]
thread::send -async $next {
    package require Thread
    if {[catch {
        after 2000
        set s [socket 127.0.0.1 12345]
        puts $s TEST; flush $s
    } err]} {
        thread::send -async $::tid [list puts "ERROR: $::errorInfo"]
    }
}

andreas_kupries added on 2002-08-21 02:46:07:
Logged In: YES 
user_id=75003

Found the problem. When a socket is created in a thread the 
socket driver will be initialized for that thread, especially its 
TSD slot.

Call sequence:
    SocketObjCmd => TclpHasSockets => InitSocket

Now if a thread is created and no socket is created in it, nothing is 
initialized. The channel transfer then inserts a socket into the 
thread, but this does not run any code to completely initialize 
the driver. Hence the TSD slot is uninitialized, and thus the 
crash.

Because of the workaround described above (create and 
destroy a temp socket in the thread before transferring 
sockets) the priority will go down. 

The true fix however is to extend the channel driver with an 
init-function which can be used by channels during 
registration in an interp to ensure that their driver is initialized 
in the thread of said interp.
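
The mechanism can be illustrated outside of Tcl. The following is a minimal sketch using POSIX thread-specific data as an analogue of the Windows TLS slot. It is not the Tcl socket driver, and all names in it (DriverTsd, DriverInitThread, DriverLookup, FreshThreadHasTsd) are made up for the illustration. A freshly created thread that only looks up the slot gets NULL, while a thread that first runs the init step gets a usable pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the socket driver's per-thread state. */
typedef struct DriverTsd {
    int initialized;
} DriverTsd;

static pthread_key_t tsdKey;
static pthread_once_t keyOnce = PTHREAD_ONCE_INIT;

static void MakeKey(void)
{
    /* free() releases a thread's DriverTsd when that thread exits. */
    int rc = pthread_key_create(&tsdKey, free);
    assert(rc == 0);
    (void) rc;
}

/* Analogue of the init path: fill the slot for the calling thread. */
static DriverTsd *DriverInitThread(void)
{
    pthread_once(&keyOnce, MakeKey);
    DriverTsd *tsd = pthread_getspecific(tsdKey);
    if (tsd == NULL) {
        tsd = calloc(1, sizeof *tsd);
        if (tsd != NULL) {
            tsd->initialized = 1;
            pthread_setspecific(tsdKey, tsd);
        }
    }
    return tsd;
}

/* Analogue of the output path: only reads the slot, never fills it. */
static DriverTsd *DriverLookup(void)
{
    pthread_once(&keyOnce, MakeKey);
    return pthread_getspecific(tsdKey);
}

static void *Worker(void *arg)
{
    const int *runInit = arg;
    if (*runInit) {
        DriverInitThread();  /* what an init hook at registration would do */
    }
    return DriverLookup() != NULL ? (void *) 1 : NULL;
}

/* Returns 1 if a new thread sees a usable slot, 0 if not, -1 on error. */
int FreshThreadHasTsd(int runInit)
{
    pthread_t t;
    void *result = NULL;
    if (pthread_create(&t, NULL, Worker, &runInit) != 0) {
        return -1;
    }
    pthread_join(t, &result);
    return result != NULL;
}
```

FreshThreadHasTsd(0) corresponds to the crashing case, where a channel arrives in a thread that never created a socket. FreshThreadHasTsd(1) corresponds to both the workaround and the proposed init function, since either one makes the thread run the init step before the slot is read.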

andreas_kupries added on 2002-08-21 02:13:42:
Logged In: YES 
user_id=75003

New datapoint:
Start thread-enabled tclsh (in MSVC++ debugger), set a 
breakpoint in file tclWinSock, line 1776. This is where the 
core retrieves the tsdPtr for the SendMessage stuff later.

Source non-crashing script, step into the tsd retrieval. The 
ultimate routine is TlsGetValue, presumably provided by 
Windows. I can't step into it. Here things are ok.

Now source the crashing script, do not change interpreters. 
Step into the retrieval again. All arguments etc. are the same 
as before, but now TlsGetValue returns NULL. The reason is 
unknown. It is suspected that somewhere some memory 
went haywire. I couldn't prove this, however. 'memory validate 
on' does not trigger anything before we hit the crash (yes, I 
used tcl/threads compiled with TCL_MEM_DEBUG).

I declare this a windows specific bug for now, because of 
TlsGetValue, and the info by Don Porter that the attached 
crashing script is running ok on his Linux/Alpha.

andreas_kupries added on 2002-08-21 02:06:45:

File Added - 29445: ttx

andreas_kupries added on 2002-08-21 02:06:18:

File Added - 29444: ttt

andreas_kupries added on 2002-08-21 02:06:17:
Logged In: YES 
user_id=75003

Attaching the scripts I used for testing.

andreas_kupries added on 2002-08-21 00:42:22:
Logged In: YES 
user_id=75003

Confirmed for Win'2K. Stack trace:

TcpOutputProc(void * 0x007d5fc0, const char * 0x008db0d0, int 6, int * 0x0150f994) line 1803 + 14 bytes
FlushChannel(Tcl_Interp * 0x00000000, Channel * 0x007d5f70, int 0) line 2066 + 38 bytes
Tcl_Flush(Tcl_Channel_ * 0x007d5f70) line 5104 + 13 bytes
Tcl_FlushObjCmd(void * 0x00000000, Tcl_Interp * 0x007d5490, int 2, Tcl_Obj * const * 0x0150fbec) line 194 + 9 bytes
TclEvalObjvInternal(Tcl_Interp * 0x007d5490, int 2, Tcl_Obj * const * 0x0150fbec, const char * 0x007d87f3, int 14, int 0) line 3033 + 25 bytes
Tcl_EvalEx(Tcl_Interp * 0x007d5490, const char * 0x007d87e0, int 33, int 131072) line 3632 + 42 bytes
ThreadSendEval(Tcl_Interp * 0x007d5490, void * 0x007d9610) line 1250 + 27 bytes
ThreadEventProc(Tcl_Event * 0x007d89a0, int -3) line 2386 + 13 bytes
Tcl_ServiceEvent(int -3) line 618 + 11 bytes
Tcl_DoOneEvent(int -3) line 921 + 9 bytes
ThreadWait() line 2189 + 14 bytes
ThreadWaitObjCmd(void * 0x00000000, Tcl_Interp * 0x007d5490, int 1, Tcl_Obj * const * 0x0150ff08) line 955
TclEvalObjvInternal(Tcl_Interp * 0x007d5490, int 1, Tcl_Obj * const * 0x0150ff08, const char * 0x007d9b90, int 12, int 0) line 3033 + 25 bytes
Tcl_EvalEx(Tcl_Interp * 0x007d5490, const char * 0x007d9b90, int 12, int 0) line 3632 + 42 bytes
Tcl_Eval(Tcl_Interp * 0x007d5490, const char * 0x007d9b90) line 3796 + 17 bytes
NewThread(void * 0x0012f640) line 1472 + 23 bytes
KERNEL32! 77e8758a()

Dereferencing a NULL pointer in TcpOutputProc.
tsdPtr is NULL. tsd = Thread-specific Data. That is something 
which should never be NULL.

andreas_kupries added on 2002-08-20 09:48:52:
Logged In: YES 
user_id=75003

The resource which is no longer found is an interpreter,
possibly the interpreter performing the after script.
I just checked: this also happens without sockets and without
transferring them. In other words, this is a different problem
from the one in this report. Creating a new SF entry: #597575.

I will have to check this on a windows platform.

andreas_kupries added on 2002-08-20 09:33:08:
Logged In: YES 
user_id=75003

Tried to replicate on Linux/x86. Used the smtp port instead
of 35000 to ensure that the socket truly exists. My script
is:
set s [socket localhost smtp] ; # connect to smtp mail
package require Thread

puts [info loaded]
puts [pwd]

set t [::thread::create]

::thread::transfer $t $s
::thread::send $t "puts $s line; flush $s"

I.e. this is not interactive, but executed via

tclsh ./testscript

No problem. Then I added the following lines to the script,
at the end:

after 5000 "::thread::send $t exit"
vwait forever

Now every once in a while the script aborts. The error
message is:

Tcl_Release couldn't find reference for 0x80533f8
Aborted (core dumped)


In other words, a panic somewhere. The stack trace below
points to the handling of the timer event:

#0  0x4014b7b1 in kill () from /lib/libc.so.6
#1  0x400f5e5e in pthread_kill () from /lib/libpthread.so.0
#2  0x400f6339 in raise () from /lib/libpthread.so.0
#3  0x4014cc11 in abort () from /lib/libc.so.6
#4  0x4009f92e in Tcl_PanicVA (format=0x400d2d80 "Tcl_Release couldn't find reference for 0x%x", argList=0xbffff3b8)
    at ../../tcl/unix/../generic/tclPanic.c:106
#5  0x4009f967 in Tcl_Panic (arg1=0x400d2d80 "Tcl_Release couldn't find reference for 0x%x") at ../../tcl/unix/../generic/tclPanic.c:134
#6  0x400a897b in Tcl_Release (clientData=0x80533f8) at ../../tcl/unix/../generic/tclPreserve.c:255
#7  0x400b2ff5 in AfterProc (clientData=0x80a95e0) at ../../tcl/unix/../generic/tclTimer.c:1054
#8  0x400b2473 in TimerHandlerEventProc (evPtr=0x80a9660, flags=-3) at ../../tcl/unix/../generic/tclTimer.c:543
#9  0x4009ce21 in Tcl_ServiceEvent (flags=-3) at ../../tcl/unix/../generic/tclNotify.c:618
#10 0x4009d2a1 in Tcl_DoOneEvent (flags=-3) at ../../tcl/unix/../generic/tclNotify.c:921
#11 0x4006a5ec in Tcl_VwaitObjCmd (clientData=0x0, interp=0x80533f8, objc=2, objv=0xbffff5b8) at ../../tcl/unix/../generic/tclEvent.c:990
#12 0x4003b90c in TclEvalObjvInternal (interp=0x80533f8, objc=2, objv=0xbffff5b8, command=0x8052aea "\nvwait forever\n", length=15, flags=0)
    at ../../tcl/unix/../generic/tclBasic.c:3033
#13 0x4003c5c0 in Tcl_EvalEx (interp=0x80533f8, 
    script=0x80529f8 "\n\nset s [socket localhost smtp] ; # connect to smtp mail\npackage require Thread\n\nputs [info loaded]\nputs [pwd]\n\nset t [::thread::create]\n\n::thread::transfer $t $s\n::thread::send $t \"puts $s line; flus"..., numBytes=257, flags=0)
    at ../../tcl/unix/../generic/tclBasic.c:3631
#14 0x40090044 in Tcl_FSEvalFile (interp=0x80533f8, pathPtr=0x80589e8) at ../../tcl/unix/../generic/tclIOUtil.c:1371
#15 0x40097e83 in Tcl_Main (argc=1, argv=0xbffffaf8, appInitProc=0x80486d8 <Tcl_AppInit>) at ../../tcl/unix/../generic/tclMain.c:292
#16 0x080486cc in main (argc=2, argv=0xbffffaf4) at ../../tcl/unix/../unix/tclAppInit.c:90
#17 0x4013b17f in __libc_start_main () from /lib/libc.so.6

dgp added on 2002-08-20 06:11:51:
Logged In: YES 
user_id=80530


Just tried to reproduce this on Linux/Alpha
using Tcl 8.4b2 and Thread 2.4.

Right away I see there's a difficulty.
What service do you have running
on port 35000?

Attachments: