Tcl Source Code

Ticket UUID: 2818131
Title: Crash in zlib when bumped into TCP stack
Type: Bug Version: obsolete: 8.6b1.1
Submitter: flatworm Created on: 2009-07-07 17:38:42
Subsystem: 57. zlib Assigned To: patthoyts
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2010-02-26 07:40:55
Resolution: Fixed Closed By: patthoyts
    Closed on: 2010-02-26 00:40:55
Description:
Tcl 8.6b1.1 (HEAD as of today) crashes somewhere during TCP channel closure if a zlib transform in inflate mode is pushed onto the channel stack.
The script output, and the error with the stack trace I get on Windows XP, are shown below.

C:\tmp>c:\opt\86d\bin\tclsh86g.exe crash.tcl
Server listens on sock1876
Client is sock1796
Accepted sock1784
Client closed sock1796
Data on sock1784
EOF on sock1784

(c8c.760): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=feeefeee ebx=7ffde000 ecx=feeefeee edx=009d8f68 esi=0026f998 edi=0026fb04
eip=100c77e4 esp=0026f958 ebp=0026f964 iopl=0         nv up ei ng nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010286
*** WARNING: Unable to verify checksum for C:\opt\86d\bin\tcl86g.dll
tcl86g!DeleteScriptRecord+0x51:
100c77e4 8b5108          mov     edx,dword ptr [ecx+8] ds:0023:feeefef6=????????

0:000> kP
ChildEBP RetAddr  
0026f964 100c7766 tcl86g!DeleteScriptRecord(
struct Tcl_Interp * interp = 0x009aa7b8, 
struct Channel * chanPtr = 0x009eb948, 
int mask = 2)+0x51 [C:\src\tcl\win\..\generic\tclIO.c @ 8556]
0026f988 100c72ae tcl86g!TclChannelEventScriptInvoker(
void * clientData = 0x009e7300, 
int mask = 2)+0x7e [C:\src\tcl\win\..\generic\tclIO.c @ 8679]
0026f9c0 1017ab0e tcl86g!Tcl_NotifyChannel(
struct Tcl_Channel_ * channel = 0x009df7f0, 
int mask = 6)+0x14d [C:\src\tcl\win\..\generic\tclIO.c @ 8170]
0026fb04 100f22f8 tcl86g!SocketEventProc(
struct Tcl_Event * evPtr = 0x009d83a8, 
int flags = -3)+0x29e [C:\src\tcl\win\..\win\tclWinSock.c @ 723]
0026fb2c 100f26e6 tcl86g!Tcl_ServiceEvent(
int flags = -3)+0xa1 [C:\src\tcl\win\..\generic\tclNotify.c @ 670]
0026fb50 1008c2c0 tcl86g!Tcl_DoOneEvent(
int flags = -3)+0x1d0 [C:\src\tcl\win\..\generic\tclNotify.c @ 971]
0026fb68 1001b3f5 tcl86g!Tcl_VwaitObjCmd(
void * clientData = 0x00000000, 
struct Tcl_Interp * interp = 0x009aa7b8, 
int objc = 2, 
struct Tcl_Obj ** objv = 0x009ab1e0)+0x9e [C:\src\tcl\win\..\generic\tclEvent.c @ 1383]
0026fb94 1001b1e2 tcl86g!NRRunObjProc(
void ** data = 0x009d078c, 
struct Tcl_Interp * interp = 0x009aa7b8, 
int result = 0)+0x56 [C:\src\tcl\win\..\generic\tclBasic.c @ 4313]
0026fbb8 1001aa63 tcl86g!TclNRRunCallbacks(
struct Tcl_Interp * interp = 0x009aa7b8, 
int result = 0, 
struct TEOV_callback * rootPtr = 0x00000000, 
int tebcCall = 0)+0xc6 [C:\src\tcl\win\..\generic\tclBasic.c @ 4260]
0026fbd8 1001cbb4 tcl86g!Tcl_EvalObjv(
struct Tcl_Interp * interp = 0x009aa7b8, 
int objc = 2, 
struct Tcl_Obj ** objv = 0x009ab1e0, 
int flags = 2097152)+0x53 [C:\src\tcl\win\..\generic\tclBasic.c @ 4031]
0026fc9c 1001c2c8 tcl86g!TclEvalEx(
struct Tcl_Interp * interp = 0x009aa7b8, 
char * script = 0x009dc350 "proc Accept {sock addr port} {..chan configure $sock -translation binary -buffering none..zlib push inflate $sock..chan event $sock readable [list Read $sock]..puts "Accepted $sock".}..proc Read sock {..puts "Data on $sock"..if {[gets $sock line] < 0} {...puts "EOF on $sock"...chan close $sock..} else {...puts "Rcvd: [regsub -all {[^[:print:]]} $line .]"..}.}..proc MakeServerCrash sock {..puts $sock test..chan close $sock..puts "Client closed $sock".}..set serv [socket -server Accept -myaddr localhost 0].set port [lindex [chan configure $serv -sockname] 2].puts "Server listens on $serv"..set client [socket localhost $port].chan configure $client -translation binary -buffering none.zlib push deflate $client.chan event $client readable [list Read $client].puts "Client is $client"..after idle [list MakeServerCrash $client]..vwait forever..", 
int numBytes = 848, 
int flags = 0, 
int line = 36)+0x8e0 [C:\src\tcl\win\..\generic\tclBasic.c @ 5153]
0026fcb8 100d3b69 tcl86g!Tcl_EvalEx(
struct Tcl_Interp * interp = 0x009aa7b8, 
char * script = 0x009dc350 "proc Accept {sock addr port} {..chan configure $sock -translation binary -buffering none..zlib push inflate $sock..chan event $sock readable [list Read $sock]..puts "Accepted $sock".}..proc Read sock {..puts "Data on $sock"..if {[gets $sock line] < 0} {...puts "EOF on $sock"...chan close $sock..} else {...puts "Rcvd: [regsub -all {[^[:print:]]} $line .]"..}.}..proc MakeServerCrash sock {..puts $sock test..chan close $sock..puts "Client closed $sock".}..set serv [socket -server Accept -myaddr localhost 0].set port [lindex [chan configure $serv -sockname] 2].puts "Server listens on $serv"..set client [socket localhost $port].chan configure $client -translation binary -buffering none.zlib push deflate $client.chan event $client readable [list Read $client].puts "Client is $client"..after idle [list MakeServerCrash $client]..vwait forever..", 
int numBytes = 848, 
int flags = 0)+0x1a [C:\src\tcl\win\..\generic\tclBasic.c @ 4854]
0026fd38 100e44cf tcl86g!Tcl_FSEvalFileEx(
struct Tcl_Interp * interp = 0x009aa7b8, 
struct Tcl_Obj * pathPtr = 0x009d0b00, 
char * encodingName = 0x00000000 "")+0x260 [C:\src\tcl\win\..\generic\tclIOUtil.c @ 1753]
*** WARNING: Unable to verify checksum for tclsh86g.exe
0026ff54 0040108a tcl86g!Tcl_Main(
int argc = -1, 
char ** argv = 0x009a2510, 
<function> * appInitProc = 0x00401005)+0x3b1 [C:\src\tcl\win\..\generic\tclMain.c @ 353]
0026ff70 0040124f tclsh86g!main(
int argc = 2, 
char ** argv = 0x009a2508)+0x6a [C:\src\tcl\win\..\win\tclAppInit.c @ 102]
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll - 
0026ffc0 7c817077 tclsh86g!mainCRTStartup(void)+0xff [crtexe.c @ 338]
WARNING: Stack unwind information not available. Following frames may be wrong.
0026fff0 00000000 kernel32!RegisterWaitForInputIdle+0x49

What's odd is that the behaviour depends on whether zlib is pushed onto the stack on the client side (line #30 of the script, "zlib push deflate $client", commented out): if it is not, Tcl does not crash, but something that looks like incomplete error info is written to the terminal:

C:\tmp>c:\opt\86d\bin\tclsh86g.exe crash.tcl
Server listens on sock1876
Client is sock1796
Accepted sock1784
Client closed sock1796
Data on sock1784
EOF on sock1784

    while executing
"chan close $sock"
    (procedure "Read" line 5)
    invoked from within
"Read sock1784"
User Comments: patthoyts added on 2010-02-26 07:40:55:


I added to HEAD a couple of additional tests that had been sitting in my repository for a while and that were crashing as described here. Andreas has just made a commit that solves this, so they no longer crash; that closes this issue.

patthoyts added on 2009-07-11 04:26:07:
The crucial requirements for this crash are that we use gets rather than read, and that an error occurs during the underlying read from the channel. If, in addition, we fail to unhook the fileevents, we get a crash: a fileevent from the Windows socket code tries to process an event on a channel structure that has already been deleted.
We can generate the channel error by using a mismatched compress/decompress pair, so that the read/gets call fails. This, combined with a channel event handler like:
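proc Read {type chan} {
    # Using [read] instead of [gets] reports an error rather than crashing
    #set data [read $chan]
    if {[gets $chan line] < 0} {
        puts "error?"
        # Unhooking the fileevent here (uncommented) also avoids the crash
        #chan event $chan readable {}
        close $chan
        return
    }
    puts "Read $type [string length $line]"
    if {[eof $chan]} {
        puts "Eof $type $chan"
        close $chan
    }
}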
will crash. If we use read instead, it issues an error message. If we unset the event handler, it is OK too.
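A minimal sketch of a more defensive readable handler along the lines described above: it catches errors raised by [gets] (for example a decompression failure inside the zlib transform), unhooks the readable fileevent before closing, and catches any error reported by the close itself. The proc name SafeRead and its return values are hypothetical, not part of Tcl; the demonstration assumes Tcl 8.6 or later (for the built-in zlib command and [chan pipe]) and uses an in-memory pipe instead of sockets.

```tcl
# SafeRead is a hypothetical helper, not part of Tcl.
# Returns "data", "pending", "eof" or "error".
proc SafeRead {chan} {
    if {[catch {gets $chan line} n]} {
        # The read itself failed (e.g. corrupt data inside the zlib transform).
        # Unhook the readable fileevent before closing the channel.
        chan event $chan readable {}
        # Closing may report the same error again; keep it inside the handler.
        catch {chan close $chan}
        return error
    }
    if {$n < 0} {
        if {[chan eof $chan]} {
            chan event $chan readable {}
            catch {chan close $chan}
            return eof
        }
        # Non-blocking channel without a complete line yet.
        return pending
    }
    puts "Rcvd: $line"
    return data
}

# Demonstration on an in-memory pipe with a mismatched pair (gzip vs inflate).
lassign [chan pipe] r w
chan configure $w -translation binary
zlib push gzip $w
puts $w test
chan close $w

chan configure $r -translation binary
zlib push inflate $r
set status [SafeRead $r]
puts "status: $status"
```

Whether the corrupt stream surfaces as an error from [gets] or as EOF followed by an error on close may vary between Tcl versions, which is why both paths unhook the event and close under [catch].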

patthoyts added on 2009-07-11 02:20:14:
To correct myself: the original is fixed. But if you mismatch the compression types (use gzip with inflate), it will crash.
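The gzip/inflate mismatch can also be seen at the command level, without any channels. A minimal sketch, assuming Tcl 8.6 or later: [zlib gzip] and [zlib gunzip] form a matched pair, whereas raw [zlib inflate] rejects a gzip-wrapped stream, since the gzip header bytes do not form a valid raw deflate block.

```tcl
# Matched pair: gunzip undoes gzip.
set packed [zlib gzip "test"]
set roundTrip [zlib gunzip $packed]

# Mismatched pair: raw inflate cannot decode a gzip-wrapped stream,
# so it raises an error instead of returning data.
set failed [catch {zlib inflate $packed} msg]

puts "round trip: $roundTrip"
puts "inflate failed: $failed ($msg)"
```

A pushed inflate transform reading gzip data hits the same decoding error, only inside the channel's read path instead of at the command level.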

patthoyts added on 2009-07-11 02:18:18:
I converted this script into test zlib-10.1, which reproduces the crash. The problem is that when a channel is closed by the Tcl_FinalizeIOSubsystem call, the interp is NULL. So a channel that gets left open (because an error caused the close call to be skipped) is closed when the program exits; with the error still pending, an attempt is made to set the error in the NULL interp, and the crash occurs.
Evidently something else remains, as the original script continues to crash.

flatworm added on 2009-07-10 23:16:22:
The script now works as intended on Windows XP and Linux (threaded and non-threaded builds):

Server listens on sock3
Client is sock4
Accepted sock5
Client closed sock4
Data on sock5
Rcvd: test
Data on sock5
EOF on sock5

dkf added on 2009-07-10 15:53:18:
Please recheck whether it is fixed with your script.

Generally, of more concern is that it was crashing; I'd have expected wrong results from that particular bug, not a crash. That indicates that either *our* code doing the integration between zlib and Tcl is wrong (quite possible) or that there's a problem in zlib itself with malformed data handling. The latter would be Very Bad.

andreas_kupries added on 2009-07-08 23:57:12:
Thanks. Upping priority.

flatworm added on 2009-07-08 23:53:20:
After doing

cvs update -D 2009-07-04

and rebuilding, the problem goes away.

The [Read] proc is never called on the server socket (i.e. it doesn't detect EOF). If I add
chan configure $sock -flush sync
after
puts $sock test
in the client code, that flushing command seems to block indefinitely; this seems to be a separate problem, but at least the crash goes away.

andreas_kupries added on 2009-07-08 23:24:09:
Hi Konstantin.
Can you check whether the crash still happens with the CVS content from before July 5?
I.e. before the check-in

2009-07-05  Donal K. Fellows  <[email protected]>

* generic/tclZlib.c (ZlibTransformWatch): Correct the handling of
events so that channel transforms work with things like an asynch
[chan copy]. Problem reported by Pat Thoyts.

flatworm added on 2009-07-08 23:11:03:
Just tested threaded and non-threaded builds on Linux with the same script: the backtrace is identical, the only difference being the fourth stack frame, which contains a call to the Unix-specific FileHandlerEventProc() instead of the Windows-specific SocketEventProc().

The behaviour with zlib disabled on the client side (line #30 commented out in the test case script) is also identical with a non-threaded build, apart from the names of the socket handles; a threaded build produces a segmentation fault regardless of whether zlib is pushed onto the stack on the client side.

So the problem is possibly not specific to Windows.

flatworm added on 2009-07-08 22:59:06:
For some reason the original test case script ended up empty, so I re-uploaded it.

flatworm added on 2009-07-08 22:58:13:

File Added - 334135: crash.tcl

flatworm added on 2009-07-08 22:57:35:

File Deleted - 333983:

flatworm added on 2009-07-08 00:38:42:

File Added - 333983: crash.tcl
