Tcl Source Code

View Ticket
Login
Ticket UUID: 72814c7ef162e03e4261a0236f4f03efc8c51372
Title: Tcl 8.6.7 - Tcl 8.6.13 socket.test hang intermittently on Windows 10
Type: Bug Version: 8.6.7
Submitter: szhong Created on: 2023-10-16 15:05:19
Subsystem: 25. Channel System Assigned To: sebres
Priority: 5 Medium Severity: Minor
Status: Open Last Modified: 2023-10-23 16:42:11
Resolution: None Closed By: nobody
    Closed on:
Description:
I had originally posted in the newsgroup and Harald Oehlmann had kindly directed me to post here.

I'm running into an issue where socket.test is hanging on my Windows 10 machine using Visual Studio 2013 with Tcl 8.6.7 and Tcl 8.6.13. It looks like it hangs at the end of the test when close() is being called on the socket.

nmake -f makefile.vc MACHINE=AMD64 release
nmake -f makefile.vc MACHINE=AMD64 TESTFLAGS="-verbose tpse -file socket.test" test
nmake -f makefile.vc MACHINE=AMD64 TESTFLAGS="-verbose tpse -file socket.test" test
nmake -f makefile.vc MACHINE=AMD64 TESTFLAGS="-verbose tpse -file socket.test" test
nmake -f makefile.vc MACHINE=AMD64 TESTFLAGS="-verbose tpse -file socket.test" test
...
++++ socket-15.1.0 PASSED
---- socket-15.1.1 start
++++ socket-15.1.1 PASSED
---- socket-15.1.2 start
++++ socket-15.1.2 PASSED
---- socket-15.1.3 start
++++ socket-15.1.3 PASSED
---- socket-15.1.4 start
++++ socket-15.1.4 PASSED
---- socket-15.1.5 start
++++ socket-15.1.5 PASSED
---- socket-15.1.6 start
++++ socket-15.1.6 PASSED
---- socket-15.1.7 start
++++ socket-15.1.7 PASSED
---- socket-15.1.8 start
++++ socket-15.1.8 PASSED
(hangs)

Using git bisect, I have found the first bad commit to be

5d0864fb41175752fd026bc818facf5849387c5a is the first bad commit
commit 5d0864fb41175752fd026bc818facf5849387c5a
Merge: 237c46512c 2f5a766c71
Author: jan.nijtmans <[email protected]>
Date: Mon May 1 08:20:54 2017 +0000

Fix [8bd13f07bde6fb0631f27927e36461fdefe8ca95|8bd13f07bd]: Closing tcl pipes prevents windows threads from starting up. Patch by sebres (Dipl. Ing. Sergey G. Brester)
(also fix duplicate test-case number in link.test)

tests/link.test | 2 +-
win/tclWinConsole.c | 223 +++++++------------
win/tclWinInit.c | 20 ++
win/tclWinInt.h | 75 +++++++
win/tclWinPipe.c | 621 ++++++++++++++++++++++++++++++++++++++--------------
win/tclWinSerial.c | 93 ++------
6 files changed, 643 insertions(+), 391 deletions(-)

I have also used Visual Studio 2019 to compile Tcl 6.8.13 and reproduce the hang, but on a different test within socket.test.

++++ socket_inet6-13.1 SKIPPED: thread
++++ socket_inet6-13.2.tr1 SKIPPED: thread
++++ socket_inet6-13.2.tr2 SKIPPED: thread
++++ socket_inet6-13.2.cl1 SKIPPED: thread
++++ socket_inet6-13.2.cl2 SKIPPED: thread
(hangs)

It hangs in the sendCommand()

socket.test: 1932
# cleanup
if {$remoteProcChan ne ""} {
catch {sendCommand exit}
}

I have verified that it hangs with 32-bit as well using Visual Studio 2019

nmake -f makefile.vc MACHINE=IX86 TESTFLAGS="-verbose tpse -file socket.test" test

...
---- socket-15.1.0 start
++++ socket-15.1.0 PASSED
---- socket-15.1.1 start
++++ socket-15.1.1 PASSED
---- socket-15.1.2 start
++++ socket-15.1.2 PASSED
---- socket-15.1.3 start
++++ socket-15.1.3 PASSED
---- socket-15.1.4 start
++++ socket-15.1.4 PASSED
---- socket-15.1.5 start
++++ socket-15.1.5 PASSED
---- socket-15.1.6 start
++++ socket-15.1.6 PASSED
---- socket-15.1.7 start
++++ socket-15.1.7 PASSED
---- socket-15.1.8 start
++++ socket-15.1.8 PASSED
(hangs)
User Comments: apnadkarni added on 2023-10-23 16:42:11:
I didn't close the ticket and it does not show up as closed for me. Your post is awaiting moderator approval (not sure why that is the case either).

In any case, thanks for confirmation that hitting returns allows tcltest to continue. It supports what I said earlier about the console reader thread.

Will wait for a response from Sergey, if any, else will try to cobble a fix.

apnadkarni added on 2023-10-22 15:26:24:

Reproducing by running the test suite in a loop, here is what I see. The main thread is blocked in TclPipeThreadStop waiting for the console reader thread to exit. The stack:

ntdll.dll!00007ff99256e234()
KernelBase.dll!00007ff98ff957c2()
tcl86t.dll!TclPipeThreadStop(TclPipeThreadInfo * * pipeTIPtr, void * hThread) Line 3600
	at d:\src\tcltk\core-8-6-branch\tcl\win\tclWinPipe.c(3600)
tcl86t.dll!ConsoleCloseProc(void * instanceData, Tcl_Interp * interp) Line 547
	at d:\src\tcltk\core-8-6-branch\tcl\win\tclWinConsole.c(547)
tcl86t.dll!ChanClose(Channel * chanPtr, Tcl_Interp * interp) Line 362
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclIO.c(362)
tcl86t.dll!TclFinalizeIOSubsystem() Line 677
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclIO.c(677)
tcl86t.dll!FinalizeThread(int quick) Line 1320
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclEvent.c(1320)
tcl86t.dll!Tcl_Exit(int status) Line 995
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclEvent.c(995)
tcl86t.dll!Tcl_ExitObjCmd(void * dummy, Tcl_Interp * interp, int objc, Tcl_Obj * const * objv) Line 1036
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclCmdAH.c(1036)
tcl86t.dll!Dispatch(void * * data, Tcl_Interp * interp, int result) Line 4503
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclBasic.c(4503)
tcl86t.dll!TclNRRunCallbacks(Tcl_Interp * interp, int result, NRE_callback * rootPtr) Line 4538
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclBasic.c(4538)
tcl86t.dll!TclEvalObjEx(Tcl_Interp * interp, Tcl_Obj * objPtr, int flags, const CmdFrame * invoker, int word) Line 6104
	at d:\src\tcltk\core-8-6-branch\tcl\generic\tclBasic.c(6104)

The ConsoleReader thread is blocked on ReadConsole with the stack

ntdll.dll!00007ff99256d0c4()
KernelBase.dll!00007ff98fe87861()
KernelBase.dll!00007ff98ff90e4b()
KernelBase.dll!00007ff98ff90eda()
tcl86t.dll!ReadConsoleBytes(void * hConsole, void * lpBuffer, unsigned long nbytes, unsigned long * nbytesread) Line 225
	at d:\src\tcltk\core-8-6-branch\tcl\win\tclWinConsole.c(225)
tcl86t.dll!ConsoleReaderThread(void * arg) Line 1150
	at d:\src\tcltk\core-8-6-branch\tcl\win\tclWinConsole.c(1150)
kernel32.dll!00007ff9911b7344()
ntdll.dll!00007ff9925226b1()

The above are the only two threads running.

In my case at least, hitting Return a few times unblocks the ConsoleReader thread allowing the process to exit.

First, I do not think this is an issue with sockets at all, the socket tests just trigger the bug related to finalization of the console threads.

Second, I do not see how the synchronization during finalization is supposed to work. What unblocks the ReadConsole call shown on the stack above? On older systems (like XP, 2003, which 8.6 supports) , the Win32 CancelSyncIo api does not exist. Second, even if it did, there is a race condition, however small, between the CancelSyncIO in the main thread and the ReadConsole in the console reader thread. If the latter happens after the former but before the close handle (in the main thread) it will still block.

Third, even if it unblocks, afaict it will see a ERROR_OPERATION_ABORTED and just loop around and block again.

Sergey is of course far more competent is these matters and hopefully he can find a solution.

Note the windows console in 8.7 is completely rewritten so in theory should not be affected by this. It would be useful if @szhong could test 8.7 on his system to verify it is not subject to this "hang".


apnadkarni added on 2023-10-22 04:41:56:
You were right about sendCommand, I misread the code. It does start a local server even if a remote server is not configured.

I'm still interested in knowing if the hang ends after hitting Returns a few times.

apnadkarni added on 2023-10-22 02:21:04:
Can you please confirm the following if you can continue to reproduce the hangs.

Are you running with a remote server configured? Else that line 1932 (sendCommand) should not be invoked so the hang is likely elsewhere.

Also, when you think it is hanging, please try hitting the Returns key a few (5-6) times and wait a few seconds. Does it continue?

Attachments: