Tcl Source Code

Ticket UUID: b6d0d8cc2c82dbb366d1fb3f32afd2a027a41135
Title: (windows) socket close: graceful shutdown vs. lingering (eternal TIME_WAIT issue)
Type: Bug
Version:
Submitter: sebres
Created on: 2019-09-13 22:15:54
Subsystem: 25. Channel System
Assigned To: nobody
Priority: 5 Medium
Severity: Important
Status: Pending
Last Modified: 2020-04-27 10:51:35
Resolution: Fixed
Closed By: nobody
Closed on:
Description:

Initially reported by @daviddem on Stack Overflow in "tcl close does not gracefully terminate tcp/ip connection" (and in tclchat).

The current implementation of the socket subsystem for Windows leaves the socket lingering after close (it remains open after the closesocket call), which floods the system with sockets in TIME_WAIT state - up to exhausting the free descriptors or the port range - until TcpTimedWaitDelay is reached for each of them.

This can be especially bad because the maximum number of file descriptors allowed per process is defined by the macro FD_SETSIZE (default 1024 on Windows), and it can be reached very fast by many concurrent connections going "half" closed due to this issue (either on the client or the server side, or both).

The simplest test case looks like this:

  % puts [exec netstat -n | grep -c TIME_WAIT]
  2
  % timerate { time { close [socket localhost 80] } 50; after 100; puts [exec netstat -n | grep -c TIME_WAIT] } 1000
  52
  102
  152
As one can see, every socket enters TIME_WAIT, i.e. lingers in a somewhat strange manner - it remains open after the closesocket call (according to MS, to enable queued data to be sent)...
But there is nothing left to send (except the notification packet announcing that the socket is going to be closed).

The MS docs ("Graceful Shutdown, Linger Options, and Socket Closure" and "LINGER (winsock.h)") are really confusing, but...
I found how one can "solve" entering the TIME_WAIT state... I added this in the C code immediately before closesocket:

  /* set SO_DONTLINGER to 0 (yes, 0 :) - this forces the socket not to remain open */
  BOOL val = 0;
  setsockopt(infoPtr->socket, SOL_SOCKET, SO_DONTLINGER, (const char *)&val, sizeof(BOOL));
and it looks like the flooding with such half-closed sockets stops...
  % puts [exec netstat -n | grep -c TIME_WAIT]
  1
  % timerate { time { close [socket localhost 80] } 50; after 100; puts [exec netstat -n | grep -c TIME_WAIT] } 1000
  1
  1
  1
However, I don't think this can still be called a "graceful" shutdown, because I assume it no longer sends any pending data (though I don't know how one could do that properly without providing more options so the developer can control it)...
There is also another weird sentence in the MS docs:
"Note that enabling a nonzero timeout on a nonblocking socket is not recommended."
And aren't our (win-native) sockets always nonblocking? I think so... at least in CreateSocket, FIONBIO is set to 1.
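For reference, the nonblocking mode is typically set with ioctlsocket like this (a minimal sketch; the exact call site in CreateSocket may differ):

  u_long nonBlocking = 1;                   /* 1 = nonblocking mode */
  ioctlsocket(sock, FIONBIO, &nonBlocking);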

So there is more to investigate. But as an interim solution we could implement a "-linger" option (setting it to 0 would avoid the issue, any other integer value would set a linger timeout, and -1 could mean that the default system timeout is used).
To be safe, one could also provide this option initially behind a define TCL_FEATURE_SOCKET_LINGER (similar to TCL_FEATURE_KEEPALIVE_NAGLE), as sketched below.
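A minimal sketch of how such a "-linger" value could be mapped onto the Winsock options (the helper name and exact mapping are illustrative only, not part of any actual patch):

  #include <winsock2.h>

  /* val == -1 -> system default (SO_DONTLINGER), val == 0 -> hard close
   * (no TIME_WAIT), val > 0 -> graceful close with a timeout in seconds. */
  static int
  SetSocketLinger(SOCKET sock, int val)
  {
      if (val < 0) {
          BOOL dontLinger = TRUE;    /* restore the system default behavior */
          return setsockopt(sock, SOL_SOCKET, SO_DONTLINGER,
                  (const char *) &dontLinger, sizeof(BOOL));
      } else {
          struct linger lg;
          lg.l_onoff = 1;                 /* lingering enabled */
          lg.l_linger = (u_short) val;    /* 0 causes an abortive close (RST) */
          return setsockopt(sock, SOL_SOCKET, SO_LINGER,
                  (const char *) &lg, sizeof(lg));
      }
  }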

User Comments: sebres added on 2020-04-27 10:51:35:

I think I fixed it in [b960d1b71e] as well as possible:

on close it first tries a graceful disconnect and does not linger in the success case (and it performs a hard reset in case the socket gets closed without any data sent/received).
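In Winsock terms the described sequence looks roughly like this (a simplified, blocking sketch for illustration, not the literal code of [b960d1b71e]; the real channel sockets are nonblocking and wait for FD_CLOSE instead of looping on recv):

  #include <winsock2.h>

  static void
  CloseTcpSocket(SOCKET sock, int dataWasTransferred)
  {
      if (!dataWasTransferred) {
          /* Nothing was ever sent/received: hard reset (RST), so the
           * socket does not enter TIME_WAIT at all. */
          struct linger lg = { 1, 0 };    /* l_onoff = 1, l_linger = 0 */
          setsockopt(sock, SOL_SOCKET, SO_LINGER,
                  (const char *) &lg, sizeof(lg));
      } else {
          /* Graceful disconnect: send FIN, then drain inbound data until
           * the peer closes its side (recv returns 0) or an error occurs. */
          char buf[1024];
          shutdown(sock, SD_SEND);
          while (recv(sock, buf, sizeof(buf), 0) > 0) {
              /* discard remaining inbound data */
          }
      }
      closesocket(sock);
  }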

The test script is attached...
I don't see sockets accumulating in TIME-WAIT state anymore (regardless of how the connection is closed on the server side).

The tests (socket.test and *-io.test) passed, too.


sebres added on 2020-04-24 18:40:16:

Further investigation shows that the way the close is performed on the other side (the peer) is also very important for achieving a graceful shutdown - if the peer does not follow the strict rules of the graceful disconnect process, many sockets will enter TIME-WAIT state (regardless of the proper lingering set for them).

For instance, here is a diff illustrating a bad (red) and a good (green) server accept procedure (in the sense of a graceful disconnect), assuming the client peer closes the connection properly on its side (with WSASendDisconnect, waiting for FD_CLOSE, etc.; a sketch of that client-side sequence follows the diff):

socket -server {apply {{ch args} {
-  close $ch
+  chan event $ch readable [list apply {{ch} {
+    if {[catch {string length [read $ch 1024]} s] || $s == 0} {
+      close $ch
+    }
+  }} $ch]
}}} $port
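The client-side sequence mentioned above (WSASendDisconnect, waiting for FD_CLOSE) could look roughly like this (again a simplified sketch; the 5-second timeout is an arbitrary illustrative value):

  #include <winsock2.h>

  static void
  GracefulClientClose(SOCKET sock)
  {
      WSAEVENT ev = WSACreateEvent();

      WSASendDisconnect(sock, NULL);         /* send FIN, no more outbound data */
      WSAEventSelect(sock, ev, FD_CLOSE);    /* wait for the peer's FIN */
      WSAWaitForMultipleEvents(1, &ev, TRUE, 5000, FALSE);
      WSACloseEvent(ev);
      closesocket(sock);
  }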
To signal to the client that it is disconnected immediately (the red case), one possibility would be a hard reset (linger with l_onoff = 1 and l_linger = 0), but this is possible only if no data was ever sent, so exactly for this case (otherwise it can cause a loss of data that was sent but not yet transmitted/received).
Additionally, TcpThreadActionProc makes the fix more complex, because closing the channel causes CutChannel to be invoked (without any possibility to distinguish a close from a detach) before TcpCloseProc is executed, so it removes the socket from the co-thread (the "windowed" handler that normally handles this socket).


sebres added on 2019-09-16 10:56:24:

> Apparently, this is an internal Windows "feature": Windows wants to give itself time to empty its buffers.

Agreed, and I also thought it was "normal" Windows behavior, until I saw every single socket stuck in the TIME-WAIT state without any pending send buffers, up to the default timeout (TcpTimedWaitDelay). That cannot be called "normal" anymore, and either the graceful shutdown should be implemented differently (at least waiting up to the last receive after some send/flush), or an option is really needed so the developer can control this.


oehhar added on 2019-09-15 16:47:25:

Serges, this sounds sensible, thank you.

David Graveraux might be the wizard who can shed more light on it.

Apparently, this is an internal Windows "feature": Windows wants to give itself time to empty its buffers.

A new parameter might be sensible. Also, the "-nagle" parameter was always there but commented out.

Thank you, Harald


Attachments: