Tcl Source Code

Ticket UUID: 3056775
Title: Socket -server stops accepting connections under load
Type: Bug
Version: obsolete: 8.6b1.1
Submitter: twylite
Created on: 2010-08-31 15:14:35
Subsystem: 25. Channel System
Assigned To: andreas_kupries
Priority: 9 Immediate
Severity:
Status: Closed
Last Modified: 2010-09-28 04:17:30
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2010-09-27 21:17:30
Description:
TL;DR and fix
-------------

A socket opened with socket -server may stop accepting new connections on Windows.  This appears to be caused by a race condition when acceptEventCount and readyEvents (members of the SocketInfo structure referenced by infoPtr in tclWinSock.c) are modified from two different threads.

Affects Tcl versions 8.4.11 through CVS-HEAD.

Adding a lock around access to these members in TcpAccept() (tclWinSock.c) resolves the issue.


diff -r1.73 tclWinSock.c
1481a1482,1487
>
>     /*
>      * All access to sockets in socketList has to be protected by the lock.
>      * Fix for SF Tcl Bug #xxxxxx
>      */
>     WaitForSingleObject(tsdPtr->socketListLock, INFINITE);
1487a1494
>     SetEvent(tsdPtr->socketListLock);


The full description and diagnosis are included because I had already typed up 90+% of it before finding the solution (and it should be a useful reference).


Description of problem
----------------------

When subjected to load using the OpenLoad (http://openwebload.sourceforge.net/) tool, a Tcl interpreter on x86/WIN32 may stop accepting connections on the loaded socket.

- Tcl does not lock up or crash; a console is able to respond normally.
- Tcl's networking subsystem does not lock up; other sockets opened previously or subsequently behave normally.
- The symptom of the "lock up" is that the port will still accept the connection, but there is no evidence that the accept command (i.e. the $acceptcmd registered with socket -server) is ever executed.
- Once locked up, the socket may occasionally recover, but in most instances it remains "locked up" even if left for 15 minutes or more.
- Can be reproduced with Tcl 8.4.11, 8.5.2, 8.6b1.1, and CVS HEAD 2010/08/26 (8.6b1.2).
- Can be reproduced in builds with OPTS=threads and OPTS=nothreads.
- Can be reproduced in tclsh and wish.


Reproducing the problem
-----------------------

I initially noticed the problem while subjecting Wub to performance tests.  To isolate the cause I ended up creating a stripped-down version of Wub, and then a simple event-based server that fakes the HTTP protocol (attached).

(This issue was originally reported as a Wub issue - http://code.google.com/p/wub/issues/detail?id=23)
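
The attached script is not reproduced here, but a minimal sketch of such an event-driven fake-HTTP server (illustrative only, not the actual fake_http_server.tcl) looks something like this:

  # Minimal event-driven server that fakes an HTTP response (sketch only).
  proc accept {chan addr port} {
      fconfigure $chan -blocking 0 -translation crlf
      fileevent $chan readable [list respond $chan]
  }

  proc respond {chan} {
      if {[eof $chan]} {
          close $chan
          return
      }
      # Read whatever part of the request has arrived; a real server would
      # wait for the terminating blank line before answering.
      gets $chan line
      set body "OK"
      puts $chan "HTTP/1.0 200 OK"
      puts $chan "Content-Type: text/plain"
      puts $chan "Content-Length: [string length $body]"
      puts $chan ""
      puts -nonewline $chan $body
      close $chan
  }

  socket -server accept 80
  vwait forever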

Send a single request with Openload to validate the URL
  openload -h User-Agent "MSIE 5.0" http://localhost/ -t
  
Simulate a single client at max load
  openload -h User-Agent "MSIE 5.0" http://localhost/ 1
  
Simulate 10 clients at max load
  openload -h User-Agent "MSIE 5.0" http://localhost/ 10
  
The "User-Agent" may be omitted with the test script; for testing against Wub it is required or openload will be flagged as a spider.

The test script typically locks up in under 15,000 requests with 10 clients, and under 4,000 requests with 20 clients.  Behaviour with 5 or fewer clients is erratic (sometimes no lockup occurs).

Wub will typically lock up in under 10,000 requests with 5 clients.

Using a single client seldom results in a lock up, but I have seen it with Wub.


Things it is (most likely) not
------------------------------

(1) Windows Tcpip Parameters

In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters I have:
TcpTimedWaitDelay (DWORD) set to 30 (decimal)
MaxUserPort (DWORD) set to 65000 (decimal)
ReservedPorts (MULTI_SZ) includes only 1433-1434

This gives nearly 64000 ephemeral ports, each lingering for 30 seconds after close, allowing a sustained rate of at least about 2100 TPS (roughly 64000 / 30).

Running openload against shttpd (shttpd.sourceforge.net) confirms a sustained 1689 TPS over 220+ seconds (377555 requests) with 5 clients, or 1718 TPS over 230+ seconds (401904 requests) with 20 clients.

Sockets are clearly being recycled here, and at a rate sufficient to sustain in excess of 1700 TPS.

(2) Windows TCP/IP networking in general

As indicated above, shttpd (a C application) is able to run on the same system without symptoms of a "lock up".

As a further test, LightTPD 1.4.26-1 (downloaded from http://files.dtech.hu/pub/WLMP-Project/LightTPD-Win32/) was able to sustain 1336 TPS with 1 client (140 seconds, 187137 requests), 1461 TPS with 5 clients (150+ seconds, 220045 requests), and 1517 TPS with 20 clients (340+ seconds, 519545 requests).


(3) Compiler, tclsh vs wish, OPTS=threads

Using CVS HEAD 2010/08/26 I built 5 binaries, all of which show the "lock up" in under 10,000 requests when using Openload with 10 clients:

openload -h User-Agent "MSIE 5.0" http://localhost/ 10

Binary configurations:
- tclsh86t.exe, MSVC 6 Professional, OPTS=threads
- tclsh86.exe, MSVC 6 Professional, OPTS=nothreads
- wish86t.exe, MSVC 6 Professional, OPTS=threads, includes Tk
- tclsh86.exe, MSVC 9 Express, OPTS=nothreads
- tclsh86t.exe, MSVC 9 Express, OPTS=threads

Using the last build, attempts were made to check for recovery after the "lock up" by trying a single connection every few minutes.  After 15+ minutes the server socket was still not responding.  A "netstat -an" confirmed that all waiting sockets were gone within the first minute after the lockup.
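
A periodic probe along the following lines (hypothetical; the actual checks were made by hand) can be used to watch for recovery:

  # Try a single connection once a minute and log the result.  During the
  # "lock up" the connect itself still succeeds (the TCP stack accepts it)
  # even though the Tcl accept command never runs.
  proc probe {} {
      if {[catch {socket localhost 80} s]} {
          puts "[clock format [clock seconds]]: connect failed: $s"
      } else {
          puts "[clock format [clock seconds]]: connect succeeded"
          close $s
      }
      after 60000 probe
  }
  probe
  vwait forever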


(4) listen() backlog

Incoming connections after the "lock up" are not rejected by the Windows networking layer; the connection is accepted.

This has been confirmed using telnet to connect to the port: the connection is accepted.  Instrumentation in the $acceptcmd (e.g. tracing or incrementing a global variable) does not appear to run.
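
Such instrumentation amounts to something like the following (a sketch with hypothetical names; once the socket is "locked up" the counter never increases):

  set ::accepted 0
  proc accept {chan addr port} {
      # Count and trace every invocation of the accept command.
      incr ::accepted
      puts stderr "accept #$::accepted from $addr:$port"
      # ... normal request handling continues here ...
  }
  socket -server accept 80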

CROSS-CHECK: In addition, an instrumented build was made (see below) with backlog=1024 (as used in LightTPD 1.4.26), which did not solve the problem.


(5) FD_SETSIZE

Parts of Tcl's networking subsystem for WIN32 use select(), which means that the maximum simultaneous connections may be limited by FD_SETSIZE.

Openload simulates multiple clients in parallel, but each client behaves synchronously: a response is expected before the next request is sent.  As a result the Tcl server should never be handling more than (num_clients + STD handles) connections.

Since FD_SETSIZE defaults to 64 in the WIN32 SDK, and we can reproduce the problem at less than 10 clients, it stands to reason that FD_SETSIZE is not the cause.

CROSS-CHECK: In addition an instrumented build was made (see below) with FD_SETSIZE=512, which did not solve the problem.


Instrumented Build
------------------

CVS HEAD 2010/08/26 was modified as follows to create an instrumented build:

- In makefile.vc, append "-DFD_SETSIZE=512" to TCL_DEFINES
- In tclWinSock.c, add a printf() of wsaData.iMaxSockets after WSAStartup()
- In tclWinSock.c, change SOMAXCONN to 1024 in listen()

The updated sources were built with MSVC 9 Express and OPTS=threads.

Running the binary confirms iMaxSockets = 32767, and the "lock up" occurs as usual when using openload to simulate 10 clients.

Further instrumentation (specifically printf() traces around connection accepting logic, intended to pinpoint where the notice of new connections was "getting lost") caused the issue to occur more frequently (sometimes almost predictably) or to almost completely vanish.  This pointed to a heisenbug probably involving a synchronisation problem or race condition between the two threads involved.

This realisation led to the identification of a possible race condition around infoPtr->acceptEventCount and infoPtr->readyEvents.  Tests after adding the lock confirm that this was indeed the problem.


REFERENCE: Configure Windows Tcpip Parameters for high performance servers
--------------------------------------------------------------------------

1. In the Registry navigate to
   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

2. Create or edit TcpTimedWaitDelay (DWORD), set to 30 (decimal)

3. Create or edit MaxUserPort (DWORD), set to 10000 (decimal)

4. Ensure that ReservedPorts (MULTI_SZ) reserves at least 1433-1434

5. Reboot
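
The two DWORD values can also be applied from Tcl itself via the registry package (a sketch; assumes tclsh is run with administrative rights, and a reboot is still required):

  package require registry

  set key {HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters}
  registry set $key TcpTimedWaitDelay 30 dword
  registry set $key MaxUserPort 10000 dword

  # Inspect (rather than overwrite) the MULTI_SZ reservation list.
  if {[catch {registry get $key ReservedPorts} reserved]} { set reserved {} }
  puts "ReservedPorts: $reserved"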

The maximum TPS that your server can sustain will be 
((MaxUserPort - 1024 - (number of reserved ports)) / TcpTimedWaitDelay) when the
processing latency per call is low (and lower as latency increases).
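
For example, with the settings from item (1) in the "Things it is (most likely) not" section above (MaxUserPort = 65000, two reserved ports, TcpTimedWaitDelay = 30), this works out to (65000 - 1024 - 2) / 30, i.e. roughly 2100 TPS, which matches the figure quoted there.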

References

- MS KB 328476: Description of TCP/IP settings that you may have to adjust when
  SQL Server connection pooling is disabled
  http://support.microsoft.com/kb/328476
  
- MS TechNet (Cable Guy - Dec 2005): Windows TCP/IP Ephemeral, Reserved, and 
  Blocked Port Behavior
  http://technet.microsoft.com/en-us/library/bb878133.aspx
  
- IBM Express: Web server plug-in tuning tips
  http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/rprf_plugin.html

- MS KB 812873: How to reserve a range of ephemeral ports on a computer that is
  running Windows Server 2003 or Windows 2000 Server
  http://support.microsoft.com/kb/812873/
User Comments:

andreas_kupries added on 2010-09-28 04:17:30:
allow_comments - 1

andreas_kupries added on 2010-09-28 04:17:29:
Glad to hear.

twylite added on 2010-09-28 03:35:31:
Load testing completed.  core-8.4-branch, core-8.5-branch and HEAD were built from CVS today with MSVC6, and each passes at least 2,000,000 transactions (open/request/response/close) from 5 simulated clients on Windows XP SP3.  Before the fix the deadlock usually occurred within 10,000 transactions.

Also tested with 1, 10 and 20 clients up to at least 50k transactions, with no problems.

Looks to me like the fix works and the issue can be closed.

twylite added on 2010-09-25 01:11:16:
I will be able to do the load testing on Monday, and provide feedback once it's done.

andreas_kupries added on 2010-09-25 00:58:27:
Eh, forgot to say it explicitly. The fix has been committed to all branches. (Even the EOL 8.4).

andreas_kupries added on 2010-09-25 00:57:29:
Testing is done ... Builds fine and passes the testsuite (*). Twylite, given that you already have the setup for load testing, may I impose on you to verify that the new heads of the 8.4, 8.5, and 8.6 branches are now ok?

(*) In the sense that it doesn't create any more failures than were already present before my changes. This is what took the time in testing when I saw the test failures with my changes yesterday. It is now verified that these were all present before too, i.e. not introduced by the fix and my changes to the fix.

andreas_kupries added on 2010-09-23 23:15:43:
Ok. Mostly done. I moved the WFSO a bit upward, to cover the

    if (newsocket == INVALID_SOCKET) ...

part as well. That required another 'SetEvent' in that branch too, just before the return. I do not see why it could not be raced like the code after it, so I am being cautious, i.e. I believe that it can indeed be raced too, and cover it with the lock.

Not committing yet, have to check that it builds first.

twylite added on 2010-09-23 14:47:04:
Index: tclWinSock.c
===================================================================
RCS file: /cvsroot/tcl/tcl/win/tclWinSock.c,v
retrieving revision 1.73
diff -u -r1.73 tclWinSock.c
--- tclWinSock.c    30 Aug 2010 09:19:37 -0000    1.73
+++ tclWinSock.c    23 Sep 2010 07:36:16 -0000
@@ -1479,12 +1479,19 @@
      * count must be kept. Decrement the count, and reset the readyEvent bit
      * if the count is no longer > 0.
      */
+
+    /*
+     * All access to sockets in socketList has to be protected by the lock.
+     * Fix for SF Tcl Bug #xxxxxx
+     */
+    WaitForSingleObject(tsdPtr->socketListLock, INFINITE);
 
     infoPtr->acceptEventCount--;
 
     if (infoPtr->acceptEventCount <= 0) {
         infoPtr->readyEvents &= ~(FD_ACCEPT);
     }
+    SetEvent(tsdPtr->socketListLock);
 
     /*
      * Win-NT has a misfeature that sockets are inherited in child processes

andreas_kupries added on 2010-09-23 05:59:19:
Your explanation makes sense, and armed with it I believe I have traced the various connections ... "InitSockets()" is called to set up socket handling for a thread. This creates the co-thread, whose implementation is "SocketThread()". ST() creates the window to receive messages and then simply runs a get/dispatch loop. This window's class was specially created by IS() as well (before the thread was spawned), and from there DispatchMessage() knows that it has to call "SocketProc()", which therefore runs in the co-thread. It locks the list, searches for the proper infoPtr, and modifies it per the message. The event queue and its handling are run by the thread itself, which goes through "SocketEventProc()" and "TcpAccept()", and is missing the lock.

Ok. Your fix is a fix, not a band-aid.

Taking the bug back to myself.

I will wait for your unified diff.

This structure should be documented. I will write that up while waiting for your diff. Back-porting to the 8.5 and 8.4 branches should be easy.

twylite added on 2010-09-23 05:33:34:
> seems that you are putting the locks around the statement
>
> "infoPtr->acceptEventCount--;"

Sorry, doesn't sound right.  I think the lock is meant to be around this code block:

1483        infoPtr->acceptEventCount--;
1484
1485        if (infoPtr->acceptEventCount <= 0) {
1486            infoPtr->readyEvents &= ~(FD_ACCEPT);
1487        }
1488

twylite added on 2010-09-23 05:31:21:
Hi,

> Can you provide us with a unified diff for the patch ? Or context diff.
Will do so tomorrow, I don't have the codebase on this PC.
> Anything which tells me not just the line numbers, but also shows the
> relevant code and things around it. Using the line number with Tcl Head it
> seems that you are putting the locks around the statement
>
>     "infoPtr->acceptEventCount--;"

Sounds about right.

> Except that there are other statements also modifying/querying
> acceptEventCount and possibly should be locked as well.

Yes.  I seem to recall checking that the other modifications were safe.  But I also recall adding info about this to the bug report, which I clearly didn't do :(  I guess I'll look again.

> Another issue I have is that it is extremely unclear how this structure is
> modified by multiple threads. This may be my own non-knowledge of the
> (windows) notifier, so I hope you can enlighten me.

The trick, if I understand it, is that each thread has a socketList in thread-specific-data (tsdPtr), and also a "co-thread" (tsdPtr->socketThread) to receive WSAAsyncSelect() events into SocketProc().

SocketProc() calls Tcl_ThreadAlert() to queue the events that will be handled by SocketEventProc().

The tsdPtr->socketListLock is used to prevent race conditions between the interp thread and "co-thread" while accessing tsdPtr.

> First, the socket should be owned by a single thread and no other should
> access it.

The socket is accessed by the co-thread.  tsdPtr is used to communicate between the interp thread and the co-thread.  This is a race condition between those threads.

One example of the race condition is when there is a context switch immediately after checking the acceptEventCount in TcpAccept():
  Thread1 (TcpAccept) : infoPtr->acceptEventCount--;
  Thread1 (TcpAccept) : if (infoPtr->acceptEventCount <= 0) {
  ** context switch **
  Thread2 (SocketProc) : infoPtr->readyEvents |= event;
  Thread2 (SocketProc) : ...
  ** context switch **
  Thread1 (TcpAccept) :   infoPtr->readyEvents &= ~(FD_ACCEPT);

andreas_kupries added on 2010-09-23 04:23:55:
Hi Twylite. Having started to look into this a few questions ...

Can you provide us with a unified diff for the patch ? Or context diff. Anything which tells me not just the line numbers, but also shows the relevant code and things around it. Using the line number with Tcl Head it seems that you are putting the locks around the statement

    "infoPtr->acceptEventCount--;"

however I am not sure. With the plain patch I am left guessing, even if my guess seems sensible.
Except that there are other statements that also modify/query acceptEventCount and possibly should be locked as well.

Another issue I have is that it is extremely unclear how this structure is modified by multiple threads. This may be my own non-knowledge of the (windows) notifier, so I hope you can enlighten me.

First, the socket should be owned by a single thread and no other should access it.

Second, when I traced the call-sequence a bit I saw that TcpAccept is called by SocketEventProc, which is the action function for an Event. Events are in queues, one queue per thread, so when a socket is ready the newly minted event is put into a single queue, that of the current thread.

So, if multiple threads truly access the structure, then that may be the bug, making the locks you set a band-aid hiding the actual problem. I hope you understand now why I wish to know more.

I am now temporarily re-assigning this bug to Jeff to get his input on this as well.

twylite added on 2010-08-31 22:14:36:

File Added - 385110: fake_http_server.tcl
