Tcl Source Code

Ticket UUID: 3401422
Title: socket -async: channel vs. socket blocking/non-blocking mode
Type: Bug
Version: obsolete: 8.6b2
Submitter: foxcruiser
Created on: 2011-08-31 09:42:32
Subsystem: 27. Channel Types
Assigned To: rmax
Priority: 9 Immediate
Severity:
Status: Closed
Last Modified: 2011-09-02 04:24:10
Resolution: Fixed
Closed By: ferrieux
Closed on: 2011-09-01 21:03:51
Description:
uname -v 
-> Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386

Yesterday I had a legacy script, which sets up and [read]s from an async client-side socket in channel non-blocking mode, evaluated by 8.6b2 (fossil tag core-8-6-b2). The script has been running without issues from 8.3+ through 8.5.10, to my knowledge. I found a slight behavioural shift which left me puzzled for some time. The script did the following at the client side:

set client [::socket -async localhost 9000]
::fconfigure $client -encoding binary -translation binary -blocking 0; # the script wants the channel and the sock to be in non-blocking mode!
::fileevent $client writable [list write]

In the [write] handler, the script [puts] something to its peer, then waits for a [fileevent readable] to receive the response by [read]:

read $client

This [read], however, blocked. That seems obvious at first, because it was unconditional (no numChars given), but the script expected it to be non-blocking (channel & sock). A [fconfigure $client -blocking] right before the [read] said 0 (non-blocking channel). Still, it was blocking. When doing some strace'ing, I found that the socket to [read] from (recv()) did not have the O_NONBLOCK flag set after all, though it was requested above?!

I then had a look at the code base (tclUnixSock.c) and recognized that a lot has changed under the hood since 8.5(.10). When skimming over the code, I found changes for handling the async client socks, using a C-level fileevent handler: TcpAsyncCallback().

When re-entering CreateClientSocket() by processing the callback, the unix sock is pushed into blocking mode explicitly:

    TclUnixSetBlockingMode(state->fds.fd, TCL_MODE_BLOCKING);

This explained, at least, why [read] above was seeing a blocking sock while the channel was reported as non-blocking by [fconfigure].

Why is this TclUnixSetBlockingMode(state->fds.fd, TCL_MODE_BLOCKING) needed? I guess it made it into the code because of Bug 4388:
http://sourceforge.net/tracker/index.php?func=detail&aid=219061&group_id=10894&atid=110894

... the Bug-4388 patch intended to keep channel and sock "in sync" (when introspected through [fconfigure]), right?
But now things have changed with the introduction of TcpAsyncCallback():

1) As for Bug 4388: after a [socket -async], [fconfigure $client -blocking] returns "1" while O_NONBLOCK is set ... is this intended?
2) As for my script: requesting non-blocking mode right after a [socket -async] (as shown above) will be overruled by TcpAsyncCallback(); the only "safe" place to do so is in the first script-level writable handler processed after TcpAsyncCallback(), correct? If so, this should be documented prominently: Don't expect [fconfigure] to report or set a channel's state before the first writable event has been signalled!

So, how should [fconfigure] and sock configurations be kept "in sync" (Bug 4388 for the blocking-mode request, and my case of having requested non-blocking mode before TcpAsyncCallback() fired)? What are the intended semantics for requesting non-blocking mode for a channel/sock pair when using [socket -async]?
User Comments:
ferrieux added on 2011-09-02 04:24:10:

allow_comments - 0

ferrieux added on 2011-09-02 04:03:51:

Committed after consultation. Not including a test, because it is not possible to depend on an external server with proper timings. C-level instrumentation is needed (in tcltest).

ferrieux added on 2011-09-02 01:40:49:
Reinhard, please review ;-)

ferrieux added on 2011-09-02 01:40:21:

File Added - 422580: async-nonblo.patch

ferrieux added on 2011-09-02 01:39:55:
Attached patch fixes the issue by caching the nonblocking flag coming from the script's actions while the async connection is in progress (including trying several interfaces or addresses), and committing its final value on async completion.

ferrieux added on 2011-08-31 17:12:21:
You are right on all counts:
   - it IS a regression
   - you found the workaround (i.e. fconfigure only after the writable has fired)

We'll be fixing that shortly, good catch!

Attachments: