Tcl Source Code

Ticket UUID: 1329754
Title: sockets lose data on Windows
Type: Bug
Version: obsolete: 8.4.11
Submitter: dgp
Created on: 2005-10-18 18:01:37
Subsystem: 27. Channel Types
Assigned To: andreas_kupries
Priority: 5 Medium
Severity:
Status: Open
Last Modified: 2006-01-21 03:27:33
Resolution: None
Closed By:
Closed on:
Description:
Attached script is demo of the problem.
Start it in one shell window:

    tclsh sdtest.tcl server

to start a server running.

Start it in a second window:

    tclsh sdtest.tcl client

to start a client of that server.

The server shell window should
print the sequence of messages
it receives from the client, starting
with message 10 and counting down to 1.
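
For readers without the attachment, the server half can be sketched roughly as follows. This is a hypothetical stand-in, not the attached sdtest.tcl: the proc names (start_server, on_accept, on_readable), the globals (received, done), and the one-message-per-line format are all assumptions.

```tcl
# Hypothetical stand-in for the server half of the demo; names and the
# line-per-message format are assumptions, not taken from sdtest.tcl.

# Called whenever the accepted socket becomes readable.
proc on_readable {chan} {
    global received done
    if {[gets $chan line] >= 0} {
        # A complete line arrived: record it and echo it to the shell.
        lappend received $line
        puts "received: $line"
    } elseif {[eof $chan]} {
        # The client closed its end: release the channel and signal completion.
        close $chan
        set done 1
    }
    # Otherwise only a partial line is buffered; wait for the next event.
}

# Called by the listening socket for each new connection.
proc on_accept {chan addr port} {
    fconfigure $chan -blocking 0 -buffering line
    fileevent $chan readable [list on_readable $chan]
}

# Start listening; returns the listening channel.
proc start_server {port} {
    global received
    set received {}
    return [socket -server on_accept $port]
}
```

A caller would run something like `start_server 9900` followed by `vwait done` (the port number is arbitrary). With a server of this shape, every message the client wrote should appear before the EOF branch fires.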

This works fine on Linux and
Solaris.  On Windows, the output
does not reach message 1, but
stops about 4 or 5 messages
short of all the data.  I did Windows
testing with the ActiveTcl 8.4.11.2
tclsh.

With a minor change to the client part
of the demo script, so that the
socket is explicitly made blocking
*before* the final call to [flush],
the bug is worked around and all
data passes through on Windows.
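
A minimal sketch of that workaround, assuming a client that writes one message per line; the proc name send_countdown, its arguments, and the message text are hypothetical and not taken from sdtest.tcl.

```tcl
# Hypothetical client sketch; not the attached sdtest.tcl.
# Writes "message N" lines counting down from $count to 1.
proc send_countdown {host port {count 10}} {
    set sock [socket $host $port]
    # The demo scenario: output is queued on a non-blocking socket.
    fconfigure $sock -blocking 0
    for {set i $count} {$i >= 1} {incr i -1} {
        puts $sock "message $i"
    }
    # Workaround: switch to blocking mode *before* the final flush,
    # so that [flush] does not return until the data has been written out.
    fconfigure $sock -blocking 1
    flush $sock
    # Return the channel so the caller decides when to close it.
    return $sock
}
```

Usage would look like `set s [send_countdown localhost 9900]` against a server already listening on that (arbitrary) port.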

This should not be required.  Sockets
should not lose data on any platform.
User Comments:

dgp added on 2006-01-21 03:27:33:

New reports are indicating that
even forcing a [close] on the
client side is not enough.

If the "nice" levels of the two
processes are such that the client
gets more cycles than the server,
then it is reported that we still
see data loss.

Perhaps a second bug, server side
this time?

dgp added on 2006-01-11 04:19:45:

did some more testing, and
even in the case where the
socket is never made non-blocking,
data can still be lost if the
client side does not perform
an explicit [close].

Revised summary to reflect that
non-blocking is not essential to
demo of the bug.

So why would an explicit [close]
differ from the Tcl_Close()
that ought to be implicit
in finalization?

dgp added on 2006-01-11 04:11:29:

looking at this again, it appears
that what is required on the client
side to avoid data loss is *both*
an [fconfigure -blocking 1] and
an explicit [close].

If the client side is left non-blocking,
data is lost.  If the [close] command
is not explicitly done, then the implicit
close that should happen during [exit]
loses data too.
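
The client-side shutdown sequence described here could be wrapped up as below. This is only a sketch: the proc name drain_and_close is hypothetical, and it relies on the documented behaviour that [close] on a blocking channel does not return until buffered output has been flushed.

```tcl
# Hypothetical helper; the name is an assumption, not part of any Tcl API.
# Applies both client-side steps from this comment before the script exits.
proc drain_and_close {sock} {
    # Step 1: make the socket blocking so pending output is not abandoned.
    fconfigure $sock -blocking 1
    # Step 2: close explicitly instead of relying on the implicit close in [exit].
    close $sock
}
```

A client would call `drain_and_close $sock` as its last action on the socket, and only then call [exit].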

Note that it's all changes on the client
side of the connection that make a difference.
Configuring the server side doesn't seem to
play a role at all, which suggests to me
the problem is not with the read side of
things.

davygrvy added on 2005-10-19 16:44:05:

There is an odd situation with the generic layer: if the
read() operations caused by a given [gets] call consume EOF
into the generic layer, it ends up being the responsibility
of the channel driver to continue firing readable events on
the channel until it is closed.  IMO, EOF has already been
read into the generic layer, and given its knowledge of EOF,
shouldn't the channel driver's job be done regarding
notification? And shouldn't it be the generic layer's
responsibility to fire off readable instead?
Honestly, this is quite inefficient when the channel driver
will never expect any more system notifications for that
socket and needs to manufacture them just for this
situation.

I'm not sure if this relates, though.

dgp added on 2005-10-19 01:54:23:

same problem in the oldest ActiveTcl
I found, 8.3.3 from April 2001.

Looks like flushing non-blocking sockets
on Windows has just been broken for
a long, long time.

davygrvy added on 2005-10-19 01:50:01:

I do not have any development tools to work on this today.
Reassigning to someone else.

dgp added on 2005-10-19 01:49:15:

Same problem present in the Oct. 2002
ActiveTcl 8.4.0 release.

dgp added on 2005-10-19 01:43:03:

speculation appears to be false.

ActiveTcl 8.4.7 has the same problem,
and that's before the 847693 changes
happened.

dgp added on 2005-10-19 01:32:01:

Speculation: this may be related
to 947693?

dgp added on 2005-10-19 01:01:48:

File Added - 153032: sdtest.tcl

Attachments: