Tcl Source Code

View Ticket
Login
Ticket UUID: 10dc6daa375d45051a3c6064cbc1b32686bc1745
Title: Tcl_Gets() fails to read non-blocking socket
Type: Bug Version: 8.5.16 8.5.17rc
Submitter: anonymous Created on: 2014-10-15 19:15:05
Subsystem: 25. Channel System Assigned To: dgp
Priority: 9 Immediate Severity: Severe
Status: Closed Last Modified: 2014-10-17 15:31:14
Resolution: Fixed Closed By: dgp
    Closed on: 2014-10-17 15:31:14
Description:
* Ralf Fassel <[email protected]>
| * Don Porter <[email protected]>
| | On 10/13/2014 08:32 AM, Ralf Fassel wrote:
| | > Using Tcl_Read on a non-blocking socket, we get different return values
| | > for a read with no data available in 8.5.15 vs 8.5.16:
| >
| | ...
| >
| | > Is this change in behaviour and return values documented somewhere?
| >
| | This serious bug came in with Tcl 8.5.16 (and 8.6.2).
| >
| |   http://core.tcl.tk/tcl/info/ed29c4da21
| >
| | Already fixed for 8.5.17 (and 8.6.3).  Pull from dev branches if you
| | need right away, or keep eyes open for the next Release Candidates.
>
| We still have an issue with the code in Tcl+Source+Code-59a0067f32.tar.gz
| (obtained from https://core.tcl.tk/tcl/timeline?r=core-8-5-17-rc by
| clicking on the upmost Leaf [59a0067f32] and downloading the tarball
| from the resulting page).
--<snip-snip>--
| I will try to come up with a little c program...

The following program shows the problem when linked against 8.5.17 and
no problem when linked against 8.5.15.

-----------------------------------------------
/* t.c */
/* input: yes | netcat 65422 */
#include <stdio.h>
#include <tcl.h>
#include <unistd.h>

int main(int argc, char *argv[]) {

  Tcl_FindExecutable(argv[0]);
  Tcl_DString line;
  Tcl_DStringInit(&line);
  int nothing=0;

  Tcl_Channel device = Tcl_OpenTcpClient(0, 65422, "localhost", 0, 0, 0);
  if (0 == device) {
    perror("cant open port");
    return 1;
  }

  if (Tcl_SetChannelOption(0, device, "-blocking", "false") != TCL_OK
      || Tcl_SetChannelOption(0, device, "-buffersize", "1048576") != TCL_OK
      || Tcl_SetChannelOption(0, device, "-translation", "binary") != TCL_OK
      || Tcl_SetChannelOption(0, device, "-buffering", "none") != TCL_OK) {
    fprintf(stderr, "cant set stdin channel options\n");
    return 1;
  }

  while (1) {
    Tcl_DStringSetLength(&line, 0);
    int num = Tcl_Gets(device, &line);
    if (num > 0 || nothing > 1000000) {
      printf("got %d chars with %d nothings\n", num, nothing);
      nothing = 0;
    } else {
      nothing++;
    }
  }
  return 0;
}

/* End of file */

-----------------------------------------------

The socket is provided by
   yes | netcat -l 65422
but probably anything which sends masses of data will probably do.

When linked against 8.5.17, an initial read succeeds, but then nothing
    sh-4.2$ gcc -I PATH-TO-TCL-HEADERS -L . -o t t.c -ltcl8.5.17
    sh-4.2$ env LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t
    got 6 chars with 3314 nothings
    got -1 chars with 1000001 nothings
    got -1 chars with 1000001 nothings
    got -1 chars with 1000001 nothings
    got -1 chars with 1000001 nothings
    got -1 chars with 1000001 nothings
    got -1 chars with 1000001 nothings
    got -1 chars with 1000001 nothings

Linked against 8.5.15:
    sh-4.2$ gcc -I $SIXXBASE/tcltk/linux/include/ -L . -o t t.c -ltcl8.5.15
    sh-4.2$ env LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    got 1 chars with 0 nothings
    ...

Note that TCL_STDIN instead of a real socket did not show the problem.
User Comments: dgp added on 2014-10-17 15:31:14:
....and merged to trunk.

dgp added on 2014-10-17 15:21:06:
More complete fix with tests committed to core-8-5-branch.

anonymous added on 2014-10-17 08:43:43:
--- tcl8.5.17rc/generic/tclIO.c~	2014-10-13 19:34:12.000000000 +0200
+++ tcl8.5.17rc/generic/tclIO.c	2014-10-17 10:39:44.000000000 +0200
@@ -4473,7 +4473,7 @@
 	}
 	if (GotFlag(statePtr, CHANNEL_BLOCKED|CHANNEL_NONBLOCKING)
 		== (CHANNEL_BLOCKED|CHANNEL_NONBLOCKING)) {
-	    goto restore;
+	    /* goto restore; */
 	}
 
 	/*

Confirmed: with this patch against [59a0067f32], the TCL client/server test script below and my C++-level app work ok on a first quick check.

dgp added on 2014-10-16 17:04:33:
While I work on refining the tests and fix, please
confirm that removing (or commenting out) line 4476
of tclIO.c is an effective way to solve the problem you
report.

Thanks.

dgp added on 2014-10-16 11:41:32:
closing in on the fix; stay tuned.

thanks for the great demos.

anonymous added on 2014-10-16 08:28:44:
Here are two plain TCL scripts which show the problem on Linux/Opensuse 13.1 and on Windows 7.  When running via 8.5.16/8.5.17rc the client stops rather quickly.
When running against 8.5.15 or 8.6.1 the clients reads data as expected.

Server:
# ----------------------------
socket -server accept 65422
proc accept {chan addr port} {
    fconfigure $chan -buffering none -translation binary
    while {![catch {
        after 1
        puts $chan "R4C7C W4C7C"
        after 7
        puts $chan "R46D0 W46D0"
        after 6
        puts $chan "R47AC W47AC"
        after 6
        puts $chan "A01BDD10FC07D21C188E2C27D1CC37D1D"
        after 0
        puts $chan "A01BDD2B0C07D22C188E4C27D1DC37D1E"
    }]} {
        # nothing
    }
}
vwait forever
# ----------------------------

Client
# ----------------------------
set fd [socket localhost 65422]
fconfigure $fd -blocking 0 -translation binary -buffering none

fileevent $fd readable [list read_socket $fd]

set last 0
proc read_socket {fd} {
    set now [clock milliseconds]
    gets $fd line
    if {[eof $fd]} {
        close $fd
        return
    }
    if {$line ne ""} {
        puts "$now [expr {$now-$::last}] $line"
        set ::last $now
    }
}
vwait forever
# ----------------------------

anonymous added on 2014-10-15 19:38:09:
arghhh.  the 64khz in the previous remark is nonsense, the interval is something around 50Hz

Here is a TCL script which can serve as input:

puts "Resume"
puts "RB000 WB000"
fconfigure stdout -buffering none -translation binary
while {1} {
    puts "R15D4 W15D4"
    after 1
    puts "R4780 W4780"
    after 1
    puts "R4780 W4780"
    after 1
    puts A01A43E79C07D20C186F7C27D1BC37D1E
    after 1
    puts A01A44019C07D20C186EEC27D1DC37D1E
    after 20
}

tclsh above-code | netcat -l 65422

anonymous added on 2014-10-15 19:30:30:
It seems the timing of the incoming data is important, the yes | netcat does not work reliably as data source to provoke the error.  

In my case the remote host sends data with 64kHz,  some lines with 33 chars and some lines with 11 chars.

With this timing I get:

8.5.17
sh-4.2$ env LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t 192.168.2.1
got 6 chars with 3170 nothings
got -1 chars with 1000001 nothings
got -1 chars with 1000001 nothings
got -1 chars with 1000001 nothings
got -1 chars with 1000001 nothings
got -1 chars with 1000001 nothings
got -1 chars with 1000001 nothings
got -1 chars with 1000001 nothings
...

8.5.15
got 6 chars with 3625 nothings
got 36 chars with 4736 nothings
got 11 chars with 7845 nothings
got 11 chars with 3386 nothings
got 11 chars with 2608 nothings
got 11 chars with 13933 nothings
got 33 chars with 11034 nothings
got 33 chars with 0 nothings
got 11 chars with 2421 nothings
got 11 chars with 15560 nothings
got 11 chars with 15089 nothings
got 33 chars with 12023 nothings
got 33 chars with 0 nothings
got 11 chars with 2774 nothings
got 11 chars with 16272 nothings
got 11 chars with 15980 nothings
got 33 chars with 13336 nothings
got 33 chars with 0 nothings
got 11 chars with 3022 nothings
...