Ticket UUID: | 10dc6daa375d45051a3c6064cbc1b32686bc1745 | |||
Title: | Tcl_Gets() fails to read non-blocking socket | |||
Type: | Bug | Version: | 8.5.16 8.5.17rc | |
Submitter: | anonymous | Created on: | 2014-10-15 19:15:05 | |
Subsystem: | 25. Channel System | Assigned To: | dgp | |
Priority: | 9 Immediate | Severity: | Severe | |
Status: | Closed | Last Modified: | 2014-10-17 15:31:14 | |
Resolution: | Fixed | Closed By: | dgp | |
Closed on: | 2014-10-17 15:31:14 | |||
Description: |
* Ralf Fassel <[email protected]> | * Don Porter <[email protected]> | | On 10/13/2014 08:32 AM, Ralf Fassel wrote: | | > Using Tcl_Read on a non-blocking socket, we get different return values | | > for a read with no data available in 8.5.15 vs 8.5.16: | > | | ... | > | | > Is this change in behaviour and return values documented somewhere? | > | | This serious bug came in with Tcl 8.5.16 (and 8.6.2). | > | | http://core.tcl.tk/tcl/info/ed29c4da21 | > | | Already fixed for 8.5.17 (and 8.6.3). Pull from dev branches if you | | need right away, or keep eyes open for the next Release Candidates. > | We still have an issue with the code in Tcl+Source+Code-59a0067f32.tar.gz | (obtained from https://core.tcl.tk/tcl/timeline?r=core-8-5-17-rc by | clicking on the upmost Leaf [59a0067f32] and downloading the tarball | from the resulting page). --<snip-snip>-- | I will try to come up with a little c program... The following program shows the problem when linked against 8.5.17 and no problem when linked against 8.5.15. ----------------------------------------------- /* t.c */ /* input: yes | netcat 65422 */ #include <stdio.h> #include <tcl.h> #include <unistd.h> int main(int argc, char *argv[]) { Tcl_FindExecutable(argv[0]); Tcl_DString line; Tcl_DStringInit(&line); int nothing=0; Tcl_Channel device = Tcl_OpenTcpClient(0, 65422, "localhost", 0, 0, 0); if (0 == device) { perror("cant open port"); return 1; } if (Tcl_SetChannelOption(0, device, "-blocking", "false") != TCL_OK || Tcl_SetChannelOption(0, device, "-buffersize", "1048576") != TCL_OK || Tcl_SetChannelOption(0, device, "-translation", "binary") != TCL_OK || Tcl_SetChannelOption(0, device, "-buffering", "none") != TCL_OK) { fprintf(stderr, "cant set stdin channel options\n"); return 1; } while (1) { Tcl_DStringSetLength(&line, 0); int num = Tcl_Gets(device, &line); if (num > 0 || nothing > 1000000) { printf("got %d chars with %d nothings\n", num, nothing); nothing = 0; } else { nothing++; } } return 0; } /* End of file */ ----------------------------------------------- The socket is provided by yes | netcat -l 65422 but probably anything which sends masses of data will probably do. When linked against 8.5.17, an initial read succeeds, but then nothing sh-4.2$ gcc -I PATH-TO-TCL-HEADERS -L . -o t t.c -ltcl8.5.17 sh-4.2$ env LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t got 6 chars with 3314 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings Linked against 8.5.15: sh-4.2$ gcc -I $SIXXBASE/tcltk/linux/include/ -L . -o t t.c -ltcl8.5.15 sh-4.2$ env LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings got 1 chars with 0 nothings ... Note that TCL_STDIN instead of a real socket did not show the problem. | |||
User Comments: |
dgp added on 2014-10-17 15:31:14:
....and merged to trunk. dgp added on 2014-10-17 15:21:06: More complete fix with tests committed to core-8-5-branch. anonymous added on 2014-10-17 08:43:43: --- tcl8.5.17rc/generic/tclIO.c~ 2014-10-13 19:34:12.000000000 +0200 +++ tcl8.5.17rc/generic/tclIO.c 2014-10-17 10:39:44.000000000 +0200 @@ -4473,7 +4473,7 @@ } if (GotFlag(statePtr, CHANNEL_BLOCKED|CHANNEL_NONBLOCKING) == (CHANNEL_BLOCKED|CHANNEL_NONBLOCKING)) { - goto restore; + /* goto restore; */ } /* Confirmed: with this patch against [59a0067f32], the TCL client/server test script below and my C++-level app work ok on a first quick check. dgp added on 2014-10-16 17:04:33: While I work on refining the tests and fix, please confirm that removing (or commenting out) line 4476 of tclIO.c is an effective way to solve the problem you report. Thanks. dgp added on 2014-10-16 11:41:32: closing in on the fix; stay tuned. thanks for the great demos. anonymous added on 2014-10-16 08:28:44: Here are two plain TCL scripts which show the problem on Linux/Opensuse 13.1 and on Windows 7. When running via 8.5.16/8.5.17rc the client stops rather quickly. When running against 8.5.15 or 8.6.1 the clients reads data as expected. Server: # ---------------------------- socket -server accept 65422 proc accept {chan addr port} { fconfigure $chan -buffering none -translation binary while {![catch { after 1 puts $chan "R4C7C W4C7C" after 7 puts $chan "R46D0 W46D0" after 6 puts $chan "R47AC W47AC" after 6 puts $chan "A01BDD10FC07D21C188E2C27D1CC37D1D" after 0 puts $chan "A01BDD2B0C07D22C188E4C27D1DC37D1E" }]} { # nothing } } vwait forever # ---------------------------- Client # ---------------------------- set fd [socket localhost 65422] fconfigure $fd -blocking 0 -translation binary -buffering none fileevent $fd readable [list read_socket $fd] set last 0 proc read_socket {fd} { set now [clock milliseconds] gets $fd line if {[eof $fd]} { close $fd return } if {$line ne ""} { puts "$now [expr {$now-$::last}] $line" set ::last $now } } vwait forever # ---------------------------- anonymous added on 2014-10-15 19:38:09: arghhh. the 64khz in the previous remark is nonsense, the interval is something around 50Hz Here is a TCL script which can serve as input: puts "Resume" puts "RB000 WB000" fconfigure stdout -buffering none -translation binary while {1} { puts "R15D4 W15D4" after 1 puts "R4780 W4780" after 1 puts "R4780 W4780" after 1 puts A01A43E79C07D20C186F7C27D1BC37D1E after 1 puts A01A44019C07D20C186EEC27D1DC37D1E after 20 } tclsh above-code | netcat -l 65422 anonymous added on 2014-10-15 19:30:30: It seems the timing of the incoming data is important, the yes | netcat does not work reliably as data source to provoke the error. In my case the remote host sends data with 64kHz, some lines with 33 chars and some lines with 11 chars. With this timing I get: 8.5.17 sh-4.2$ env LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t 192.168.2.1 got 6 chars with 3170 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings got -1 chars with 1000001 nothings ... 8.5.15 got 6 chars with 3625 nothings got 36 chars with 4736 nothings got 11 chars with 7845 nothings got 11 chars with 3386 nothings got 11 chars with 2608 nothings got 11 chars with 13933 nothings got 33 chars with 11034 nothings got 33 chars with 0 nothings got 11 chars with 2421 nothings got 11 chars with 15560 nothings got 11 chars with 15089 nothings got 33 chars with 12023 nothings got 33 chars with 0 nothings got 11 chars with 2774 nothings got 11 chars with 16272 nothings got 11 chars with 15980 nothings got 33 chars with 13336 nothings got 33 chars with 0 nothings got 11 chars with 3022 nothings ... |