Tcl Source Code

View Ticket
Login
Ticket UUID: b3977d199b08e3979a8da970553d5209b3042e9c
Title: Channel gets inserts spurious empty lines on receiving torn CR-LF
Type: Bug Version: 8.6
Submitter: apnadkarni Created on: 2022-07-03 06:22:57
Subsystem: 25. Channel System Assigned To: dgp
Priority: 5 Medium Severity: Severe
Status: Closed Last Modified: 2022-07-27 13:31:59
Resolution: Fixed Closed By: sebres
    Closed on: 2022-07-27 13:31:59
Description:

When a CR-LF in an incoming stream is split across packets, and the LF-containing packet is not received when the CR-containing packet is read by the channel, the application receives an spurious empty line on gets invocations. To illustrate, run the attached server.tcl and client.tcl files. Seen output is

d:\src\tcltk\console\tests>tclsh client.tcl
<abc>
<defghi>
<>
<jkl>

Note the extra <> line which should not be present.

Note that this is not specific to socket streams although that's where it is easiest to demonstrate. Also happens on Linux (at least) so not Windows specific. The channel needs to be non-blocking and in auto translation mode.

Marking as severe because although the conditions to trigger are extremely rare, any data corruption in transit is a Bad Thing.

User Comments: sebres added on 2022-07-27 13:31:59:

fixed in [a3ed092aab], thus close


jan.nijtmans added on 2022-07-25 21:29:10:

@apnadkarni, is this bug fixed in your opinion? If so, let's go for it!


sebres added on 2022-07-22 17:43:28:

[d2a8078086] must fix it ultimately


sebres added on 2022-07-22 16:33:06:

As for failed test io-29.36.1, it was my mistake: translation {auto} on client-side caused that answer "client" reached the server-side as "client\r\n", and [gets] by translation {binary} behaves nearly identical to {lf} mode, so server sends "really client\r?\r" in the next round, therefore looks like "?"-char got trimmed out (but in fact the client sent inappropriate answer).

[5f7f4ab975] fixes that - switching to {auto lf} now, similar to first attempt with {cr lf}, so io-29.36.1 works fine and will cover the possible regression if this bug becomes fixed incompatibly.


sebres added on 2022-07-21 19:26:50:

[350f5b9a9e694bca] implements the test-cases covering the -translation auto bug, just ...

Just now I guess I found another inconsistency related to auto translation by pure "\r" EOL-char: I wanted to add regression test (which tcl seems to miss right now) to cover simple QA communication with auto-translation for CR-only case and the test failed after all - it must generate the same result by "cr" and "auto" translation, but it seems to truncate 1 char received by [gets] in second QA round in "auto" translation case:

    ---- Result was:
    {cr lf} {really client?} yes auto {unexpected input "really client"} wrong
    ---- Result should have been (exact matching):
    {cr lf} {really client?} yes auto {really client?} yes
    ==== io-29.36.1 FAILED
This is really weird, because I could swear I saw that working, just tested it with older 8.5 and the result is the same


sebres added on 2022-07-21 16:47:31:

Test illustrating potentially undefined behavior (delay dependent):

proc accept {so args} {
  fconfigure $so -translation binary
  puts -nonewline $so "1 line\r"
  puts -nonewline $so "\n2 li"
  flush $so
  #Now force separate packets
  puts -nonewline $so "ne\r"
  flush $so
  if {[incr ::cnt] & 1} {
    after 20 {set ::cnt $::cnt}; vwait ::cnt; # simulate short delay (and client can process events)
  } else {
    after 20; # small delay without processing events (just to avoid large output), client would receive the buffer in single chunk
  }
  puts -nonewline $so "\n3 line"
  flush $so
  close $so
}
set so [socket -server accept 10000]
proc test {} {
  if {1} {
    set chan [socket 127.0.0.1 10000]
  } else {
    set chan stdin
  }
  fileevent $chan readable [list apply {chan {
    if {[gets $chan line] >= 0} {
      puts <$line>
    } elseif {[eof $chan]} {
      close $chan
      set ::done 1
    }
  }} $chan]
  fconfigure $chan -blocking 0 -buffering line
  vwait ::done
  puts "----$::cnt---"
  update
}
timerate {test} 100

The result will show an extra NL by each odd attempt (due to immediate processing of every buffer) and no extra NL by each even attempt (buffers coming as 1 piece):

<1 line>
<2 line>
<>
<3 line>
----1---
<1 line>
<2 line>
<3 line>
----2---
<1 line>
<2 line>
<>
<3 line>
----3---
<1 line>
<2 line>
<3 line>
----4---
<1 line>
<2 line>
<>
<3 line>
----5---


oehhar added on 2022-07-04 14:32:53:

Ashok,

great finding.

When I was in the Windows socket driver update, Donald Porter was the most experienced person in this field. I recommend sending an E-Mail to Donald or asking on the core list.

I found this part of TCL surprisingly difficult.

Take care, Harald


apnadkarni added on 2022-07-03 06:42:53:

Here is my analysis of what is happening (most action is inside Tcl_GetObj)

When a CR is read from the channel buffer but the accompanying LF has not arrived, the channel code passes up the line as the gets result and also remembers that a CR has been seen by setting a flag

When subsequently the LF is received, the channel subsystem correctly skips it by bumping the pointer into the receive buffer past the LF.

However, later in the code, the pointer offset is reset to its old value which means the next gets sees it as a new line and in auto translation mode returns an empty line as a result.

This last reset of value is the root of the problem. I do not understand all the manipulations of the buffers and offsets and why that is necessary so hesitant to suggest a fix.


Attachments: