Tcl Source Code

Ticket UUID: 800753
Title: -eofchar accepts only single-byte character?
Type: Bug Version: obsolete: 8.5a0
Submitter: dgp Created on: 2003-09-04 22:43:40
Subsystem: 25. Channel System Assigned To: dgp
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2007-12-13 00:28:51
Resolution: Fixed Closed By: dgp
    Closed on: 2007-12-12 17:28:51
Description:
When configuring a channel's EOF character
with [fconfigure -eofchar], the character gets
stored in the inEofChar field of the ChannelState
struct. This field is of type int, so it is large
enough to hold any Tcl_UniChar value.

However, in the TranslateInputEOL() routine
in tclIO.c, the value in the inEofChar field
is compared against the buffer coming in
from the channel byte by byte. Does this mean
that only character values in the range 0-255
can be used as EOF characters?

If this is intended, or too difficult to fix, then
the docs could use clarification.  Otherwise,
since it appears that EOF character detection
comes after translation to UTF-8 encoding,
it seems that -eofchar could be corrected to
let any Unicode character serve as the EOF 
marker.
User Comments: patthoyts added on 2007-11-28 08:13:48:
Logged In: YES 
user_id=202636
Originator: NO

This patch broke Windows, which sets up the console channel with -eofchar "\032 {}", meaning ^Z on input and no EOF char on output. The 0x01-0x7F range restriction turns this into an error, because outValue becomes 0 (no eofchar).
Fixed by changing the permissible range in the code to include 0.

dgp added on 2007-11-28 02:48:47:
Logged In: YES 
user_id=80530
Originator: YES


fix committed for 8.5.0.
still open at lower prio
for possible backport.

andreas_kupries added on 2007-11-28 01:11:11:
Logged In: YES 
user_id=75003
Originator: NO

Yes.

dgp added on 2007-11-28 00:55:36:
Logged In: YES 
user_id=80530
Originator: YES


that fix sounds ok to me;
agree andreas?

stwo added on 2007-11-10 06:38:57:
Logged In: YES 
user_id=143350
Originator: NO

Is this adequate? Add it to the end of the -eofchar paragraph in fconfigure.n.

The acceptable range for \fB\-eofchar\fR values is \ex01 - \ex7f; attempting to set \fB\-eofchar\fR to a value outside of this range will generate an error.

dgp added on 2007-11-10 02:54:39:
Logged In: YES 
user_id=80530
Originator: YES


patching to raise an explicit
error sounds better than silent
failure.  thanks.

Document the limitation too, and
this one can be done.  thanks.

msofer added on 2007-11-09 23:58:01:

File Added - 253585: tclIO.c.patch

Logged In: YES 
user_id=148712
Originator: NO

Attaching patch #1829070 (and closing that ticket)

stwo added on 2007-11-09 23:13:00:
Logged In: YES 
user_id=143350
Originator: NO

Currently, -eofchar will not work as expected if the value of -eofchar is outside the range 0x01-0x7f.
The -eofchar [fconfigure] option is rarely used.
When it is used, it is rare for the value to be other than 0x1a (^Z).
Modifying Tcl to accept an -eofchar outside of this range is fraught with complications.
It is not worth the effort.
If someone does indeed need an -eofchar outside of this range, they can speak up, and then we (I? Them?) can work towards fixing the problem.
Until then, I recommend limiting the accepted range of -eofchar to 0x01-0x7f.
Patch submitted (against Tcl8.5b2) to generate an error if -eofchar is outside of that range.
The testsuite passes with this patch installed.

andreas_kupries added on 2007-11-06 03:17:56:
Logged In: YES 
user_id=75003
Originator: NO

I think only if you accept a possible loss of performance. We would have to change the search for EOF in the Translate code. I do not remember whether the data at that level is in UTF-8 or in the external encoding. Either way, we would have to convert either inEofChar or the character under consideration so that the encodings match, making the comparison slower. The only alternative I can see is to change the ChannelState structure to hold the char in the proper encoding (and maybe in UTF-8 as well, to handle encoding changes). Even so, the comparison is likely slower.

dgp added on 2007-11-06 03:08:35:
Logged In: YES 
user_id=80530
Originator: YES


1823576 got fixed; how about this one?

dgp added on 2006-11-23 03:20:26:
Logged In: YES 
user_id=80530
Originator: YES


Two years later...

any prospects for fixing this?

dgp added on 2004-11-22 06:35:40:
Logged In: YES 
user_id=80530

any prospects for fixing this?

andreas_kupries added on 2004-07-17 05:55:03:
Logged In: YES 
user_id=75003

This is not intended, I believe. However, changing it is a bit
more complicated, due to the transition from byte-by-byte to
char-by-char comparison ... And while the int can hold a
Tcl_UniChar, do we want it to? If we do, we also have to perform
a UTF-8-to-UniChar conversion on the characters in the buffer,
which means lower performance. So maybe store a UTF-8 char
instead. The ChannelState struct is internal and can be
changed for this.

So there is no way to fix this in the 8.4.7 timeframe, IMHO.
However for 8.5 it should be possible to fix it.
