Ticket UUID: 800753
Title: -eofchar accepts only single-byte character?
Type: Bug | Version: obsolete: 8.5a0
Submitter: dgp | Created on: 2003-09-04 22:43:40
Subsystem: 25. Channel System | Assigned To: dgp
Priority: 5 Medium | Severity:
Status: Closed | Last Modified: 2007-12-13 00:28:51
Resolution: Fixed | Closed By: dgp
Closed on: 2007-12-12 17:28:51
Description:
When configuring a channel's EOF character with [fconfigure -eofchar], the character gets stored in the inEofChar field of the ChannelState struct. This field is of type int, so it is large enough to hold any Tcl_UniChar value. However, in the TranslateInputEOL() routine in tclIO.c, the value in the inEofChar field is compared against the buffer coming in from the channel byte by byte. Does this mean that only character values in the range 0-255 can be used as EOF characters? If this is intended, or too difficult to fix, then the docs could use clarification. Otherwise, since it appears that EOF character detection comes after translation to UTF-8 encoding, it seems that -eofchar could be corrected to let any Unicode character serve as the EOF marker.
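The byte-by-byte comparison described above can be illustrated with a small C sketch. It is not the actual TranslateInputEOL() code; FindEofByte is a hypothetical stand-in, and the sketch assumes the buffer already holds UTF-8 at the point of the scan, as the description suggests.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a byte-wise EOF scan; not taken from tclIO.c.
 * Returns the index of the first byte in buf equal to eofChar,
 * or -1 if no byte matches.
 */
static int
FindEofByte(const char *buf, int len, int eofChar)
{
    int i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        /* Each buffer byte (0-255) is compared against the whole int. */
        if ((unsigned char) buf[i] == eofChar) {
            return i;
        }
    }
    return -1;
}
```

With the six-byte buffer "ab\xC3\xA9" "cd" (the UTF-8 encoding of "ab", U+00E9, "cd"), searching for 0xE9 finds nothing, because U+00E9 is stored as the two bytes 0xC3 0xA9. Searching for 0xC3 does match, but at the lead byte of a different character. Only values up to 0x7F are stored in UTF-8 as a single byte identical to the code point, so only those compare reliably in a scan like this.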
User Comments:

patthoyts added on 2007-11-28 08:13:48:
Logged In: YES user_id=202636 Originator: NO
This patch broke Windows which sets up the console channel as -eofchar "\032 {}" meaning input ^Z, and no output char. The range restriction of 1 <> 0x7F causes this to be an error as the outValue becomes 0 (no eofchar). Fixed by changing the permissible range in the code to include 0.

dgp added on 2007-11-28 02:48:47:
Logged In: YES user_id=80530 Originator: YES
fix committed for 8.5.0. still open at lower prio for possible backport.

andreas_kupries added on 2007-11-28 01:11:11:
Logged In: YES user_id=75003 Originator: NO
Yes.

dgp added on 2007-11-28 00:55:36:
Logged In: YES user_id=80530 Originator: YES
that fix sounds ok to me; agree andreas?

stwo added on 2007-11-10 06:38:57:
Logged In: YES user_id=143350 Originator: NO
Is this adequate? Add it to the end of the -eofchar paragraph in fconfigure.n.
The acceptable range for \fB\-eofchar\fR values is \ex01 - \ex7f; attempting to set \fB\-eofchar\fR to a value outside of this range will generate an error.

dgp added on 2007-11-10 02:54:39:
Logged In: YES user_id=80530 Originator: YES
patching to raise explicit error sounds better than silent failure. thanks. Document the limitation too, and this one can be done. thanks.

msofer added on 2007-11-09 23:58:01:
File Added - 253585: tclIO.c.patch
Logged In: YES user_id=148712 Originator: NO
Attaching patch #1829070 (and closing that ticket)
File Added: tclIO.c.patch

stwo added on 2007-11-09 23:13:00:
Logged In: YES user_id=143350 Originator: NO
Currently, -eofchar will not work as expected if the value of -eofchar is outside the range 0x01-0x7f. The -eofchar [fconfigure] option is rarely used. When it is used, it is rare for the value to be other than 0x1a (^Z). Modifying Tcl to accept an -eofchar outside of this range is fraught with complications. It is not worth the effort. If someone does indeed need an -eofchar outside of this range they can speak up and then we (I? Them?) can work towards fixing the problem.
Until then, I recommend limiting the accepted range of -eofchar to 0x01-0x7f. Patch submitted (against Tcl8.5b2) to generate an error if -eofchar is outside of that range. The testsuite passes with this patch installed.

andreas_kupries added on 2007-11-06 03:17:56:
Logged In: YES user_id=75003 Originator: NO
I think only if you accept loss of perf, possibly. We would have to change the search for EOF in the Translate code. I do not remember if we have utf at that level or external. Whichever, we have to convert either inEofChar, or the char under consideration to have matching encodings, making the compare slower. Only alternative I can see is to change the ChannelState structure to hold a char in the proper encoding (and maybe in utf to handle encoding changes). Even so the cmp is likely slower.

dgp added on 2007-11-06 03:08:35:
Logged In: YES user_id=80530 Originator: YES
1823576 got fixed; how about this one?

dgp added on 2006-11-23 03:20:26:
Logged In: YES user_id=80530 Originator: YES
Two years later... any prospects for fixing this?

dgp added on 2004-11-22 06:35:40:
Logged In: YES user_id=80530
any prospects for fixing this?

andreas_kupries added on 2004-07-17 05:55:03:
Logged In: YES user_id=75003
This is not intended I believe. However changing it is a bit more complicated, due to the transition from byte-by-byte to char-by-char comparison ... And while the int can hold a UniChar, do we want it to? Because if we do we also have a utf8-to-unichar conversion to perform for the characters in the buffer = Lower performance. So maybe store an utf8 char instead. The channelstate struct is internal and can be changed for this. So, no way for fixing this in the 8.4.7 timeframe, IMHO. However for 8.5 it should be possible to fix it.
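
The range restriction discussed in the comments (0x01-0x7f, later widened to include 0 so that an empty output character such as in "\032 {}" is accepted) could look roughly like the following C sketch. It is not the contents of tclIO.c.patch; EofCharInRange and CheckEofCharPair are hypothetical names, and the error text is illustrative wording rather than Tcl's actual message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if ch is acceptable as an EOF character
 * under the 0x00-0x7f rule (0 meaning "no EOF character"), else 0.
 */
static int
EofCharInRange(int ch)
{
    return ch >= 0x00 && ch <= 0x7f;
}

/*
 * Hypothetical check for an {inChar outChar} pair. Returns 1 and stores
 * NULL in *msgPtr when both values are acceptable; otherwise returns 0
 * and stores an illustrative error message.
 */
static int
CheckEofCharPair(int inChar, int outChar, const char **msgPtr)
{
    assert(msgPtr != NULL);
    if (!EofCharInRange(inChar) || !EofCharInRange(outChar)) {
        *msgPtr = "-eofchar value out of range (expected \\x00 - \\x7f)";
        return 0;
    }
    *msgPtr = NULL;
    return 1;
}
```

Under this sketch the Windows console setting (input 0x1A, output 0) passes, while a value such as 0xE9 is rejected with an explicit error instead of failing silently.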
Attachments:
- tclIO.c.patch [download] added by msofer on 2007-11-09 23:58:01. [details]