Ticket UUID: | 219210 | |||
Title: | fcopy does not respect encodings | |||
Type: | Bug | Version: | obsolete: 8.2.2 | |
Submitter: | nobody | Created on: | 2000-10-26 05:03:38 | |
Subsystem: | 24. Channel Commands | Assigned To: | andreas_kupries | |
Priority: | 6 | Severity: | ||
Status: | Closed | Last Modified: | 2001-05-20 00:05:41 | |
Resolution: | Fixed | Closed By: | andreas_kupries | |
Closed on: | 2001-05-19 17:05:41 | |||
Description: |
OriginalBugID: 3662
Bug Version: 8.2.2
SubmitDate: '1999-11-23'
LastModified: '1999-12-07'
Severity: SER
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
OS: All
FixedDate: '2000-10-25'
ClosedDate: '1999-12-06'
Name: Nikolai Saoukh

    set out [open k.txt w]
    fconfigure $out -encoding koi8-r
    puts $out "\u0410\u0410"
    close $out

    set in [open k.txt r]
    fconfigure $in -encoding koi8-r
    set out [open u.txt w]
    fconfigure $out -encoding utf-8
    fcopy $in $out
    close $in
    close $out

It's not certain whether this is an RFE or bug, as the fcopy man page states that it only pays heed to the -translation option, not to -encoding. This could be a bug, since -encoding came after fcopy was originally written.

The work-around is to replace the above fcopy with:

    puts $out [read $in]

although there is no callback capability then.

-- 12/07/1999 hobbs | |||
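A variant of the work-around above can be sketched as a chunked loop, so that the whole file is not held in memory at once. This is only a sketch under stated assumptions: `copyWithEncoding` and its `chunkChars` parameter are hypothetical names that do not exist in Tcl, and the copy still runs synchronously, so it does not restore the callback capability of fcopy. It uses `puts -nonewline` so that no extra newline is appended to the output.

```tcl
# Hypothetical helper (not part of Tcl): copy from $in to $out in chunks of
# at most $chunkChars characters. read decodes with $in's -encoding and
# puts encodes with $out's -encoding, so the conversion happens here.
proc copyWithEncoding {in out {chunkChars 4096}} {
    set total 0
    while {![eof $in]} {
        set data [read $in $chunkChars]
        puts -nonewline $out $data
        incr total [string length $data]
    }
    # Number of characters copied.
    return $total
}

# Recreate the input file from the report.
set out [open k.txt w]
fconfigure $out -encoding koi8-r
puts $out "\u0410\u0410"
close $out

# Copy it, converting koi8-r to utf-8 on the way.
set in [open k.txt r]
fconfigure $in -encoding koi8-r
set out [open u.txt w]
fconfigure $out -encoding utf-8
set copied [copyWithEncoding $in $out]
close $in
close $out
```

The chunk size is counted in characters, not bytes, because `read` with a count operates on decoded characters when the channel has a text encoding.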
User Comments: |
andreas_kupries added on 2001-05-20 00:05:41:
Patch committed.

andreas_kupries added on 2001-04-19 03:27:32:
File Added - 5515: fce.Benchmark.out.tar.gz

andreas_kupries added on 2001-04-19 03:23:21:
File Added - 5514: 219210.diff.5

andreas_kupries added on 2001-04-19 03:23:20:
Another small change to the patch, fixing another bug. Found through tclbench, i.e. performance tests. Will add the performance results too, with notes.

andreas_kupries added on 2001-04-06 22:20:35:
File Added - 5076: 219210.diff.3

andreas_kupries added on 2001-04-06 22:20:34:
New patch, changing the fix slightly so that it doesn't do conversions which are not necessary. IOW, if both channels are set to the same encoding no conversion will occur and the transfer will run at the full old speed. The first patch did conversions in this case, lowering performance for a common case.

andreas_kupries added on 2001-04-04 21:36:55:
File Added - 4953: 219210.diff.1

andreas_kupries added on 2001-04-04 21:36:54:
Uploading a patch fixing the bug, adding tests and extending the documentation.

andreas_kupries added on 2001-03-31 20:22:45:
The fix for this report has to be done in "tclIO.c", "CopyData", which currently uses "DoRead" and "DoWrite" for reading from and writing to the channels involved in the copying. Exchanging these two calls for stripped-down versions of "Tcl_ReadChars" and "Tcl_WriteChars" should do the trick. "Stripped down" means here that we have to avoid the call to "CheckChannelErrors" in these two routines, as this routine flags their usage for a channel used in an "fcopy" as an error.

I would propose to move the meat of these two routines into two internal procedures "DoReadChars" and "DoWriteChars" which are then called from the original routines. The originals would retain the error checking. And "CopyData" can use the internal procedures to get its own work done.

Note the following consequences of the change:
- The system will use UTF-8 internally when copying data, meaning that it will consume more memory, or copy less data per buffer.
- Performance will be affected negatively because of the additional conversions to and from UTF-8. (Side note: Do we have performance tests for "fcopy" in "tclbench"?)

The code of the channel system uses 'statePtr->encoding == NULL' as the signal that the encoding is binary, and the two "Tcl_*Chars" procedures have special provisions for that case. Unfortunately these are not very efficient, as they involve ByteArray objects.

I would propose that "CopyData" should check for binary translation on _both_ channels and fall back to the old code in such a case. This would avoid quite a lot of conversions and copying. We shouldn't do this for a mixture of binary and non-binary encodings, as we still need an intermediate ByteArray to get the conversion right for these cases, causing additional complexity in the new code. Better to stay with the tried and true code for that for now.

andreas_kupries added on 2001-03-31 00:05:13:
Another problem with the workaround: it uses much more memory than fcopy, because the whole file is loaded into the interpreter before being written back out.

andreas_kupries added on 2000-11-17 19:08:40:
I consider it a bug, an incomplete adaptation of the existing commands to the new i18n features, i.e. encodings. Recategorized to OtherIO as this problem is not restricted to sockets. |
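The behaviour discussed in these comments can be checked from a script. The following is a minimal sketch, not one of the tests added by the committed patch: `fcopyHex` is a hypothetical helper name, the file names k.txt and u.txt are simply reused from the report, and `-translation lf` is set on every channel so that end-of-line handling does not affect the bytes being compared.

```tcl
# Hypothetical check (not taken from the Tcl test suite): write $text in
# $srcEnc, fcopy it into a channel configured for $dstEnc, and return the
# resulting file contents as a lowercase hex string.
proc fcopyHex {srcEnc dstEnc text} {
    # Write the source file in the source encoding.
    set f [open k.txt w]
    fconfigure $f -encoding $srcEnc -translation lf
    puts -nonewline $f $text
    close $f

    # Copy with fcopy; without -command it blocks until the copy is done.
    set in [open k.txt r]
    fconfigure $in -encoding $srcEnc -translation lf
    set out [open u.txt w]
    fconfigure $out -encoding $dstEnc -translation lf
    fcopy $in $out
    close $in
    close $out

    # Read the result back as raw bytes and hex-encode it.
    set f [open u.txt r]
    fconfigure $f -translation binary
    set bytes [read $f]
    close $f
    binary scan $bytes H* hex
    return $hex
}

set hex [fcopyHex koi8-r utf-8 "\u0410\u0410"]
puts $hex
```

On a Tcl release that includes the fix, the result should be d090d090, the UTF-8 encoding of the two U+0410 characters. A result of e1e1 would mean the koi8-r bytes were copied through unconverted, which is the behaviour this ticket reports.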
Attachments:
- fce.Benchmark.out.tar.gz [download] added by andreas_kupries on 2001-04-19 03:27:32. [details]
- 219210.diff.5 [download] added by andreas_kupries on 2001-04-19 03:23:20. [details]
- 219210.diff.3 [download] added by andreas_kupries on 2001-04-06 22:20:34. [details]
- 219210.diff.1 [download] added by andreas_kupries on 2001-04-04 21:36:54. [details]