Tcl Source Code

Ticket UUID: 219210
Title: fcopy does not respect encodings
Type: Bug Version: obsolete: 8.2.2
Submitter: nobody Created on: 2000-10-26 05:03:38
Subsystem: 24. Channel Commands Assigned To: andreas_kupries
Priority: 6 Severity:
Status: Closed Last Modified: 2001-05-20 00:05:41
Resolution: Fixed Closed By: andreas_kupries
    Closed on: 2001-05-19 17:05:41
Description:
OriginalBugID: 3662 Bug
Version: 8.2.2
SubmitDate: '1999-11-23'
LastModified: '1999-12-07'
Severity: SER
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
OS: All
FixedDate: '2000-10-25'
ClosedDate: '1999-12-06'


Name:
Nikolai Saoukh

set out [open k.txt w]
fconfigure $out -encoding koi8-r
puts $out "\u0410\u0410"
close $out

set in [open k.txt r]
fconfigure $in -encoding koi8-r

set out [open u.txt w]
fconfigure $out -encoding utf-8

fcopy $in $out

close $in
close $out
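
One way to check the outcome of the script above is to dump the bytes that end up in the copy. The following is a sketch and not part of the original report: the file names `k-check.txt` and `u-check.txt` are made up for illustration, and `-translation lf` is forced so the result does not depend on the platform's line endings. U+0410 is the single byte E1 in koi8-r and the two bytes D0 90 in utf-8, so a converting fcopy should print `d090d0900a`, while a byte-for-byte copy would print `e1e10a`.

```tcl
# Write two Cyrillic capital A's (U+0410) plus a newline as koi8-r.
set out [open k-check.txt w]
fconfigure $out -encoding koi8-r -translation lf
puts $out "\u0410\u0410"
close $out

# Copy between channels that are configured with different encodings.
set in [open k-check.txt r]
fconfigure $in -encoding koi8-r -translation lf
set out [open u-check.txt w]
fconfigure $out -encoding utf-8 -translation lf
fcopy $in $out
close $in
close $out

# Dump the raw bytes of the copy as a hex string.
set f [open u-check.txt r]
fconfigure $f -translation binary
binary scan [read $f] H* hex
close $f
puts $hex
```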


It's not certain whether this is an RFE or a bug, as the fcopy
man page states that it only pays heed to the -translation
option, not to -encoding.  This could be a bug, since -encoding
came after fcopy was originally written.  The work-around is to
replace the above fcopy with:
    puts -nonewline $out [read $in]
although there is no callback capability then.
-- 12/07/1999 hobbs
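
The missing callback capability of the work-around can be approximated at the script level with `fileevent`. The following is only a sketch under stated assumptions: `copyConverting` and `copyConvertingStep` are hypothetical helper names, not Tcl commands; the file names are illustrative; error handling is left out; and the callback receives a count of characters, whereas the `fcopy -command` callback receives a count of bytes.

```tcl
# Hypothetical helper: start an event-driven copy from $in to $out.
# Data passes through read/puts, so both channel encodings are applied.
proc copyConverting {in out callback {chunk 4096}} {
    fconfigure $in -blocking 0
    fileevent $in readable \
        [list copyConvertingStep $in $out $callback $chunk 0]
}

# Hypothetical helper: copy one chunk, then finish or re-arm the handler.
proc copyConvertingStep {in out callback chunk total} {
    set data [read $in $chunk]
    puts -nonewline $out $data
    incr total [string length $data]
    if {[eof $in]} {
        # Done: remove the handler and report the character count.
        fileevent $in readable {}
        flush $out
        uplevel #0 $callback [list $total]
    } else {
        # Re-register so the next event carries the updated total.
        fileevent $in readable \
            [list copyConvertingStep $in $out $callback $chunk $total]
    }
}

# Usage with throwaway files; the event loop must run for the copy to proceed.
set out [open k-async.txt w]
fconfigure $out -encoding koi8-r
puts $out "\u0410\u0410"
close $out

set in [open k-async.txt r]
fconfigure $in -encoding koi8-r
set out [open u-async.txt w]
fconfigure $out -encoding utf-8
copyConverting $in $out {set ::done}
vwait ::done
close $in
close $out
```

The caller remains responsible for closing both channels, as with fcopy.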
User Comments: andreas_kupries added on 2001-05-20 00:05:41:
Logged In: YES 
user_id=75003

Patch committed.

andreas_kupries added on 2001-04-19 03:27:32:

File Added - 5515: fce.Benchmark.out.tar.gz

andreas_kupries added on 2001-04-19 03:23:21:

File Added - 5514: 219210.diff.5

andreas_kupries added on 2001-04-19 03:23:20:

Another small change to the patch, fixing another bug. Found
through tclbench, i.e. performance tests. Will add the
performance results too, with notes.

andreas_kupries added on 2001-04-06 22:20:35:

File Added - 5076: 219210.diff.3

andreas_kupries added on 2001-04-06 22:20:34:

New patch, changing the fix slightly so that it doesn't do
conversions which are not necessary. IOW, if both channels
are set to the same encoding no conversion will occur and
the transfer will run at the full old speed. The first patch
did conversions in this case, lowering performance for a
common case.

andreas_kupries added on 2001-04-04 21:36:55:

File Added - 4953: 219210.diff.1

andreas_kupries added on 2001-04-04 21:36:54:

Uploading a patch fixing the bug, adding tests and extending
the documentation.

andreas_kupries added on 2001-03-31 20:22:45:

The fix for this report has to be done in "tclIO.c", in "CopyData", which
currently uses "DoRead" and "DoWrite" for reading from and writing to
the channels involved in the copying. Replacing these two calls with
stripped-down versions of "Tcl_ReadChars" and "Tcl_WriteChars" should do
the trick. "Stripped-down" means here that we have to avoid the call to
"CheckChannelErrors" in these two routines, as that routine flags their
use on a channel that is busy in an "fcopy" as an error. I would propose
to move the meat of these two routines into two internal procedures,
"DoReadChars" and "DoWriteChars", which are then called from the original
routines. The originals would retain the error checking, and "CopyData"
can use the internal procedures to get its own work done.

Note the following consequences of the change:

- The system will use UTF-8 internally when copying data,
meaning that it will consume more memory, or copy less data
per buffer.

- Performance will be affected negatively because of the
additional conversions to and from UTF-8. (Side note: Do we
have performance tests for "fcopy" in "tclbench" ?).

The code of the channel system uses 'statePtr->encoding == NULL' as the
signal that the encoding is binary, and the two "Tcl_*Chars" procedures
have special provisions for that case. Unfortunately these are not very
efficient, as they involve ByteArray objects.

I would propose that "CopyData" should check for binary translation on
_both_ channels and fall back to the old code in such a case. This would
avoid quite a lot of conversions and copying. We shouldn't do this for a
mixture of binary and non-binary encodings, as we still need an
intermediate ByteArray to get the conversion right for these cases,
causing additional complexity in the new code. Better to stay with the
tried and true code for that for now.

andreas_kupries added on 2001-03-31 00:05:13:

Another problem with the workaround: It uses much more
memory than fcopy because the whole file is loaded into the
interpreter before it is written back out.
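
The memory cost of the workaround can be bounded by reading and writing in fixed-size chunks instead of slurping the whole file. The following is a minimal sketch, not the committed fix: `copyChunked` is a hypothetical helper name, the chunk size and file names are arbitrary, and the returned count is in characters rather than bytes.

```tcl
# Hypothetical helper (not part of Tcl): copy $in to $out at most $chunk
# characters at a time, so only one chunk is held in memory at once.
proc copyChunked {in out {chunk 65536}} {
    set total 0
    while {![eof $in]} {
        set data [read $in $chunk]
        puts -nonewline $out $data
        incr total [string length $data]
    }
    return $total
}

# Usage with throwaway files; the names are illustrative only.
set out [open k-chunk.txt w]
fconfigure $out -encoding koi8-r
puts $out [string repeat "\u0410" 1000]
close $out

set in [open k-chunk.txt r]
fconfigure $in -encoding koi8-r
set out [open u-chunk.txt w]
fconfigure $out -encoding utf-8
set copied [copyChunked $in $out 256]
close $in
close $out
```

Unlike fcopy, this loop blocks until the copy is complete and offers no -command style callback.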

andreas_kupries added on 2000-11-17 19:08:40:
I consider it a bug: an incomplete adaptation of the existing commands to the new i18n features, i.e. encodings. Recategorized to OtherIO, as this problem is not restricted to sockets.

Attachments: