Tcl Source Code

Ticket UUID: 1256872
Title: Unicode support for Win32 (NT) console channels
Type: Patch
Version: None
Submitter: a_kovalenko
Created on: 2005-08-11 16:34:19
Subsystem: 27. Channel Types
Assigned To: patthoyts
Priority: 8
Severity:
Status: Closed
Last Modified: 2010-08-06 05:07:23
Resolution: Out of Date
Closed By: hobbs
Closed on: 2010-08-05 22:07:23
Description:
This patch, prepared against the recent Tcl 8.5 CVS,
makes the Win32 console channel driver use ReadConsoleW
and WriteConsoleW where these functions are available
(NT/2k/XP/2003), without (hopefully) breaking anything
on other systems. TclWinOpenConsoleChannel will set the
channel encoding to unicode when appropriate; thus all
the applications that do gets/puts on the console,
without resetting its options, will not notice any
difference.

Please let me know if such a change requires a TIP to
be included.
User Comments:

a_kovalenko added on 2010-08-05 08:47:20:
It turned out that this _is_ still enabled both in 8.5 and 8.6 heads, and in 8.5.8 release, too (as I've migrated to 8.6 a while ago and didn't monitor 8.5, I was sure that it was removed from 8.5 and left enabled on 8.6 only). Somehow it got rejected but not rolled back (or rolled back on 8.4 only, as I suppose). Now we may at least be sure that it doesn't cause a lot of trouble, as it was there for 4+ years. After all those years, keeping it would be _maintaining_ compatibility, not _breaking_ it :)

The Unicode console works unless you fconfigure it to an encoding other than "unicode". The situation differs from the old, non-Unicode I/O in one respect only: "ANSI" codepages are ASCII-compatible, while UCS-2 is not. After an incorrect fconfigure, the old code continues to "work" with the ASCII subset, which is enough to enter another fconfigure command, for example. UCS-2 (called "unicode" in Tcl) is so different from ASCII and most other codepages that tclsh appears to hang after an incorrect fconfigure.
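
To make the ASCII-compatibility point concrete, here is a minimal sketch in Python (not Tcl, and not part of the patch). It assumes cp1251 as a stand-in for an "ANSI" codepage and Python's "utf-16-le" codec as a stand-in for what Tcl calls "unicode" on Windows.

```python
# A command a user might type to repair a misconfigured console channel.
command = "fconfigure stdin -encoding unicode\n"

# Bytes as an ASCII-compatible codepage, as UTF-8, and as UCS-2/UTF-16LE.
ansi_bytes = command.encode("cp1251")     # assumed example "ANSI" codepage
utf8_bytes = command.encode("utf-8")
ucs2_bytes = command.encode("utf-16-le")  # assumed stand-in for Tcl "unicode"

# For pure-ASCII text the first two are byte-for-byte identical, so a wrong
# choice between them still lets ASCII commands through.
same_in_ascii_supersets = ansi_bytes == utf8_bytes == command.encode("ascii")

# UCS-2 bytes read through a UTF-8 decoder come out with a NUL after every
# character, so the text is no longer a recognizable command.
misread = ucs2_bytes.decode("utf-8")

print(same_in_ascii_supersets)
print(repr(misread[:8]))
```

The sketch only shows how the bytes are misread; how tclsh reacts to such input is as described in the comment above.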

hobbs added on 2010-08-05 05:00:02:
Is there a newer patch against the current 8.5 or head sources to reenable this?  Currently things are working ok for me, so I'd also like to know how to better test this, and why my utf-8 expectations are incorrect for that.

Note that we only need to worry about NT+ systems in 8.6, if that makes life easier.

a_kovalenko added on 2010-03-01 07:06:28:
As this artifact is still open, let me defend this implementation. My purpose is to ensure that it won't be rolled back for 8.6 as well.

First, [fconfigure stdin -encoding utf-8] didn't work before this patch, it just pretended to: utf-8 and a typical console codepage (whatever it is) have the ASCII subset in common. Both with this patch and without it, the console channel after [fconfigure ... utf-8] is misconfigured, i.e. it cannot be used to input non-ASCII characters correctly. The only difference is the amount of trouble caused by this misconfiguration.
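
The "pretended to work" claim can be illustrated with a small Python sketch. It assumes cp866 as the console codepage (a common Cyrillic OEM codepage, chosen here only as an example) and decodes the bytes such a console would deliver as if they were utf-8.

```python
def read_as_utf8(text: str, console_codepage: str = "cp866") -> str:
    """Encode text as the console would, then decode it as utf-8.

    This mimics a channel whose low-level bytes are in the console
    codepage while the translation layer has been set to utf-8.
    """
    console_bytes = text.encode(console_codepage)
    return console_bytes.decode("utf-8", errors="replace")

ascii_line = "puts hello"
cyrillic_line = "привет"

ascii_result = read_as_utf8(ascii_line)        # survives: shared ASCII subset
cyrillic_result = read_as_utf8(cyrillic_line)  # does not survive

print(ascii_result == ascii_line)
print(cyrillic_result == cyrillic_line)
```

ASCII input round-trips because both encodings agree on it; the Cyrillic input does not, which is the sense in which the old behaviour only appeared correct.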

Mr. Hobbs seems to expect a solution that notices the upper-level -encoding reconfiguration on console channel and somehow takes it into account when the low-level I/O is done. If it's indeed the expected property of a "complete implementation", I would present an objection: no channel type ever worked this way, be it console channels or any other channels. The channel at the lowest level is a stream of bytes, and encoding translation is layered on top of it, but it doesn't affect the channel I/O behavior, ever. This principle was respected by Tcl from the very start of its unicode subsystem; it's also what extension and application developers do expect and will always expect.
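
The layering argument can be sketched by analogy with Python's io module, where a text wrapper sits on top of a byte stream. This is an analogy only, not Tcl's channel code; the fixed byte string is an assumed stand-in for what a Unicode console delivers.

```python
import io

# Stand-in for the bytes the low-level console driver delivers (UTF-16LE).
LOW_LEVEL_BYTES = "привет\n".encode("utf-16-le")

def read_line(encoding: str):
    """Read one line through a translation layer set to `encoding`.

    Returns the decoded line and the underlying bytes, to show that the
    choice of encoding never changes what the byte layer holds.
    """
    raw = io.BytesIO(LOW_LEVEL_BYTES)
    text = io.TextIOWrapper(raw, encoding=encoding, errors="replace", newline="")
    line = text.readline()
    return line, raw.getvalue()

right_line, right_bytes = read_line("utf-16-le")  # matching encoding
wrong_line, wrong_bytes = read_line("cp866")      # mismatched encoding

print(repr(right_line))
print(right_bytes == wrong_bytes)
```

Changing the encoding changes only how the same bytes are interpreted; the byte stream underneath is identical in both calls.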

However, there is sometimes a reason for the channel type to influence _initial_ settings of the upper-level translation procedures: for example, TCP sockets require CR-LF line endings for most standard text-based network protocols. The same thing is true for the "unpatched" console channels: they preset the encoding to an autodetected codepage (result of GetConsoleCP); but they don't try to synchronize further channel encoding changes with console codepage. Once the channel is created, the application is free to fconfigure it to any encoding (breaking the correctness of translation, of course); if the console codepage is changed after the channel creation (exec cmd /c chcp...), the "real" low-level encoding and the tcl translation-level encoding are again out of sync.

The "patched" implementation just detects and uses "unicode" as the initial setting, exactly the same way as both implementations act on non-unicode systems, detecting non-unicode codepages. It doesn't prevent an application from altering it, even if it makes the channel unusable.
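
As a rough sketch of that initial-setting logic, in Python rather than the C of the actual driver: the helper name and its parameters are hypothetical, and the "cp<number>" naming is assumed from Tcl's encoding names such as cp866.

```python
def initial_console_encoding(has_wide_console_api: bool, console_codepage: int) -> str:
    """Pick the encoding a console channel starts with (hypothetical helper).

    has_wide_console_api: whether ReadConsoleW/WriteConsoleW are usable
        (NT-family systems, per the ticket description).
    console_codepage: the number a call such as GetConsoleCP would report.
    """
    if has_wide_console_api:
        # Wide console I/O exchanges UCS-2 text, which Tcl calls "unicode".
        return "unicode"
    # Otherwise preset the autodetected codepage, as the unpatched code does.
    return f"cp{console_codepage}"

print(initial_console_encoding(True, 866))
print(initial_console_encoding(False, 866))
```

In both branches this is only a starting value; nothing here stops a later fconfigure from changing it.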

In short, [fconfigure stdin -encoding unicode] will hang the "unpatched" implementation and work with the "patched" one, just as the reverse is true for -encoding utf-8 (which was the reason for rolling back this patch).

There is, however, one possible compromise that may be added to my implementation to restore the backward compatibility, making the channel not-so-obviously-broken after encoding misconfiguration: I can use utf-8 as the "presented low-level" encoding on unicode systems, and do the utf8<->unicode translation during the I/O. This way, utf-8 will be set up as a channel encoding by default; if any application reconfigures the channel to utf-8 again, everything will work (not just pretend to); and if the channel is reconfigured to some wrong encoding being a superset of ASCII, the thing at least won't _hang_.
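
The proposed compromise could look roughly like the following Python sketch. The class and method names are hypothetical, and Python's "utf-16-le" codec stands in for the console's UCS-2 side; a real implementation would live in the C console driver.

```python
import codecs

class Utf8ConsoleBridge:
    """Present utf-8 to the channel while the console side speaks UTF-16LE.

    Incremental codecs are used so that a multi-byte sequence split across
    two reads or writes is held back until it is complete.
    """

    def __init__(self):
        self._in_decoder = codecs.getincrementaldecoder("utf-16-le")(errors="replace")
        self._out_decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def from_console(self, utf16_chunk: bytes) -> bytes:
        """Console input (UTF-16LE bytes) -> utf-8 bytes for the channel."""
        return self._in_decoder.decode(utf16_chunk).encode("utf-8")

    def to_console(self, utf8_chunk: bytes) -> bytes:
        """Channel output (utf-8 bytes) -> UTF-16LE bytes for the console."""
        return self._out_decoder.decode(utf8_chunk).encode("utf-16-le")

bridge = Utf8ConsoleBridge()
typed = bridge.from_console("привет\n".encode("utf-16-le"))
print(typed == "привет\n".encode("utf-8"))
```

With such a layer, ASCII text passes through as plain ASCII bytes, so a channel misconfigured to another ASCII superset would still accept ASCII commands.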

I hereby request comments from Mr. Hobbs, both on the current state of the patch and on the proposed improvement, on the possibility of backporting it again into 8.5 and 8.4 once the improvement is made, and on the plans for 8.6 (is it acceptable as is, or is it not rolled back only by accident? will it be acceptable if the change described above is made?).

hobbs added on 2006-03-29 05:40:49:
See also comments in bug 1442305.

hobbs added on 2006-03-29 04:01:56:
This is being reopened because it was not a complete
implementation after all.  You will see the problem by
running tclsh in XP and doing 'fconfigure stdin -encoding
utf-8'.  This should output correctly (it used to), but
hangs with this patch.

I have reverted for 8.4.13, but it should be reverted or
corrected for 8.5.  I left in the read|writeConsoleProc bits
in case a corrected solution is presented that handles the
internal channel encoding changes.

hobbs added on 2006-03-01 04:59:22:
This was addressed in Expect, updated to track channel
encodings.

hobbs added on 2006-01-25 00:22:05:
Reopening - this caused Expect for Windows to break.  We
need to revisit whether this is a core issue that may affect
other extensions, or whether Expect for Windows must adapt.

patthoyts added on 2005-11-03 18:58:18:
Oh - well I just committed a backport already :) Thank you
anyway.

a_kovalenko added on 2005-11-03 17:10:08:

File Added - 154878: tcl-winunicon2-8-4.patch

a_kovalenko added on 2005-11-03 17:10:07:
Backport done (tcl-winunicon2-8-4.patch).

dkf added on 2005-11-03 16:08:56:
Backport to 8.4 needed

patthoyts added on 2005-11-03 08:18:39:
Works fine for me. Test suite passes and now the console can
output cyrillic chars and so on.
Seeing the positive comments from david - applied.

a_kovalenko added on 2005-08-24 05:43:35:

File Added - 146736: tcl-winunicon2.patch

a_kovalenko added on 2005-08-24 05:43:27:
Sorry to all,
There was a typo in the first variant of this patch
(ReadConsoleW and WriteConsoleW were mistakenly used on
non-unicode systems).
Fixed (new variant attached).

davygrvy added on 2005-08-16 11:46:22:
I haven't seen the guts of the patch yet, but this gets my
vote of approval.  I can't do the work of the test/commit
due to my lack of dev tools on this new computer of mine.. 
passing to JH

hobbs added on 2005-08-12 04:16:48:
I think this is a candidate for 8.4 and 8.5 (no compat issues).

a_kovalenko added on 2005-08-12 01:19:29:
Tcl RFE 491789 is unrelated to this patch (though I was
thinking, while submitting this patch, about GetCommandLineW
as a next step to better unicode support).

This patch is not about command-line parameters, it's about
console I/O (i.e. stdin, stdout, and stderr of tclsh.exe). As
Tcl already has a separate channel driver for the Win32 console,
the required changes are minimal and they affect
neither the Tcl API nor the signature of Tcl_Main.

dgp added on 2005-08-12 00:51:48:
How is this patch related to
Tcl RFE 491789?

a_kovalenko added on 2005-08-11 23:34:23:

File Added - 145280: tcl-winunicon.patch
