Author: Jan Nijtmans <email@example.com>
Author: Jan Nijtmans <firstname.lastname@example.org>
State: Draft
Type: Project
Vote: Pending
Created: 10-May-2019
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 8.7
Tcl-Branch: utf-max
This TIP proposes adding more encodings and making it possible to switch Tcl between Full Unicode mode (TCL_UTF_MAX>3, almost compatible with Androwish) and the current partial Unicode mode (as far as TIP #389 goes, using TCL_UTF_MAX=3).
Tcl can currently be compiled in three different modes: using TCL_UTF_MAX=3, TCL_UTF_MAX=4 or TCL_UTF_MAX=6. The first two are now equivalent in Tcl 8.7 (since TIP #389). Using TCL_UTF_MAX=6 is overkill, since no UTF-8 character consists of more than 4 bytes.
Therefore it makes sense to reduce this to only two modes: TCL_UTF_MAX=3 means being fully compatible with Tcl 8.6, while TCL_UTF_MAX=4 means compatibility with the Androwish version of Tcl. Defining TCL_UTF_MAX=6 results in a valid compilation as well (functioning the same as TCL_UTF_MAX=4), except that some buffers will be 2 bytes larger than necessary.
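The 4-byte upper bound can be checked with a small stand-alone encoder. This is only a sketch: `demo_utf8_encode` is a hypothetical helper, not part of the Tcl API, and it assumes the standard UTF-8 layout for code points up to U+10FFFF (it does not reject surrogate code points).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Tcl function): encode one Unicode code point
 * as UTF-8 into out[] and return the number of bytes written (1 to 4). */
size_t demo_utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFFUL);               /* Unicode ends at U+10FFFF */
    if (cp < 0x80UL) {                      /* 7 bits: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                     /* 11 bits: two bytes */
        out[0] = (unsigned char)(0xC0UL | (cp >> 6));
        out[1] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 2;
    }
    if (cp < 0x10000UL) {                   /* 16 bits: three bytes */
        out[0] = (unsigned char)(0xE0UL | (cp >> 12));
        out[1] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 3;
    }
    /* 21 bits: four bytes, the longest form needed */
    out[0] = (unsigned char)(0xF0UL | (cp >> 18));
    out[1] = (unsigned char)(0x80UL | ((cp >> 12) & 0x3FUL));
    out[2] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
    out[3] = (unsigned char)(0x80UL | (cp & 0x3FUL));
    return 4;
}
```

Even the highest code point, U+10FFFF, takes the four-byte branch, so a buffer sized for 6 bytes per character has 2 bytes that are never used.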
Androwish chose to use a Tcl mode that was unsupported at the time: changing the size of the Tcl_UniChar type by using TCL_UTF_MAX=6. This causes a binary incompatibility, with the result that all extensions need to be recompiled with TCL_UTF_MAX=6 as well. This TIP proposes adding a supported TCL_UTF_MAX=4 compilation mode to Tcl, which has the same effect as the earlier unsupported TCL_UTF_MAX=6 but without the need to recompile all extensions. That need is eliminated by putting the 32-bit versions of the Tcl_UniChar-related functions in different stub entries than the 16-bit versions. This way, 99% of all extensions compiled with TCL_UTF_MAX=3 keep functioning as before without recompilation.
The default compilation mode for Tcl will continue to be TCL_UTF_MAX=3, which is 100% upwards compatible with Tcl 8.6.
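The effect of the compile-time switch on the character type can be illustrated with a stand-in typedef. The names `DemoUniChar`, `demo_fits_in_unichar` and `demo_store` are hypothetical, and the exact typedefs in the real `tcl.h` may differ. The sketch only assumes what the text states: a 16-bit type in the default mode and a 32-bit type when TCL_UTF_MAX is greater than 3.

```c
#include <assert.h>

/* Normally supplied on the compiler command line as -DTCL_UTF_MAX=3 or
 * -DTCL_UTF_MAX=4; the default mirrors the default mode described above. */
#ifndef TCL_UTF_MAX
#define TCL_UTF_MAX 3
#endif

/* Stand-in for Tcl_UniChar (assumption: not copied from tcl.h). */
#if TCL_UTF_MAX > 3
typedef int DemoUniChar;             /* wide enough for U+10FFFF */
#else
typedef unsigned short DemoUniChar;  /* 16-bit unit, BMP only */
#endif

/* Return 1 if a single DemoUniChar can hold the code point, else 0. */
int demo_fits_in_unichar(unsigned long cp)
{
#if TCL_UTF_MAX > 3
    return cp <= 0x10FFFFUL;
#else
    return cp <= 0xFFFFUL;
#endif
}

/* Store a code point in one DemoUniChar; callers are expected to check
 * demo_fits_in_unichar() first. */
DemoUniChar demo_store(unsigned long cp)
{
    assert(demo_fits_in_unichar(cp));
    return (DemoUniChar)cp;
}
```

Built without any flag, U+1F600 does not fit in one unit; built with `-DTCL_UTF_MAX=4`, it does. That change in size is why the two modes are binary incompatible unless the 16-bit and 32-bit functions live in separate stub entries.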
This document proposes:
Add new encodings "utf-16", "utf-16le", "utf-16be", "ucs-2", "ucs-2le", "ucs-2be".
Allow Tcl to be compiled with either -DTCL_UTF_MAX=3 (default), or with -DTCL_UTF_MAX=4. In the latter mode, the Tcl_UniChar type becomes a 32-bit type, but the stub entries for the 16-bit Tcl_UniChar type are present as well. So, most extensions compiled with -DTCL_UTF_MAX=3 will continue to work in either Tcl mode (for caveats, see below).
Allow Tcl extensions to be compiled with either -DTCL_UTF_MAX=3 (default), or with -DTCL_UTF_MAX=4, when Tcl is compiled with -DTCL_UTF_MAX=4.
Deprecate the "unicode" encoding. "utf-16" is supposed to be used instead. The "unicode" encoding will NOT be removed in Tcl 9.0, since it is too widely used.
Enhance the Tcl_UniCharToUtfDString() function such that the uniLength parameter is allowed to have the value -1.
Deprecate the following functions:
If Tcl is compiled with either -DTCL_UTF_MAX=4 or -DTCL_NO_DEPRECATED, those functions will no longer be available to extensions; they will also be absent from Tcl 9.0.
Those are the same as the UniChar variants, but they use the "unsigned short" type instead of Tcl_UniChar.
Those functions can be used if you want your extension to compile with either -DTCL_UTF_MAX=3 or -DTCL_UTF_MAX=4, but still want to use the 16-bit conversions independent of the TCL_UTF_MAX setting or Tcl_UniChar type.
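What a 16-bit conversion on "unsigned short" involves, regardless of the width of Tcl_UniChar, can be sketched as follows. The helpers `demo_to_utf16` and `demo_from_utf16` are hypothetical illustrations and are not the functions this TIP adds. They assume standard UTF-16 surrogate-pair arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write one code point as 1 or 2 UTF-16 code units.
 * Returns the number of units written. */
size_t demo_to_utf16(unsigned long cp, unsigned short out[2])
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x10000UL) {                   /* BMP: one unit */
        out[0] = (unsigned short)cp;
        return 1;
    }
    cp -= 0x10000UL;                        /* 20 bits remain */
    out[0] = (unsigned short)(0xD800UL | (cp >> 10));     /* high surrogate */
    out[1] = (unsigned short)(0xDC00UL | (cp & 0x3FFUL)); /* low surrogate */
    return 2;
}

/* Hypothetical helper: read one code point from len >= 1 UTF-16 units.
 * *used receives the number of units consumed. An unpaired surrogate is
 * passed through unchanged as a single unit. */
unsigned long demo_from_utf16(const unsigned short *in, size_t len,
                              size_t *used)
{
    unsigned long hi;

    assert(in != NULL && used != NULL && len >= 1);
    hi = in[0];
    if (len >= 2 && hi >= 0xD800UL && hi <= 0xDBFFUL) {
        unsigned long lo = in[1];
        if (lo >= 0xDC00UL && lo <= 0xDFFFUL) {
            *used = 2;
            return 0x10000UL + ((hi - 0xD800UL) << 10) + (lo - 0xDC00UL);
        }
    }
    *used = 1;
    return hi;
}
```

Because the unit type is spelled "unsigned short" rather than Tcl_UniChar, code like this behaves the same whether the extension is built with -DTCL_UTF_MAX=3 or -DTCL_UTF_MAX=4.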
As long as Tcl is compiled with -DTCL_UTF_MAX=3, this is fully upwards compatible.
When Tcl is compiled with -DTCL_UTF_MAX=4, this is, at the Tcl level, compatible with the Androwish version of Tcl with one exception: in Androwish the "unicode" encoding is 32-bit, whereas in Tcl it continues to be 16-bit, an alias for "utf-16". At the C-API level, it is upwards compatible with Tcl 8.6 in TCL_UTF_MAX=6 mode, except for the functions marked above as deprecated. Those functions will be gone.
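To make the 16-bit layout concrete, the following sketch serializes UTF-16 code units in the two explicit byte orders. It assumes the conventional meaning of the "le" and "be" suffixes (little-endian and big-endian). `DemoByteOrder` and `demo_utf16_bytes` are hypothetical names, not Tcl APIs, and the sketch says nothing about which byte order the unsuffixed "utf-16" encoding uses.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_LE, DEMO_BE } DemoByteOrder;

/* Hypothetical helper: write n 16-bit code units to out[] as bytes in the
 * requested order. out must have room for 2*n bytes. Returns bytes written. */
size_t demo_utf16_bytes(const unsigned short *units, size_t n,
                        DemoByteOrder order, unsigned char *out)
{
    size_t i;

    assert(units != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        unsigned char lo = (unsigned char)(units[i] & 0xFFU);
        unsigned char hi = (unsigned char)((units[i] >> 8) & 0xFFU);
        out[2 * i]     = (order == DEMO_LE) ? lo : hi;
        out[2 * i + 1] = (order == DEMO_LE) ? hi : lo;
    }
    return 2 * n;
}
```

A character outside the Basic Multilingual Plane occupies two code units, so it takes 4 bytes in either byte order even though each unit is only 16 bits wide.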
Extensions compiled with -DTCL_UTF_MAX=4 cannot use any of the deprecated functions mentioned in this TIP. Using any of them results in a link error.
If Tcl is compiled with -DTCL_UTF_MAX=4, the deprecated functions will be gone. Any extension using them, even if the extension is compiled with -DTCL_UTF_MAX=3, will no longer work.
A reference implementation is available in the utf-max branch. https://core.tcl.tk/tcl/timeline?r=utf-max
This document has been placed in the public domain.