TIP 657: Make "-profile strict" the default in Tcl 9.0

Author:         Jan Nijtmans <[email protected]>
Author:         Nathan Coulter <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Tcl-Version:    9.0
Tcl-Branch:     tip-657
Vote-Summary:   Accepted 6/0/1
Votes-For:      AF, AK, JN, KW, MC, SL
Votes-Against:  none
Votes-Present:  DKF

Abstract

This TIP proposes to make "-profile strict" the default encoding profile in Tcl 9.0. It is intended as a replacement for TIP #601 and builds on TIP #656 ("A revised proposal for encodings").

Rationale

The tcl8 profile is a legacy profile that does not conform to any recommended behavior; the other two profiles, strict and replace, do.

Since strict is the recommended profile in most situations, it becomes the default in Tcl 9.0, with a few exceptions. That has some implications at the script level.

Many scripts will have to be adapted, either by handling the exceptions raised on encoding errors or by setting the channel profile to "tcl8" or "replace". Commands such as "fcopy", "read" and "gets" will now throw an exception when they encounter encoding errors, which external applications and extensions might not expect.
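The two ways of adapting a script can be sketched as follows. This is a minimal illustration that assumes Tcl 9.0 and the channel option name -profile from TIP #656; the temporary file and the variable names are only for demonstration.

```tcl
# Create a temporary file that contains one byte (0xFF) which is not valid UTF-8.
set out [file tempfile path]
chan configure $out -translation binary
puts -nonewline $out "ok\xFFok"
close $out

# Option 1: keep the strict default and be prepared for an exception.
set in [open $path r]
chan configure $in -encoding utf-8
set strictFailed [catch {read $in} msg]
close $in

# Option 2: select a lenient profile explicitly; the invalid byte
# is then replaced by U+FFFD instead of raising an error.
set in [open $path r]
chan configure $in -encoding utf-8 -profile replace
set replaced [read $in]
close $in

file delete $path
```

Which option fits depends on whether the script should reject malformed input or carry on with a visibly marked substitute.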

Specification

New channels are assigned the strict profile by default, and both encoding convertfrom and encoding convertto use the strict profile by default. The one exception is the stderr channel, which defaults to the replace profile.
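For the encoding commands, the effect of the new default might look like the sketch below. It assumes Tcl 9.0 and the -profile and -failindex options described in TIP #656; the variable names are illustrative.

```tcl
# "A", an invalid UTF-8 byte (0xFF), then "B".
set bytes "A\xFFB"

# No -profile given: strict applies, so the invalid byte raises an error.
set strictFailed [catch {encoding convertfrom utf-8 $bytes} msg]

# -failindex suppresses the error and reports where decoding stopped.
set prefix [encoding convertfrom -failindex idx utf-8 $bytes]

# Other profiles must now be requested explicitly.
set replaced [encoding convertfrom -profile replace utf-8 $bytes]
set legacy   [encoding convertfrom -profile tcl8 utf-8 $bytes]

# The same default applies in the other direction: U+00E9 has no ASCII form.
set toFailed [catch {encoding convertto ascii "caf\u00E9"}]
```

Passing -profile explicitly in code that must run unchanged on both 8.7 and 9.0 keeps its behavior independent of the default.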

Tcl_FSEvalFileEx() uses the strict profile, and therefore source uses the strict profile. All commands except glob use the strict profile.

Tcl_ExternalToUtfDStringEx(), Tcl_UtfToExternalDStringEx(), Tcl_ExternalToUtf() and Tcl_UtfToExternal() support a mode in which any encoding error results in an EILSEQ POSIX error. That mode is now the default. The caller can explicitly configure other modes (TIP #656) to specify how these functions behave when they encounter invalid data.

Handling of environment variables (syncing between the ::env array and the native environment) still uses the tcl8 profile, as does the glob command. The reason is that in those situations many applications won't expect exceptions when illegal byte sequences occur in (disk) filenames or in environment variables. Changing that behavior is therefore out of scope for this TIP. TIP #671 is an attempt to solve this problem for environment variables and the glob command.

Compatibility

Since this is an incompatible change wherever channels, files or sockets are used, it potentially has a large effect on extensions. All extensions that could be confronted with encoding errors now have to handle the possibility that an exception is thrown.

Also, opening a file whose name contains surrogate characters (or any code point missing from the system encoding) will fail in Tcl 9.0, while it might have succeeded in Tcl 8.x. For example:

    set f [open \U1F91D w]
    close $f
    set f [open \uD83E\uDD1D r]

This will succeed in Tcl 8.7, but fail in Tcl 9.0, because a surrogate pair is no longer treated as equal to the combined character.

The 'http' package is modified because of this change. Since the 'http' package is not prepared to handle exceptions, it can easily be left in an inconsistent state, as shown by the test-case errors that appeared when the default profile was changed to 'strict'. Therefore the 'http' package, when run in Tcl 9.0, uses the 'replace' profile. This makes the package conformant to the W3C recommendations.

The 'tcltest' package is modified to use the 'tcl8' profile for its internal channels. For this package, we don't want exceptions to disturb test output. A test case that deliberately handles a surrogate should be able to do so without being disturbed.

Implementation

Implementation is available in the tip-657 branch of the Tcl repository.

Copyright

This document has been placed in the public domain.