Author: Alexandre Ferrieux <firstname.lastname@example.org>
State: Final
Type: Project
Vote: Done
Created: 05-Feb-2009
Post-History:
Discussions-To: Tcl Core List
Keywords: Tcl,encoding,invalid UTF-8
Tcl-Version: 8.7
This TIP proposes to remove the 'identity' encoding, which is a Pandora's box of invalid UTF-8 string representations.
The contract of string representations in Tcl states that the bytes field (the strep) of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating it leads at best to inconsistent and shimmer-sensitive string comparisons. Fortunately, nearly all of the Tcl code takes careful steps to enforce it, with one exception: the 'identity' encoding. This encoding allows any byte sequence to be copied verbatim into the strep of a value, either as a side effect of a strep computation on a ByteArray when [encoding system]=="identity", or through [encoding convertfrom identity]. Hence an invalid UTF-8 sequence can easily make it into the strep and start wreaking havoc.
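The comparison problem can be illustrated outside Tcl with a minimal Python sketch. It is not Tcl's implementation: the helper names `is_valid_utf8` and `lenient_decode` are hypothetical, and the decoder assumes that a byte which does not start a valid UTF-8 sequence is mapped to the code point of the same value. Under that assumption, a lone byte 0xE9 copied verbatim into a strep and the valid two-byte UTF-8 form of U+00E9 differ byte-wise yet decode to the same character, so the outcome of a comparison depends on which representation is consulted.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def lenient_decode(data: bytes) -> str:
    """Decode data one character at a time.

    Valid UTF-8 sequences decode normally. ASSUMPTION for illustration:
    a byte that does not start a valid sequence becomes the code point
    with the same numeric value.
    """
    out = []
    i = 0
    while i < len(data):
        # Try the shortest sequence first; UTF-8 sequences are 1 to 4 bytes.
        for width in (1, 2, 3, 4):
            try:
                ch = data[i:i + width].decode("utf-8")
            except UnicodeDecodeError:
                continue
            out.append(ch)
            i += width
            break
        else:
            # No valid sequence starts here: fall back to the raw byte value.
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


# A lone 0xE9 byte, as a verbatim copy could place it into a strep.
raw = b"\xe9"
# The valid UTF-8 encoding of U+00E9.
valid = "\u00e9".encode("utf-8")

bytes_equal = raw == valid
chars_equal = lenient_decode(raw) == lenient_decode(valid)
print(is_valid_utf8(raw), bytes_equal, chars_equal)
```

Here the byte-wise and character-wise comparisons disagree for the same pair of values, which is the kind of inconsistency the strep contract is meant to rule out.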
This TIP proposes to simply close that single window to the dark side.
The risk of compatibility breakage is very low in this case, since the 'identity' encoding has only ever been documented in tcltest.
See Bug 2564363 https://sourceforge.net/support/tracker.php?aid=2564363
This document has been placed in the public domain.