Author: Jan Nijtmans <firstname.lastname@example.org>
Author: Jan Nijtmans <email@example.com>
Author: Don Porter <firstname.lastname@example.org>
State: Draft
Type: Project
Vote: Pending
Created: 23-Jan-2018
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 9.0
This TIP proposes to add full support for all characters in Unicode 10.0+, including the characters >= U+010000, together with the necessary adaptation of the regexp engine. The caveats remaining from TIP #389 will also be handled here.
This document proposes:
Add a new objType "UTF-32", which stores a string using 32 bits per character.
Adapt the regexp engine to start using the "UTF-32" objType: Any string handled by regexp will first be converted to "UTF-32".
Modify all APIs using Tcl_UniChar: If the string contains surrogate pairs, the "UTF-32" objType will be used.
Modify all functions using or producing an index: "string length" should return 1 for a string consisting of any single Unicode character, even one >= U+010000.
TODO: everything else that comes up
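The effect of the proposed "UTF-32" representation on lengths and indices can be illustrated with a minimal C sketch. It assumes the source string is held as 16-bit code units in which characters >= U+010000 appear as surrogate pairs. The helper names (is_high_surrogate, is_low_surrogate, utf16_to_utf32) are hypothetical and are not part of the Tcl C API. Passing unpaired surrogates through unchanged is only one possible policy, and this TIP does not specify one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the Tcl C API. */

/* True if u is the first half of a surrogate pair (U+D800..U+DBFF). */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* True if u is the second half of a surrogate pair (U+DC00..U+DFFF). */
static int is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/*
 * Convert n 16-bit code units in src into 32-bit code points in dst.
 * dst must have room for at least n entries.
 * Returns the number of code points written, which is the length a
 * character-based "string length" would report.
 * Unpaired surrogates are copied through unchanged (an assumed policy).
 */
static size_t utf16_to_utf32(const uint16_t *src, size_t n, uint32_t *dst)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        uint32_t c = src[i++];

        if (is_high_surrogate((uint16_t)c) && i < n && is_low_surrogate(src[i])) {
            /* Combine the pair into one code point >= U+010000. */
            c = 0x10000 + ((c - 0xD800) << 10) + (src[i++] - 0xDC00);
        }
        dst[out++] = c;
    }
    return out;
}
```

With this conversion, the two code units 0xD83D 0xDE00 become the single code point U+1F600, so a length computed on the 32-bit form is 1 rather than 2, and every index into the converted string addresses a whole character.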
A reference implementation has not been started yet.
This document has been placed in the public domain.