Tcl Source Code

Ticket UUID: f9539dce52e2dac82090760757763521b583efde
Title: UTF-8 Source code not parsed correctly
Type: RFE
Version: 8.6.1 OSX 10.9.2
Submitter: samoc
Created on: 2014-07-10 03:53:25
Subsystem: 44. UTF-8 Strings
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2021-03-18 14:52:22
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2021-03-18 14:52:22
Description:
If I write "puts 😞"
the output is "c3 b0 c2 9f c2 98 c2 9e 0a"
the correct output would be "f0 9f 98 9e 0a"

http://www.charbase.com/1f61e-unicode-disappointed-face
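
For illustration, the reported bytes are what results when each byte of the original four-byte sequence is treated as a separate character in the range U+0080..U+00FF and then written out again as UTF-8. The C sketch below reproduces that pattern. `reencode_bytes_as_chars` is a hypothetical helper written for this note, not a function in Tcl, and it is not claimed to be how Tcl's parser is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Tcl): re-encode every input byte as if it
 * were its own code point in U+0000..U+00FF. Returns the number of bytes
 * written to out, which must have room for at least 2 * n bytes. */
size_t reencode_bytes_as_chars(const unsigned char *in, size_t n,
                               unsigned char *out)
{
    size_t len = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII passes through unchanged. */
            out[len++] = b;
        } else {
            /* Two-byte UTF-8 form of U+0080..U+00FF. */
            out[len++] = (unsigned char)(0xC0 | (b >> 6));   /* 0xC2 or 0xC3 */
            out[len++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation */
        }
    }
    return len;
}
```

Given the five input bytes f0 9f 98 9e 0a, this writes the nine bytes c3 b0 c2 9f c2 98 c2 9e 0a, matching the output reported above.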
User Comments:
jan.nijtmans added on 2021-03-18 14:52:22:

This is fixed in Tcl 8.6.10.


jan.nijtmans added on 2014-07-10 14:27:52:

At the moment, characters > 0xffff are not supported. Adding support is ongoing work, being done as part of implementing TIP 389.



samoc added on 2014-07-10 04:05:42:
After following the source code from Tcl_SourceObjCmd() all the way through to Tcl_UniCharToUtf(), it seems that "#define TCL_UTF_MAX 3" in tcl.h is the problem.

At the very least, either the documentation should carry a very noticeable warning that "UTF-8 is not fully supported and your valid UTF-8 data may be silently corrupted", or an error should be thrown when a valid UTF-8 character is encountered that is not supported.
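
As a rough sketch of the second option, a caller could scan the input for four-byte UTF-8 sequences before handing it to a build that cannot represent them. In valid UTF-8, the bytes 0xF0..0xF4 occur only as the first byte of a four-byte sequence, which encodes a character above 0xffff. `find_supplementary_char` is a hypothetical name, not a Tcl API, and the sketch assumes the input is otherwise valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Tcl): return the offset of the first
 * 4-byte UTF-8 lead byte (0xF0..0xF4), i.e. the start of a character above
 * 0xffff, or -1 if there is none. Assumes otherwise valid UTF-8 input. */
long find_supplementary_char(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xF0 && s[i] <= 0xF4) {
            return (long)i;
        }
    }
    return -1;
}
```

A non-negative result tells the caller where the unsupported character starts, so it can report an error instead of letting the data be silently corrupted.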
