Tcl Library Source Code

Ticket UUID: 2c0505fd9dec2192a5a420c3314d17f6a64102d3
Title: json::write::string escaping unicode that shouldn't be escaped
Type: Bug
Version: Tcl 8.6.7 (json::write 1.0.3)
Submitter: Jerry
Created on: 2018-05-15 22:19:03
Subsystem: json :: write
Assigned To: aku
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2018-05-23 08:11:14
Resolution: By Design
Closed By: Jerry
Closed on: 2018-05-23 08:11:14
Description:
In json::write 1.0.3, the quoting table is defined as follows (comments omitted):
    
    namespace eval ::json::write {
        variable indented 1
        variable aligned  1
    
        variable quotes \
    	[list "\"" "\\\"" \\ \\\\ \b \\b \f \\f \n \\n \r \\r \t \\t \
    	     \x00 \\u0000 \x01 \\u0001 \x02 \\u0002 \x03 \\u0003 \
    	     \x04 \\u0004 \x05 \\u0005 \x06 \\u0006 \x07 \\u0007 \
    	     \x0b \\u000b \x0e \\u000e \x0f \\u000f \x10 \\u0010 \
    	     \x11 \\u0011 \x12 \\u0012 \x13 \\u0013 \x14 \\u0014 \
    	     \x15 \\u0015 \x16 \\u0016 \x17 \\u0017 \x18 \\u0018 \
    	     \x19 \\u0019 \x1a \\u001a \x1b \\u001b \x1c \\u001c \
    	     \x1d \\u001d \x1e \\u001e \x1f \\u001f \x7f \\u007f \
    	     \x80 \\u0080 \x81 \\u0081 \x82 \\u0082 \x83 \\u0083 \
    	     \x84 \\u0084 \x85 \\u0085 \x86 \\u0086 \x87 \\u0087 \
    	     \x88 \\u0088 \x89 \\u0089 \x8a \\u008a \x8b \\u008b \
    	     \x8c \\u008c \x8d \\u008d \x8e \\u008e \x8f \\u008f \
    	     \x90 \\u0090 \x91 \\u0091 \x92 \\u0092 \x93 \\u0093 \
    	     \x94 \\u0094 \x95 \\u0095 \x96 \\u0096 \x97 \\u0097 \
    	     \x98 \\u0098 \x99 \\u0099 \x9a \\u009a \x9b \\u009b \
    	     \x9c \\u009c \x9d \\u009d \x9e \\u009e \x9f \\u009f ]
    }
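
For readers unfamiliar with this kind of table: a flat list of alternating keys and replacements like the one above is the shape that Tcl's `string map` expects. The following is a minimal sketch of how such a table can be applied, not a copy of the package's code; the reduced `quotes` list and the helper name `quoteString` are made up for illustration.

```tcl
# Hypothetical reduced table: quote, backslash, newline, and one C1 control.
set quotes [list "\"" "\\\"" \\ \\\\ \n \\n \x82 \\u0082]

# Hypothetical helper: replace every key found in the table and wrap the
# result in double quotes. string map does not rescan replaced text, so the
# backslashes it emits are not escaped a second time.
proc quoteString {map s} {
    return "\"[string map $map $s]\""
}

puts [quoteString $quotes "a\"b\\c\nd\u0082e"]
```

Any character that appears as a key in the table is replaced, so extending the table past \x1f extends the set of characters that get a \uXXXX escape.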

I'm not 100% sure, but I believe the list should stop at \x1f, as per the specification here: https://www.rfc-editor.org/rfc/rfc7158.txt

I stumbled on this while using ::json::write::string and getting escaped Unicode characters in the output.

    % set s "\u30B8\u30A7\u30C3\u30EA"
    ジェッリ
    % set s [encoding convertto utf-8 $s]
    ジェッリ
    % ::json::write::string $s
    "ã\u0082¸ã\u0082§ã\u0083\u0083ã\u0083ª"
    # Should be "ジェッリ"
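
One way to read the output above is to look at what `encoding convertto utf-8` produces: each character becomes three bytes, and the bytes that fall between 0x80 and 0x9f match keys in the quoting table. The sketch below illustrates this for the first character only. The one-entry `map` is a hypothetical stand-in for the full table, and the ticket does not say whether quoting the unencoded string would suit the submitter's use case.

```tcl
set s "\u30B8"                           ;# first character of the example
set bytes [encoding convertto utf-8 $s]  ;# three UTF-8 bytes
binary scan $bytes H* hex
puts $hex                                ;# e382b8

# After conversion, each byte is a separate character in U+0000..U+00FF.
foreach c [split $bytes ""] {
    scan $c %c code
    set inC1 [expr {$code >= 0x80 && $code <= 0x9f}]
    puts [format "U+%04X  in U+0080..U+009F: %d" $code $inC1]
}

# Hypothetical one-entry stand-in for the full quoting table.
set map [list \x82 \\u0082]
set plain   [string map $map $s]      ;# no key matches, string is unchanged
set encoded [string map $map $bytes]  ;# the middle byte becomes \u0082
```

Here `encoded` consists of U+00E3, the six characters `\u0082`, and U+00B8, which corresponds to the first `ã\u0082¸` group in the reported output.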

Proposed fix: change quotes to the following:
    
    variable quotes \
      [list "\"" "\\\"" \\ \\\\ \b \\b \f \\f \n \\n \r \\r \t \\t \
       \x00 \\u0000 \x01 \\u0001 \x02 \\u0002 \x03 \\u0003 \
       \x04 \\u0004 \x05 \\u0005 \x06 \\u0006 \x07 \\u0007 \
       \x0b \\u000b \x0e \\u000e \x0f \\u000f \x10 \\u0010 \
       \x11 \\u0011 \x12 \\u0012 \x13 \\u0013 \x14 \\u0014 \
       \x15 \\u0015 \x16 \\u0016 \x17 \\u0017 \x18 \\u0018 \
       \x19 \\u0019 \x1a \\u001a \x1b \\u001b \x1c \\u001c \
       \x1d \\u001d \x1e \\u001e \x1f \\u001f ]
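
The proposed table can also be generated instead of typed out, which makes the intended range (U+0000 through U+001F) explicit. This is only a sketch of the proposal, not code from the package; the variable names `short` and `code` are illustrative.

```tcl
# Two-character escapes for quote, backslash, and the common controls.
set quotes [list "\"" "\\\"" \\ \\\\ \b \\b \f \\f \n \\n \r \\r \t \\t]

# Controls that already have a short escape above.
set short [list \b \f \n \r \t]

# Add \uXXXX escapes for the remaining control characters U+0000..U+001F.
for {set code 0} {$code <= 0x1f} {incr code} {
    set char [format %c $code]
    if {$char in $short} continue
    lappend quotes $char [format {\u%04x} $code]
}

puts [llength $quotes]   ;# 68 elements, i.e. 34 key/replacement pairs
```

With a table built this way, characters above U+001F other than the quote and backslash pass through `string map` unchanged.
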
User Comments:

Jerry added on 2018-05-18 17:26:47:

Hi aku,

Thanks for the reply!

Yes, I was, but I'm a bit confused now. Is there a reason why the list escapes only this specific set of characters?

Maybe I'm missing something.

I guess I will have to handle those characters separately.

aku added on 2018-05-17 20:13:50:

Hi Jerry. Thank you for the report.

Are you referring to the following sentence in the RFC?

... characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F). ...

If yes, note also the immediately following sentence:

Any character may be escaped.