Ticket UUID: | 2c0505fd9dec2192a5a420c3314d17f6a64102d3 | |||
Title: | json::write::string escaping unicode that shouldn't be escaped | |||
Type: | Bug | Version: | Tcl 8.6.7 (json::write 1.0.3) | |
Submitter: | Jerry | Created on: | 2018-05-15 22:19:03 | |
Subsystem: | json :: write | Assigned To: | aku | |
Priority: | 5 Medium | Severity: | Important | |
Status: | Closed | Last Modified: | 2018-05-23 08:11:14 | |
Resolution: | By Design | Closed By: | Jerry | |
Closed on: | 2018-05-23 08:11:14 | |||
Description: |
In json::write 1.0.3, we have this (comments omitted):

    namespace eval ::json::write {
        variable indented 1
        variable aligned 1
        variable quotes \
            [list "\"" "\\\"" \\ \\\\ \b \\b \f \\f \n \\n \r \\r \t \\t \
                 \x00 \\u0000 \x01 \\u0001 \x02 \\u0002 \x03 \\u0003 \
                 \x04 \\u0004 \x05 \\u0005 \x06 \\u0006 \x07 \\u0007 \
                 \x0b \\u000b \x0e \\u000e \x0f \\u000f \x10 \\u0010 \
                 \x11 \\u0011 \x12 \\u0012 \x13 \\u0013 \x14 \\u0014 \
                 \x15 \\u0015 \x16 \\u0016 \x17 \\u0017 \x18 \\u0018 \
                 \x19 \\u0019 \x1a \\u001a \x1b \\u001b \x1c \\u001c \
                 \x1d \\u001d \x1e \\u001e \x1f \\u001f \x7f \\u007f \
                 \x80 \\u0080 \x81 \\u0081 \x82 \\u0082 \x83 \\u0083 \
                 \x84 \\u0084 \x85 \\u0085 \x86 \\u0086 \x87 \\u0087 \
                 \x88 \\u0088 \x89 \\u0089 \x8a \\u008a \x8b \\u008b \
                 \x8c \\u008c \x8d \\u008d \x8e \\u008e \x8f \\u008f \
                 \x90 \\u0090 \x91 \\u0091 \x92 \\u0092 \x93 \\u0093 \
                 \x94 \\u0094 \x95 \\u0095 \x96 \\u0096 \x97 \\u0097 \
                 \x98 \\u0098 \x99 \\u0099 \x9a \\u009a \x9b \\u009b \
                 \x9c \\u009c \x9d \\u009d \x9e \\u009e \x9f \\u009f ]
    }

I'm not 100% sure, but I believe the list should stop at \x1f, as per the specification here: https://www.rfc-editor.org/rfc/rfc7158.txt

I stumbled on this while using ::json::write::string and getting escaped unicodes.

    % set s "\u30B8\u30A7\u30C3\u30EA"
    ジェッリ
    % set s [encoding convertto utf-8 $s]
    ã¸ã§ã㪠
    % ::json::write::string $s
    "ã\u0082¸ã\u0082§ã\u0083\u0083ã\u0083ª"
    # Should be "ã¸ã§ããª"

Proposed fix: change quotes to the following:

    variable quotes \
        [list "\"" "\\\"" \\ \\\\ \b \\b \f \\f \n \\n \r \\r \t \\t \
             \x00 \\u0000 \x01 \\u0001 \x02 \\u0002 \x03 \\u0003 \
             \x04 \\u0004 \x05 \\u0005 \x06 \\u0006 \x07 \\u0007 \
             \x0b \\u000b \x0e \\u000e \x0f \\u000f \x10 \\u0010 \
             \x11 \\u0011 \x12 \\u0012 \x13 \\u0013 \x14 \\u0014 \
             \x15 \\u0015 \x16 \\u0016 \x17 \\u0017 \x18 \\u0018 \
             \x19 \\u0019 \x1a \\u001a \x1b \\u001b \x1c \\u001c \
             \x1d \\u001d \x1e \\u001e \x1f \\u001f ]
| |||
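For anyone who wants the behaviour of the proposed fix without patching the package, the following is a minimal sketch of a standalone quoting helper. The proc name `quoteMinimal` is hypothetical and not part of json::write. It builds the same kind of `string map` table as above, but only for the quotation mark, the backslash, and U+0000 through U+001F.

```tcl
# Hypothetical helper, NOT part of json::write: quotes a string for JSON
# and escapes only the quotation mark, the backslash, and U+0000..U+001F.
proc quoteMinimal {s} {
    # Short escape forms, same as the head of the list in the ticket.
    set map [list "\"" "\\\"" \\ \\\\ \b \\b \f \\f \n \\n \r \\r \t \\t]

    # Remaining control characters get the generic \u00XX form.
    for {set i 0} {$i < 0x20} {incr i} {
        set c [format %c $i]
        if {$c ni [list \b \f \n \r \t]} {
            lappend map $c [format {\u%04x} $i]
        }
    }

    # Apply the substitutions and wrap the result in double quotes.
    return "\"[string map $map $s]\""
}

puts [quoteMinimal "tab\there"]   ;# prints "tab\there"
```

Characters from \x7f upward pass through this helper unchanged, which is the difference from the list shipped in 1.0.3.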
User Comments: |
Jerry added on 2018-05-18 17:26:47:
Hi aku,

Thanks for the reply! Yes, I was, but I'm a bit confused now. Is there then a reason as to why this list is escaping only this specific set of characters? Maybe I'm missing something. I guess I will have to handle those characters separately.

aku added on 2018-05-17 20:13:50:
Hi Jerry. Thank you for the report. Are you referring to the following sentence in the RFC?

    ... characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F). ...

If yes, note also the immediately following sentence:

    Any character may be escaped.
|
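The escapes in the report come from the bytes produced by `encoding convertto utf-8`, not from the Japanese characters themselves. The sketch below illustrates this under stated assumptions: `c1map` is a hypothetical stand-in for only the \x80 to \x9f entries of the quotes list, built here with `format` so the example runs without the json::write package. Mapping the Unicode string directly leaves it untouched, while mapping the UTF-8 byte string hits the continuation bytes 0x82 and 0x83.

```tcl
# Hypothetical stand-in for the U+0080..U+009F entries of the quotes list.
set c1map {}
for {set i 0x80} {$i <= 0x9f} {incr i} {
    lappend c1map [format %c $i] [format {\u%04x} $i]
}

# The string from the report, as Unicode characters.
set s "\u30B8\u30A7\u30C3\u30EA"

# Mapping the Unicode string: none of its characters are in the map.
set direct [string map $c1map $s]

# Mapping after UTF-8 conversion: the byte sequence
# E3 82 B8 E3 82 A7 E3 83 83 E3 83 AA contains 0x82 and 0x83,
# which fall inside the mapped range and are replaced.
set bytes   [encoding convertto utf-8 $s]
set escaped [string map $c1map $bytes]

puts [string equal $direct $s]   ;# prints 1
puts [string length $escaped]    ;# prints 32
```

One possible way to handle this, then, is to quote the Unicode string first and convert the finished JSON text to UTF-8 only when writing it out. This is a suggestion based on the behaviour shown here, not something stated in the ticket.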