Tcl Source Code

Ticket UUID: 22324bcbdfdeedb511eed5a9447683ab96edfef4
Title: string reverse is broken in 8.6.11
Type: Bug
Version: 8.6
Submitter: chw
Created on: 2021-02-12 17:28:13
Subsystem: 44. UTF-8 Strings
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2021-02-16 12:16:45
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2021-02-16 12:16:45
Description:
set string \ud83d\udca3
-> 💣

binary encode hex [encoding convertto utf-8 $string]
-> f09f92a3

binary encode hex [encoding convertto utf-8 [string reverse $string]]
-> edb2a3eda0bd
User Comments: jan.nijtmans added on 2021-02-16 12:16:45:

Fixed now in core-8-6-branch and up. Thanks, Christian, for the POC patch. There were some corner cases to be handled, but it was a big help in getting this right!


jan.nijtmans added on 2021-02-14 17:26:23:

Thanks for the patch! I prefer to fix this in 8.7 first (it's still a bug there too), and after that reconsider for backport to 8.6.


chw added on 2021-02-13 05:16:49:
The attached stringrev.diff is a POC patch against a core-8-6-11 checkout.

chw added on 2021-02-12 17:59:23:
The problem here is that versions before 8.6.11 would give

binary encode hex [encoding convertto utf-8 $string]
-> eda0bdedb2a3

i.e. although that is not proper UTF-8, the contract of
reversing the string is at least fulfilled, whereas 8.6.11
delivers a questionable result.
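
To make the hex strings in this ticket easier to read, here is a minimal sketch that derives them by plain integer arithmetic, so it does not depend on how a particular Tcl version stores surrogates. The procs utf8Hex and combineSurrogates are hypothetical helpers written for illustration; they are not part of Tcl and not taken from any patch.

```tcl
# Hypothetical helper: UTF-8 encode one code point given as an integer
# and return the bytes as lowercase hex. Surrogates are deliberately
# not rejected, so the invalid 3-byte forms can be shown.
proc utf8Hex {cp} {
    if {$cp < 0x80} {
        set bytes [list $cp]
    } elseif {$cp < 0x800} {
        set bytes [list [expr {0xC0 | ($cp >> 6)}] \
                        [expr {0x80 | ($cp & 0x3F)}]]
    } elseif {$cp < 0x10000} {
        set bytes [list [expr {0xE0 | ($cp >> 12)}] \
                        [expr {0x80 | (($cp >> 6) & 0x3F)}] \
                        [expr {0x80 | ($cp & 0x3F)}]]
    } else {
        set bytes [list [expr {0xF0 | ($cp >> 18)}] \
                        [expr {0x80 | (($cp >> 12) & 0x3F)}] \
                        [expr {0x80 | (($cp >> 6) & 0x3F)}] \
                        [expr {0x80 | ($cp & 0x3F)}]]
    }
    set hex ""
    foreach b $bytes {
        append hex [format %02x $b]
    }
    return $hex
}

# Hypothetical helper: combine a high and a low surrogate into the
# code point they represent.
proc combineSurrogates {hi lo} {
    expr {0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00)}
}

puts [utf8Hex [combineSurrogates 0xD83D 0xDCA3]]  ;# f09f92a3
puts [utf8Hex 0xD83D][utf8Hex 0xDCA3]             ;# eda0bdedb2a3
puts [utf8Hex 0xDCA3][utf8Hex 0xD83D]             ;# edb2a3eda0bd
```

The first line is the proper 4-byte encoding of U+1F4A3, the second is the pair encoded surrogate by surrogate, and the third is the same with the two surrogates swapped, which matches the output reported for "string reverse" above.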

Yes, I think it should be fixed, but don't know how yet.

jan.nijtmans added on 2021-02-12 17:42:05:

Yep, this is a known bug: "string reverse" reverses the surrogates, resulting in invalid characters. It has been this way since 8.6.0 (most likely 8.5 as well), so it isn't a regression between 8.6.10 and 8.6.11 (I have therefore changed "Version" to "8.6"). I think 8.7 still has the same bug as well. It's fixed in Tcl 9.0.
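
As an illustration of the idea of reversing without splitting surrogate pairs, here is a minimal sketch that works on a list of UTF-16 code units given as integers. The proc reverseCodeUnits is a hypothetical helper; it is not the code from stringrev.diff or from the eventual fix, and it only shows the general approach of treating a high surrogate followed by a low surrogate as one unit.

```tcl
# Hypothetical helper: reverse a list of UTF-16 code units while
# keeping each high/low surrogate pair in its original order.
# Lone surrogates are treated like any other single unit.
proc reverseCodeUnits {units} {
    set out {}
    set n [llength $units]
    for {set i 0} {$i < $n} {incr i} {
        set u [lindex $units $i]
        set group [list $u]
        # A high surrogate directly followed by a low surrogate
        # forms one character, so keep the two together.
        if {$u >= 0xD800 && $u <= 0xDBFF && $i + 1 < $n} {
            set next [lindex $units [expr {$i + 1}]]
            if {$next >= 0xDC00 && $next <= 0xDFFF} {
                lappend group $next
                incr i
            }
        }
        # Prepend the group so the overall order ends up reversed.
        set out [concat $group $out]
    }
    return $out
}

puts [reverseCodeUnits {0x61 0xD83D 0xDCA3 0x62}]
;# 0x62 0xD83D 0xDCA3 0x61
```

A naive element-by-element reversal of the same list would give 0x62 0xDCA3 0xD83D 0x61, that is, a low surrogate before a high one, which is the invalid sequence described above.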

Some people consider changing the result of existing commands in a patch release (see also [debd088e48]) a bug, so we should be careful when fixing this in 8.6.

Are you willing to supply a patch?


Attachments: