Ticket UUID: 3444754
Title: string tolower \u01c5 is wrong
Type: Bug
Version: None
Submitter: msteveb
Created on: 2011-11-29 03:42:34
Subsystem: 44. UTF-8 Strings
Assigned To: nijtmans
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2011-12-08 02:53:57
Resolution: Fixed
Closed By: nijtmans
Closed on: 2011-12-07 19:53:57
Description:
\u01c5 is the title case variant: Dž. The lower case variant should be \u01c6 (dž). This works in 8.5.8, but 8.5.11 and 8.6b2 instead return \u01c5 (i.e. unchanged).

Here is the relevant entry from http://unicode.org/Public/UNIDATA/UnicodeData.txt:

01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5
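To make the expected mappings explicit, the entry quoted above can be split into its fields. This is a minimal Python sketch, not Tcl code and not how uniParse.tcl works; it assumes the standard UnicodeData.txt layout of 15 semicolon-separated fields, with the last three holding the simple upper, lower, and title case mappings. The function name `case_mappings` is hypothetical.

```python
# The entry exactly as quoted in the ticket description.
ENTRY = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;"
    "01C4;01C6;01C5"
)


def case_mappings(entry):
    """Return the code point, category, and simple case mappings of one entry.

    Assumes the 15-field UnicodeData.txt layout; an empty mapping field is
    taken to mean that the character maps to itself.
    """
    fields = entry.split(";")
    code = int(fields[0], 16)

    def mapped(field):
        return int(field, 16) if field else code

    return {
        "code": code,
        "category": fields[2],
        "upper": mapped(fields[12]),
        "lower": mapped(fields[13]),
        "title": mapped(fields[14]),
    }


if __name__ == "__main__":
    for key, value in case_mappings(ENTRY).items():
        shown = value if isinstance(value, str) else "U+%04X" % value
        print(key, shown)
```

For this entry the lower case field is 01C6, which is the result the report expects from `string tolower \u01c5`.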
User Comments:
nijtmans added on 2011-12-08 02:53:57:
Fix committed to all open branches, so it will appear in Tcl 8.5.12 and 8.6b3.

nijtmans added on 2011-12-06 20:52:44:
Here is the fix (see attached patch): a single number, 32931, should have been 32963 (line 754 of tclUniData.c). Will check that in soon, together with the updated uniParse.tcl, which generates this correctly.

nijtmans added on 2011-12-06 20:48:47:
File Added - 430129: tclUniData.c.diff

nijtmans added on 2011-12-05 21:32:05:
Compare this UnicodeData.txt line with the earlier entry in Unicode 2.x:

01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;

So the bug was introduced by a syntax change in the UnicodeData.txt file, not by any change on the Tcl side. uniParse.tcl handles the line differently when the 'totitle' entry is filled. Other characters that changed the same way include \u01cb and \u01f2 (as mentioned by Steve), and many more. OK, now I have all the information needed to fix this.

nijtmans added on 2011-12-04 16:09:59:
This bug was introduced earlier, on 2010-10-23, with the upgrade to Unicode 6.0 (Bug 3085863); it has no relation to 3393714.

nijtmans added on 2011-11-29 18:19:55:
Confirmed. Will have a look.

dkf added on 2011-11-29 16:30:41:
Probably related to the fix for 3393714.

msteveb added on 2011-11-29 10:54:42:
Ditto, \u01cb and \u01f2
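The difference between the Unicode 2.x entry and the current entry, as quoted in the comments, can be checked field by field. The following Python sketch only compares the two quoted lines; it does not reproduce uniParse.tcl's logic or the contents of tclUniData.c, and the helper name `changed_fields` is hypothetical.

```python
# Both lines are copied from the ticket comments; they share the same prefix.
PREFIX = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;"
)
OLD_LINE = PREFIX + "01C4;01C6;"      # Unicode 2.x: title case field empty
NEW_LINE = PREFIX + "01C4;01C6;01C5"  # current: title case field filled


def changed_fields(old, new):
    """Return (index, old_value, new_value) for every field that differs."""
    pairs = zip(old.split(";"), new.split(";"))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    for index, before, after in changed_fields(OLD_LINE, NEW_LINE):
        print("field %d: %r -> %r" % (index, before, after))
```

Only the last field (index 14, the title case mapping) differs, which matches the observation that the 'totitle' entry went from empty to filled while the lower case mapping 01C6 stayed the same.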
Attachments:
- tclUniData.c.diff, added by nijtmans on 2011-12-06 20:48:47.