Tcl Source Code

All files named "generic/tclUtf.c"

History for generic/tclUtf.c

2024-04-23
16:06
Hmm, something is broken. Leaving this here but this commit is definitely wrong somehow file: [3b06de4a78] check-in: [b356268e32] user: dkf branch: tidy-indentation, size: 67824
2024-03-20
20:29
Fix [6811a00819]: lsearch performance degradation on Tcl 8.6.11 release (thanks, Sergey!) file: [5e9e721e61] check-in: [9b8a66aff6] user: jan.nijtmans branch: main, size: 67780
20:09
Fix [6811a00819]: lsearch performance degradation on Tcl 8.6.11 release (thanks, Sergey!) file: [e9392be0f4] check-in: [3186ba81a2] user: jan.nijtmans branch: core-8-branch, size: 76905
17:56
optimize TclUtfToUCS4 for single code units (non high surrogates), especially for ascii; fixes performance regression [6811a0081940b76c] file: [7e5292dabd] check-in: [ad35def80d] user: sebres branch: core-8-6-branch, size: 64804
2024-03-19
14:56
Fix indentation/brace usage style issues file: [f42d134067] check-in: [1fbff64078] user: dkf branch: main, size: 67790
2024-03-03
21:48
Merge 8.7 file: [4eb4e6aa0e] check-in: [123c58d051] user: jan.nijtmans branch: main, size: 67796
21:42
Merge 8.6 file: [398f88309e] check-in: [2490edbb83] user: jan.nijtmans branch: core-8-branch, size: 76938
21:30
Fix [d63061a1ac]: PRIVATE != CONTROL in Unicode file: [2096c6cc8b] check-in: [0480bdc823] user: jan.nijtmans branch: core-8-6-branch, size: 64805
2024-02-11
22:05
TIP #652 file: [d706e0f20e] check-in: [2e1b89ace3] user: jan.nijtmans branch: core-8-branch, size: 76959
2024-02-07
14:47
Implementation of TIP 652. file: [167a5a52d8] check-in: [f76bde9ba9] user: pooryorick branch: tip-652, size: 67817
2024-01-30
17:07
Tweaking indentation of code; really unimportant... file: [902d411682] check-in: [bb72806960] user: dkf branch: dkf-indent-tweak, size: 68630
2024-01-22
15:43
Merge 8.7 file: [b00de11567] check-in: [aae56979fd] user: jan.nijtmans branch: main, size: 68592
2024-01-21
16:31
Merge 8.6 file: [37b7a3481d] check-in: [9ebdb88b20] user: jan.nijtmans branch: core-8-branch, size: 77734
16:26
Optimize Tcl_UniCharIsControl(). Don't worry about range >= U+F0000, that's for TCL_UTF_MAX>3, which is unsupported for 8.6. file: [669f7ea6fc] check-in: [8e673f54d3] user: jan.nijtmans branch: core-8-6-branch, size: 64826
2024-01-15
10:46
Bug [d63061a1ac]: "PRIVATE != CONTROL in Unicode". Leave out "Co" (private-use characters) from "string is control". file: [e11c258454] check-in: [5219e3ff14] user: jan.nijtmans branch: bug-d63061a1ac, size: 77741
2024-01-12
13:12
Merge 8.7 file: [33bb3d3f9c] check-in: [7b36a6fd46] user: jan.nijtmans branch: main, size: 68691
11:53
Leave out Tcl_UtfNcmp/Tcl_UtfNcasecmp from -DTCL_NO_DEPRECATED builds, because it's part of the UTF16 compatibility layer file: [aa6e8b1e91] check-in: [2d72b799ff] user: jan.nijtmans branch: core-8-branch, size: 77833
2024-01-10
21:01
Fix [4e38c347a4] Changed contract for Tcl_UtfN(case)cmp in Tcl 8.7 file: [d1d67541a6] check-in: [45db2932ba] user: jan.nijtmans branch: core-8-branch, size: 77661
15:12
Fix [4e38c347a4]: Changed contract for Tcl_UtfN(case)cmp in Tcl 8.7 file: [7ad9a56545] check-in: [f4d16ada51] user: jan.nijtmans branch: main, size: 68681
12:41
TIP 685 implementation: rename "string is unicode" to "string is transferable". Also rename underlying C function "Tcl_UniCharIsUnicode" to "Tcl_UniCharIsTransferable". file: [74f7693140] check-in: [5018317bb2] user: oehhar branch: tip-685, size: 75746
2023-12-29
23:54
Fix [abd489a1c]: TclStringCmp() calls functions through pointer to incorrect type file: [8bad15be72] check-in: [c83fa73b65] user: jan.nijtmans branch: main, size: 66792
14:57
Unneeded #undef's. Testcase/comment cleanup file: [d2bc395e71] check-in: [18e32ec525] user: jan.nijtmans branch: main, size: 66702
12:39
Merge 8.6 file: [d88d36db90] check-in: [b307c14710] user: jan.nijtmans branch: core-8-branch, size: 75723
2023-10-13
13:27
Proposed fix for [abd489a1c]: TclStringCmp() calls functions through pointer to incorrect type. Modified, swapping the wrapping-order file: [03af0b6919] check-in: [6d9d5ac6a7] user: jan.nijtmans branch: bug-abd489a1c, size: 64925
2023-09-12
10:11
Merge 8.7 file: [c6ac1dcada] check-in: [d24c3420fa] user: jan.nijtmans branch: main, size: 66944
2023-05-01
20:28
More progress file: [80594eb5a9] check-in: [f3015fa249] user: jan.nijtmans branch: tip-665, size: 72531
19:42
Remove internal use of TCL_UTF_MAX=3 as much as possible, without compromising existing TIPs file: [8fb9ac9c00] check-in: [86503e53c7] user: jan.nijtmans branch: tip-665, size: 72576
2023-04-28
17:34
Remove all code related to TCL_UTF_MAX=3. file: [35eabcd52d] check-in: [d0c75e3b88] user: pooryorick branch: unchained, size: 66945
06:43
Limit memset() to "TCL_UTF_MAX=3" builds. file: [76ecb3cdd4] check-in: [c72b11eac7] user: pooryorick branch: main, size: 71869
2023-04-27
20:34
memset(0xff) instead of memset(0) to accommodate tests that fill buffer with 0xff. file: [0c038d5268] check-in: [1e52f2b6c3] user: pooryorick branch: bug-f5eadcbf9a6b1b4c, size: 71815
2023-04-25
20:34
Fix for issue [f5eadcbf9a], passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corrupt data. file: [e85972715f] check-in: [db9f715fd5] user: pooryorick branch: bug-f5eadcbf9a6b1b4c, size: 71809
2023-04-14
15:01
Merge trunk file: [b1e70b23d0] check-in: [e36ab1e9ba] user: apnadkarni branch: tip-660, size: 71148
2023-04-12
14:25
Correct spelling errors in comments and documentation, but also non-comment corrections in history.tcl and tcltest.test. file: [bee4197cdf] check-in: [d65da06a77] user: pooryorick branch: main, size: 71163
13:30
Correct spelling errors in comments and documentation, but also non-comment corrections in history.tcl and tcltest.test. file: [784c46905b] check-in: [aca8de0aeb] user: pooryorick branch: core-8-branch, size: 75938
09:35
Correct spelling errors in comments and documentation, but also non-comment corrections in history.tcl and tcltest.test. file: [0c3f0782c0] check-in: [ee3df4e647] user: pooryorick branch: core-8-6-branch, size: 63167
2023-04-04
20:50
more progress file: [6e7ba5c478] check-in: [4e36871191] user: jan.nijtmans branch: optional-signed-size, size: 71215
2023-04-02
21:49
Merge 9.0. Add some more utility macros file: [4954dba7d7] check-in: [4bedba476d] user: jan.nijtmans branch: optional-signed-size, size: 71147
2023-03-30
18:01
TIP 660. No compiler warnings. Test suite passes on Win and Ubuntu file: [611210fab5] check-in: [eb81a25271] user: apnadkarni branch: tip-660, size: 71148
2023-03-22
19:30
Forgot one line in previous commit, and indenting file: [721bf4662a] check-in: [16b3efee0d] user: jan.nijtmans branch: main, size: 71163
18:13
Let's get in the 'readability' changes from the 'unchained' branch, without the need for all those partial merges. file: [73982f88a3] check-in: [3f420a07da] user: jan.nijtmans branch: main, size: 71044
18:00
Merge 8.7 file: [dbf3095366] check-in: [f4cf3c36b4] user: jan.nijtmans branch: main, size: 71112
17:57
Merge 8.6 file: [edad2ed8f4] check-in: [6063e17e15] user: jan.nijtmans branch: core-8-branch, size: 75938
17:50
Fix [0265750233]: invalid read in cmdAH-4.3.13.C1.solo.utf-8.tcl8 file: [d8c3dab8b5] check-in: [2ffcb8bcf4] user: jan.nijtmans branch: core-8-6-branch, size: 63167
17:34
Fix [0265750233]: invalid read in cmdAH-4.3.13.C1.solo.utf-8.tcl8. file: [0850778eea] check-in: [9f7e05419e] user: jan.nijtmans branch: mistake, size: 63167
16:36
Proposed fix for [0265750233]: invalid read in cmdAH-4.3.13.C1.solo.utf-8.tcl8. file: [63ea89e358] check-in: [4055888f8f] user: jan.nijtmans branch: bug-0265750233, size: 75916
2023-03-09
21:50
Merge 8.7 file: [9e14056004] check-in: [6c92d05c1f] user: jan.nijtmans branch: mistake, size: 71120
21:35
Additional fix for [f3cb2a32d6]: uninitialized value in format-2.18

This commit causes a lot of breakage .... file: [2cf67fb76e] check-in: [8d939a1c22] user: jan.nijtmans branch: mistake, size: 75928

2023-02-01
08:10
(cherry-pick) Make Tcl_UniCharToUtf more readable and add test to exercise surrogate handling. (test-case was still missing, which cannot be used in Tcl 8.6) file: [bac1ff6f63] check-in: [3b953fea77] user: jan.nijtmans branch: core-8-branch, size: 75898
07:32
(cherry-pick) Make Tcl_UniCharToUtf more readable and add test to exercise surrogate handling. file: [5b2e8be87b] check-in: [f4c704bc57] user: jan.nijtmans branch: core-8-branch, size: 76021
07:29
(Cherry-pick) Make Tcl_UniCharToUtf more readable. file: [0db541f666] check-in: [b4571ae045] user: jan.nijtmans branch: core-8-6-branch, size: 63147
00:34
Merge trunk. file: [019b30b09c] check-in: [e052e25beb] user: pooryorick branch: unchained, size: 71123
2023-01-31
23:04
Make Tcl_UniCharToUtf more readable and add test to exercise surrogate handling. file: [2b4eaf61a3] check-in: [fcdc24c850] user: pooryorick branch: main, size: 71072
22:15
Fix error introduced in [3e5e37f83b058f3d] for Tcl_UniCharToUtf, and add test. file: [4b1119fbfe] check-in: [df3390187d] user: pooryorick branch: py_easier_to_read, size: 71245
2023-01-30
19:37
Update code comments for Tcl_UniCharToUtf(). file: [0803bdaf9e] check-in: [aad66f6a63] user: pooryorick branch: unchained, size: 71205
11:59
A few more readability changes to Tcl_UniCharToUtf()

jn: Please, don't do that here. Tcl_UniCharToUtf() is shared between 8.6, 8.7 and 9.0. So if you want to make it easier to read, it should be done on all 3 branches. I know you only care about "trunk", but it makes maintenance on 8.6/8.7/9.0 harder than it already is. I don't want to spend time on reviewing such kind of changes, and no-one else is doing it. Thanks for understanding (I hope)! file: [8e2aa1cbcf] check-in: [b8524737fc] user: pooryorick branch: py_easier_to_read, size: 71246

11:22
Make Tcl_UniCharToUtf() a little easier to read. file: [87063883cc] check-in: [3e5e37f83b] user: pooryorick branch: py_easier_to_read, size: 71246
2022-11-16
20:52
one more (Tcl_UniCharToUtf), and adapt documentation file: [5d60e82b79] check-in: [89bee74fab] user: jan.nijtmans branch: main, size: 71201
20:39
Change 5 functions signatures from int -> size_t. Those should have been part of TIP #494 (Thanks, Gustaf!) file: [1b4b9d0cc8] check-in: [2571961f21] user: jan.nijtmans branch: main, size: 71198
2022-09-21
18:46
various break-fix measures file: [69d5c7f407] check-in: [eb68153185] user: bch branch: bch_sign_and_width, size: 71189
17:50
merge [trunk] file: [3402756a9b] check-in: [ab16261020] user: bch branch: bch_sign_and_width, size: 71191
2022-07-12
09:22
Merge 8.7 file: [363f78a471] check-in: [db6106323b] user: jan.nijtmans branch: main, size: 71183
09:03
Merge 8.7. Clean-up tclWinConsole.c the same way file: [191f492913] check-in: [da4d986fd1] user: jan.nijtmans branch: core-8-branch, size: 76027
08:20
Code cleanup (use {} in if/else statements) file: [cb8804bd23] check-in: [ba232277ee] user: jan.nijtmans branch: core-8-6-branch, size: 63153
2022-04-20
20:24
merge [trunk] file: [6e83bd1cbf] check-in: [3550542d19] user: bch branch: bch_sign_and_width, size: 71095
2022-04-08
09:37
Merge 9.0 file: [e9652741f4] check-in: [a1d061ebc8] user: jan.nijtmans branch: tip-619, size: 71191
2022-04-01
13:12
Merge 9.0 file: [05030a04f9] check-in: [4ce1a22a7f] user: jan.nijtmans branch: tip-619, size: 71336
10:29
Add UTF-16 versions of Tcl_GetCharLength/Tcl_GetRange/Tcl_GetUniChar to the stub table. Should have been part of TIP #542. Needed for Tk's "glyph_indexing_2" branch file: [163a4e6715] check-in: [9e1c0d7636] user: jan.nijtmans branch: main, size: 71087
2022-03-29
22:17
Merge 9.0. Fix CONTINUATION macro, and testcases file: [b76b499796] check-in: [38c85a4afd] user: jan.nijtmans branch: tip-619, size: 71295
2022-03-24
11:34
Implement PANIC when the UTF16 compatibility layer is used in combination with -DTCL_NO_DEPRECATED file: [bb16281435] check-in: [c9d9aaf618] user: jan.nijtmans branch: full-utf-for-87, size: 76035
10:48
When compiled with TCL_NO_DEPRECATED, remove the UTF16 compatibility layer. So, we make sure that it is never used internally for the Core. This means that extensions using the compatibility layer won't work any more in this mode; extensions should be compiled using TCL_UTF_MAX=4 then they work again. file: [220088a79b] check-in: [a68880dccd] user: jan.nijtmans branch: full-utf-for-87, size: 75936
2022-03-23
14:32
Fix Tcl_UniCharAtIndex() for UTF-16 compatibility layer file: [296a4ff68d] check-in: [01f2ee09b4] user: jan.nijtmans branch: full-utf-for-87, size: 75870
2022-03-22
23:18
Simplify Tcl_UtfAtIndex file: [83af4628be] check-in: [d395ff5ce6] user: jan.nijtmans branch: tip-622, size: 71046
15:38
Feature-complete file: [a481b61bb7] check-in: [d2ae10faca] user: jan.nijtmans branch: full-utf-for-87, size: 75894
11:25
More progress file: [1964d095c7] check-in: [8a7b81816e] user: jan.nijtmans branch: full-utf-for-87, size: 74396
08:16
Add UTF-16 versions of Tcl_UniCharLength/Tcl_NumUtfChars/Tcl_UtfAtIndex. Needed for Tk's glyph_indexing_2, and possibly other extensions sticking at TCL_UTF_MAX=3 file: [f7f920bff9] check-in: [d9384cad48] user: jan.nijtmans branch: tip-622, size: 71343
2022-03-16
23:08
Handle Tcl_UtfAtIndex file: [3e516c1808] check-in: [4d35f48f7c] user: jan.nijtmans branch: full-utf-for-87, size: 74374
2022-03-14
16:06
More progress file: [1faf708527] check-in: [48ab8b7472] user: jan.nijtmans branch: full-utf-for-87, size: 73884
2022-03-11
22:43
2 more functions file: [c549181e96] check-in: [f3580d9f09] user: jan.nijtmans branch: full-utf-for-87, size: 73755
22:11
Handle TclUniCharNcmp() file: [c920f06a63] check-in: [715b107f62] user: jan.nijtmans branch: full-utf-for-87, size: 69809
2022-03-08
22:26
initial work at using signed-storage for cases where values can actually be signed. file: [3b51e7483c] check-in: [5030ee02cf] user: bch branch: bch_sign_and_width, size: 69392
2022-03-03
13:05
TIP #619 implementation. tests not working yet file: [14dbaef4ca] check-in: [e791d2994c] user: jan.nijtmans branch: tip-619, size: 69633
2022-02-24
22:20
3 more files with TCL_UTF_MAX checks file: [2a18863905] check-in: [70eb0efb69] user: jan.nijtmans branch: core-8-branch, size: 69151
21:44
Merge 8.7 file: [2ff9e1f3ce] check-in: [0907ba571c] user: jan.nijtmans branch: main, size: 69384
2022-02-03
13:13
TIP #617: Tcl_WCharLen/Tcl_Char16Len file: [5fc32a6ed7] check-in: [2b0167c4a2] user: jan.nijtmans branch: tip-617, size: 69159
2021-04-30
08:49
Merge 8.7. Remove "string bytelength" completely. Also fix some TIP #595 leftover testcases, which were skipped file: [8f2b0ccaa6] check-in: [ef90dfbe67] user: jan.nijtmans branch: main, size: 68823
2021-03-17
12:04
Merge 8.7 file: [95bc7c89a5] check-in: [0d01d35ca0] user: jan.nijtmans branch: tip-597, size: 68595
2021-03-15
12:31
Merge 8.7 (this is the TIP #575 implementation for Tcl 9.0) file: [87a9e182f7] check-in: [3fed9809e1] user: jan.nijtmans branch: main, size: 68048
11:52
Implement TIP #575: Switchable Tcl_UtfCharComplete()/Tcl_UtfNext()/Tcl_UtfPrev() file: [f559d46094] check-in: [4abf5fb992] user: jan.nijtmans branch: core-8-branch, size: 67820
2021-03-11
12:19
Backport Tcl_UtfCharComplete() functionality from 8.6 for TCL_UTF_MAX>3. This makes Tcl_UtfCharComplete() usable to protect Tcl_UtfNext() calls for overflow. No change for TCL_UTF_MAX=3 (default build) file: [0225383322] check-in: [0f1dccacba] user: jan.nijtmans branch: core-8-5-branch, size: 55274
2021-03-10
16:17
Merge 8.7 file: [2ecfd1c249] check-in: [3c64a79d6a] user: jan.nijtmans branch: main, size: 68564
16:16
Merge 8.6 file: [309a58a3e8] check-in: [9491870854] user: jan.nijtmans branch: core-8-branch, size: 68336
16:12
Repair Tcl_UniCharNcasecmp() in the same way as Tcl_UniCharNcmp() for fix [4c591fa487]. Also put back minor optimization for big-endian machines removed in the previous commit file: [7d9aff2106] check-in: [1b0eda88cc] user: jan.nijtmans branch: core-8-6-branch, size: 63157
15:48
Merge 8.7 file: [90ae9091cd] check-in: [62a4501023] user: jan.nijtmans branch: main, size: 68085
15:47
Merge 8.6 file: [dbc23d8625] check-in: [bcaebc9bb7] user: jan.nijtmans branch: core-8-branch, size: 67857
15:39
Fix [4c591fa487]: [string compare] EIAS violation file: [9f093c66fb] check-in: [6a5a5e21f1] user: jan.nijtmans branch: core-8-6-branch, size: 62676
12:55
TIP #597 implementation: "string is unicode" and new wtf-8 encoding file: [37c46c7c61] check-in: [fc3656894a] user: jan.nijtmans branch: tip-597, size: 68625
2021-03-03
14:31
Backport improvements in UTF-8 handling for Tcl_UtfPrev/Tcl_UtfNext from 8.7 (through 8.6). No change for TCL_UTF_MAX=3. Adapt test-cases accordingly file: [25aa4d36c7] check-in: [4c185260b8] user: jan.nijtmans branch: core-8-5-branch, size: 54494
2021-03-02
11:02
Merge 8.7 file: [bec813293d] check-in: [981b059a3f] user: jan.nijtmans branch: main, size: 68078
10:53
Using 0xFC00 is more readable here than ~0x3FF. It's sufficient because ch1 and ch2 are only 16-bit. Backported from 8.7 file: [1a3a67aab3] check-in: [6064dfa0b2] user: jan.nijtmans branch: core-8-6-branch, size: 62668
10:38
Merge 8.6 file: [e34c706dc3] check-in: [b421cf94ac] user: jan.nijtmans branch: core-8-branch, size: 67850
10:10
Backport some UTF-8-related changes from 8.7 to 8.6, only for TCL_UTF_MAX > 3. No change for TCL_UTF_MAX=3. Also adapt test-cases accordingly, and add comments why the changes were done. file: [85fb86571b] check-in: [e0cba87ba8] user: jan.nijtmans branch: core-8-6-branch, size: 62668
2021-02-16
08:10
Fix Tcl_UtfPrev for TCL_UTF_MAX>3, so it can jump back over Emoji. Backported from 8.7, no change for TCL_UTF_MAX=3. This way, the previous fix can be slightly more simplified, and working for TCL_UTF_MAX>3 too. file: [b7dbe89f73] check-in: [5437791423] user: jan.nijtmans branch: core-8-6-branch, size: 62083
2020-12-13
16:59
Merge 8.7 file: [d677f79a41] check-in: [0631cdce81] user: jan.nijtmans branch: tip-575, size: 67079
2020-12-08
15:42
Merge 8.7 file: [c8ad303c5a] check-in: [71493b571b] user: jan.nijtmans branch: main, size: 67689
15:31
Add -finput-charset=UTF-8 and -fextended-identifiers to gcc (and clang). All C sources can now use UTF-8, as far as gcc/clang/msvc support it. Not used yet file: [b10b993d1b] check-in: [4254aa305b] user: jan.nijtmans branch: core-8-branch, size: 67461
2020-11-18
14:29
Merge 8.7 file: [1ddca5273e] check-in: [0e52f34ca2] user: jan.nijtmans branch: main, size: 67690
13:51
More usage of TclUtfToUCS4/TclUniCharToUCS4 instead of their UniChar variants: This handles surrogate pairs better. file: [be48f7aa0c] check-in: [d843858583] user: jan.nijtmans branch: core-8-branch, size: 67462
2020-11-05
17:06
Merge 8.7 file: [4714ced6aa] check-in: [23c0c45dd0] user: jan.nijtmans branch: tip-575, size: 67080
2020-06-04
15:15
Merge 8.7. Use more TCL_INDEX_NONE in documentation/headers/code. file: [5acf707511] check-in: [baad702302] user: jan.nijtmans branch: trunk, size: 67436
2020-05-25
11:53
Finish implementation of "string nextchar|nextword|prevchar|prevword". Not thoroughly tested yet, but seems OK at first sight. file: [aebc1a920f] check-in: [0a46907d56] user: jan.nijtmans branch: tip-575, size: 67687
09:46
Merge 8.7 file: [a394c3ef5f] check-in: [383de70ed5] user: jan.nijtmans branch: trunk, size: 67442
09:32
Fix compiled "string is <class>" for characters > U+FFFF. Add testcase exposing this bug. file: [1279a196e3] check-in: [2a3709ca18] user: jan.nijtmans branch: core-8-branch, size: 67208
09:02
Fix compiled "string is <class>" for TCL_UTF_MAX=4 build, for characters > U+FFFF. file: [935c0f05f5] check-in: [1eec2e52c3] user: jan.nijtmans branch: core-8-6-branch, size: 62059
2020-05-22
21:28
Split more "string" functions. New helper function TclUniCharToUCS4(), not used yet but that's the next step. file: [1c57261714] check-in: [ec03d6f62d] user: jan.nijtmans branch: tip-575, size: 67433
14:36
Merge 8.7 Add function Tcl_UniCharFold(). It's the same as Tcl_UniCharToLower() for now, but that will change. file: [9b9a0643d1] check-in: [c338858460] user: jan.nijtmans branch: tip-575, size: 67050
2020-05-20
19:12
Merge 8.7 file: [d5326119f4] check-in: [4d5fc5a4f3] user: jan.nijtmans branch: trunk, size: 67059
19:09
Adapt some comments, which are not correct for Tcl 8.7 any more file: [daab2dad3d] check-in: [1203d7b979] user: jan.nijtmans branch: core-8-branch, size: 66825
2020-05-18
20:23
Tiny fix for TCL_UTF_MAX=4 build only: Since Tcl_UtfNext() verifies 4 bytes for lead bytes F0-F5, Tcl_UtfCharComplete() should guarantee that those 4 bytes are available, not 3. file: [2fd4a1f9d9] check-in: [f299c3c546] user: jan.nijtmans branch: core-8-6-branch, size: 61648
11:51
Merge 8.7 file: [989f7f2b9d] check-in: [c933eb1f2d] user: jan.nijtmans branch: tip-573, size: 62334
10:10
Adapt Tcl_UtfPrev()/Tcl_UtfNext() to be consistent with Tcl_UtfToUniChar() file: [7ae9dd89ae] check-in: [6aa676e6bc] user: jan.nijtmans branch: tip-573, size: 61033
2020-05-13
22:07
Merge 8.7. Further progress with TIP implementation. file: [5f36d4015f] check-in: [afe207f116] user: jan.nijtmans branch: tip-575, size: 66452
16:51
Fix [ed29806baf]: Tcl_UtfToUniChar reads more than TCL_UTF_MAX bytes file: [163cc43ab2] check-in: [2a7beb9d18] user: jan.nijtmans branch: core-8-6-branch, size: 61648
2020-05-12
21:22
Merge 8.7 file: [0f33a04699] check-in: [ee9f526818] user: jan.nijtmans branch: trunk, size: 67067
21:17
Little tweak to Tcl_UniCharAtIndex(): Protect against negative index, return -1 in that case. file: [c0daf522a8] check-in: [fba6de9b22] user: jan.nijtmans branch: core-8-branch, size: 66833
19:47
Merge 8.7 file: [9cc185dddb] check-in: [d513c9b2fd] user: jan.nijtmans branch: trunk, size: 67059
19:41
Merge 8.6 file: [5fc434c5c5] check-in: [1d587617b0] user: jan.nijtmans branch: core-8-branch, size: 66794
19:20
Revert implementation of Tcl_UniCharAtIndex() change done in this commit: [6596c4af31e29b5d]. Just look at the Tcl_UtfAtIndex() implementation for TCL_UTF_MAX=4: It's not the same. There are no test-cases for Tcl_UniCharAtIndex(), see [f45d0dc1a7], not really worth to write one, since the implementation of this function didn't change in 20 years. file: [c08942135d] check-in: [0418b67de2] user: jan.nijtmans branch: core-8-6-branch, size: 62224
18:59
First, experimental implementation of TIP #575. Barely tested, will fail. WIP file: [e28fe4a740] check-in: [d15b9cb99e] user: jan.nijtmans branch: tip-575, size: 66518
11:08
Merge testcase cleanup. Make Tcl_UtfPrev() behave the same for any TCL_UTF_MAX value, since we didn't figure out yet how it should behave for TCL_UTF_MAX>3. file: [c2b8a025c1] check-in: [b5d4bf440b] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 61613
07:26
Fix "knownBug" utf-4.11. Turns out a few other testcases were still not correct, now they are. Make next/prev behavior the same for all TCL_UTF_MAX values, since the exact behavior for TCL_UTF_MAX>3 should be worked out further for Tcl 8.7 first, then everything agreed upon can be backported. file: [6bc82beea3] check-in: [8aa5fcb56e] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 61680
2020-05-11
12:25
Merge 8.6. Mark testcase utf-4.11 as "knownBug": this one still doesn't give the right answer. Add testcase 4.14 with similar corner-case, this one is OK. file: [bff485763b] check-in: [aeb46f9ed0] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 62194
11:26
Merge 8.7 file: [6fd45d6640] check-in: [89e2216acf] user: jan.nijtmans branch: trunk, size: 66804
10:18
Merge 8.6 file: [0fec4430b8] check-in: [097c064eb4] user: jan.nijtmans branch: core-8-branch, size: 66580
10:03
Tweak the Tcl_UtfPrev() implementation for TCL_UTF_MAX=4. This fixes 10 testcases in 4 groups (utf-7.10, utf-7.15, utf-7.40 and utf-7.48), where Tcl_UtfPrev() didn't jump to the beginning of the UTF-8 character, even though there was no limitation which prevented that. So, this is actually a bug-fix for the TIP #389 implementation. file: [a4178ad1d0] check-in: [ae6c7e8b86] user: jan.nijtmans branch: core-8-6-branch, size: 62210
07:42
Merge 8.7 file: [3ceb6a9145] check-in: [fe9bc8500a] user: jan.nijtmans branch: trunk, size: 66748
07:41
Merge 8.6 file: [9fed91ce86] check-in: [6a7f9c3f67] user: jan.nijtmans branch: core-8-branch, size: 66524
07:39
occurance -> occurrence. file: [0834b8cf55] check-in: [1f06b263bc] user: jan.nijtmans branch: core-8-6-branch, size: 62212
2020-05-10
20:58
Demonstration for documentation bug, and suggestion for improved wording. More explanation will follow in the ticket. file: [194cf0d34f] check-in: [94b8ef9338] user: jan.nijtmans branch: bug-81242a48c8, size: 66293
20:15
Merge 8.7 file: [1b799deb71] check-in: [80b28f70b4] user: jan.nijtmans branch: trunk, size: 66747
20:11
Merge 8.6 file: [7f66271ea1] check-in: [2cff341b8c] user: jan.nijtmans branch: core-8-branch, size: 66523
19:28
Tweak Invalid() function: No need for "return 0" twice in the function. For start bytes F0-F4, case TCL_UTF_MAX=4, Tcl_UtfToUniChar() reads 3 bytes but only advances 1 byte. So Tcl_UtfCharComplete() must make sure 3 bytes are available, not 1. Adapt Tcl_UtfCharComplete() accordingly. No change for TCL_UTF_MAX=[3|6] file: [c5862ed2d3] check-in: [1c924d98e0] user: jan.nijtmans branch: core-8-6-branch, size: 62211
2020-05-08
15:19
Rebase to latest core-8-6-branch. file: [1102147264] check-in: [03e4d0a22a] user: jan.nijtmans branch: bug-31aa44375d, size: 61282
2020-05-07
22:06
merge 8.7 file: [9b16395cee] check-in: [b7293ef281] user: dgp branch: trunk, size: 66759
21:59
Merge 8.6 file: [5f74755db2] check-in: [6f4a6b90ef] user: dgp branch: core-8-branch, size: 66535
21:47
Merge changes from parent branch file: [467388c349] check-in: [2742b2b00d] user: dgp branch: bug-31aa44375d, size: 61305
21:28
Merge 8.6 file: [d282366dbc] check-in: [c59e41ca4e] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 61589
20:44
merge 8.5 file: [f2c0d97c09] check-in: [46d4cccefe] user: dgp branch: core-8-6-branch, size: 61720
20:31
Same trouble with Tcl_UtfToUniCharDString. Test and fix. file: [6406b6da69] check-in: [1a4edbc67e] user: dgp branch: core-8-5-branch, size: 54068
19:22
merge 8.5 file: [a7f7399b26] check-in: [488d1d5841] user: dgp branch: core-8-6-branch, size: 61723
19:08
Fix. Note that just because we get one positive detection of an incomplete character, we cannot conclude that the next byte also will be, or can be taken as a single byte. At least we cannot when TCL_UTF_MAX > 3 so that we have room for valid two-byte sequences after incomplete sequence detection. No need for conditional code, just use an algorithm that always works. file: [633872436b] check-in: [899e66a3c0] user: dgp branch: bug-b2816a3afe, size: 54075
18:23
New approach to fixing the regression reported in [31aa44375d] builds on recent reforms. Older efforts aborted. file: [eeafc020ea] check-in: [ad31bd7310] user: dgp branch: bug-31aa44375d, size: 61319
14:15
For TCL_UTF_MAX==4: Make sure that Tcl_UtfNext()/Tcl_UtfPrev() never move more than 3 bytes. This is more consistent with what Tcl 8.7 does too. For TCL_UTF_MAX==6: Make sure that Tcl_UtfNext()/Tcl_UtfPrev() never move more than 4 bytes. For TCL_UTF_MAX==3: No change. Introduce ucs2_utf16 test constraint, since many test results now become the same for ucs2 and utf16. file: [4c78350590] check-in: [116d4a8943] user: jan.nijtmans branch: core-8-6-branch, size: 61728
13:37
Merge 8.6. Some more tweaks to Tcl_UtfPrev(), so it cannot jump back 4 bytes in "utf16" build any more. file: [9d407eb301] check-in: [6f4bdb20da] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 61597
11:09
Merge 8.7 file: [32dd82e7e4] check-in: [3516a881cf] user: jan.nijtmans branch: trunk, size: 66773
10:56
Merge 8.6 file: [80c3a3b4a5] check-in: [2de70b5bd1] user: jan.nijtmans branch: core-8-branch, size: 66549
10:09
Optimize Tcl_UtfToUniCharDString() file: [73e3d5ab08] check-in: [806e1e868c] user: jan.nijtmans branch: core-8-6-branch, size: 61725
09:31
Tighten optimization in Tcl_UtfToUniCharDString(), just as in Tcl_NumUtfChars(). Don't use "-1" in the Tcl_NumUtfChars() calculation, since that raises more questions than it solves, but that's easy to be remedied as well: Just use >= instead of > in the comparison. Great idea, Don! Backport more code formatting from Tcl 8.6 (e.g. use of CONST, which makes no sense any more in c-files) file: [d9de658c5f] check-in: [49fb3b2f1a] user: jan.nijtmans branch: core-8-5-branch, size: 54086
2020-05-06
21:59
merge 8.7 file: [547abbcb04] check-in: [f21991ce1e] user: dgp branch: trunk, size: 66348
21:52
merge 8.6 file: [8ca0195fe0] check-in: [4d08cde908] user: dgp branch: core-8-branch, size: 66124
21:42
merge 8.5 file: [9aa5c57149] check-in: [62362d0caa] user: dgp branch: core-8-6-branch, size: 61525
21:08
Tighten optimization in Tcl_NumUtfChars. Explain in comments. file: [6ec6093330] check-in: [dabb52db36] user: dgp branch: core-8-5-branch, size: 54053
19:55
merge 8.7 file: [f36b3644ff] check-in: [541cffe991] user: dgp branch: trunk, size: 65736
19:48
merge 8.6 file: [85ae6af27c] check-in: [bf737b27ba] user: dgp branch: core-8-branch, size: 65547
19:31
merge 8.5 file: [cafc8c7a54] check-in: [01956c0799] user: dgp branch: core-8-6-branch, size: 60948
19:22
Restore safe calls of Invalid(). file: [fd1606ce1a] check-in: [8d0f9fd43b] user: dgp branch: core-8-5-branch, size: 53477
16:58
The routine Invalid() has been revised to do something different. Update the comments to describe what it does now, and cautions that callers take into account. file: [2f13e3001c] check-in: [1835c80d8f] user: dgp branch: core-8-5-branch, size: 53144
13:31
Merge 8.7 file: [b4ca240e7a] check-in: [c990c1d146] user: jan.nijtmans branch: trunk, size: 64980
13:22
Merge 8.6 file: [0103a8bc49] check-in: [918cfd8094] user: jan.nijtmans branch: core-8-branch, size: 64791
13:14
Merge 8.5. More usage of UCHAR() macro. file: [828ebbe4fc] check-in: [3f3c0fda44] user: jan.nijtmans branch: core-8-6-branch, size: 60192
13:03
Change Invalid() parameter type to "const char *". Also call Invalid() first in Tcl_UtfNext(), so if src[1] is invalid src[2] doesn't need to be checked any more.

Note: This order change, calling Invalid() first was wrong, and is corrected in later commits. Thanks, Don, for noticing this! file: [2ffd834b54] check-in: [31c95595b2] user: jan.nijtmans branch: core-8-5-branch, size: 52721

09:44
Merge 8.7 file: [ee720bf978] check-in: [97679b4f1d] user: jan.nijtmans branch: tip-573, size: 64914
2020-05-05
16:00
More usage of TclUtfToUCS4(), so we can use the whole Unicode range better in TCL_UTF_MAX>3 builds. file: [ccbd485d3f] check-in: [26e57ca148] user: jan.nijtmans branch: core-8-6-branch, size: 60187
13:23
Add 4 test-cases that could fool Tcl_UtfPrev (but ... actually they don't). Make sure that Tcl_UtfPrev() never reads more than 3 trail bytes (or 4 when TCL_UTF_MAX > 4). Those are the same limits as for Tcl_UtfNext() and Tcl_UtfToUniChar() file: [4f3603d99e] check-in: [50e98246d7] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 60100
11:54
Merge 8.7 file: [c5ed1ee11f] check-in: [95e9950dad] user: jan.nijtmans branch: trunk, size: 64975
11:44
Fix Tcl_UtfPrev() such that it can never go back more than TCL_UTF_MAX bytes. Already done correctly on core-8-6-branch, but this was never forwarded to core-8-branch. file: [9315ba8c93] check-in: [21adba4503] user: jan.nijtmans branch: core-8-branch, size: 64786
10:06
Merge 8.6 file: [97b7eecfe0] check-in: [5d3edce6f4] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 60084
09:51
Merge 8.7 file: [e11eca8c98] check-in: [bd1dcc78c1] user: jan.nijtmans branch: trunk, size: 64882
08:16
Merge 8.6 file: [de2ccc2bf4] check-in: [7d127f6d27] user: jan.nijtmans branch: core-8-branch, size: 64693
07:39
Merge 8.5 file: [98970568cf] check-in: [822925b9b4] user: jan.nijtmans branch: core-8-6-branch, size: 60215
07:29
Properly protect "Invalid" function against lead bytes 0x80-0xBF. This fixes "knownBug" testcase utf-6.93.1. Rename tip389 selector to utf16, since that's what it actually is, in contrast to ucs2 and ucs4. file: [993152b84c] check-in: [b0b773f640] user: jan.nijtmans branch: core-8-5-branch, size: 52716
2020-05-04
14:17
More progress/simplification file: [e972014e95] check-in: [a17d905779] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 60105
13:01
Merge 8.6 file: [60207ec948] check-in: [1256ace951] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 60870
12:31
New internal function TclGetUCS4() only available when TCL_UTF_MAX=4. This fixes all "knownBug" testcases related to tip389. file: [2283dbc19a] check-in: [41517f0841] user: jan.nijtmans branch: core-8-6-branch, size: 60226
11:08
Merge 8.6 file: [05701644db] check-in: [76cc0911f3] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 60847
08:35
(partial) fix for [9d0cb35bb2]: Various issues with core-8-6-branch, TCL_UTF_MAX=4. (even though TCL_UTF_MAX=4 is unsupported, it would be nice to make it work) Marked various test-cases as "knownBug", those work correctly in core-8-branch (8.7). The fix there could be backported. Low prio. file: [fc6d7ce227] check-in: [af513d6a16] user: jan.nijtmans branch: core-8-6-branch, size: 60073
2020-05-03
22:27
Merge 8.6 file: [ab4526735f] check-in: [0de2fe18d1] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 60805
22:16
Re-join utf-6.93.0 and utf-6.93.1 (please disregard comment in previous commit, it was not correct). Perfect TclUtfToUCS4()/TclUCS4Complete() and new (internal) function TclUCS4ToUtf(). They can help prevent bugs regarding splitting/joining surrogates. Used them in a few more places. file: [aa796ca523] check-in: [161196f054] user: jan.nijtmans branch: core-8-6-branch, size: 60121
2020-05-02
22:48
Join test-cases utf-6.93.0 and utf-6.93.1, which MUST give the same answer always for whatever testConstraints. Fix one invalid use of TclUCS4Complete(), and let TclUtfToUCS4() handle (invalid) 4-byte sequences. Test-case cleanup (removal of unnecessary quoting) file: [b74d0eff81] check-in: [e145652a1f] user: jan.nijtmans branch: core-8-6-branch, size: 58821
21:54
Seems almost correct. Still problem with "string index" for TCL_UTF_MAX>3 file: [21368c5c14] check-in: [20993bd6c0] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 59505
10:15
More fixes for [ed29806baf]. Not working yet. WIP file: [09e78bf6c7] check-in: [f670d1a41f] user: jan.nijtmans branch: bug-ed29806baf-8.6, size: 59677
2020-05-01
15:29
Rebase to core-8-6-branch (I don't think it's solved yet, but let's see) file: [fa452f53e1] check-in: [04631bfc72] user: jan.nijtmans branch: bug-ed29806baf, size: 58569
14:44
Merge 8.7 file: [e1d0eb44d9] check-in: [056c1aad28] user: jan.nijtmans branch: trunk, size: 64893
14:20
Fix first part of [ed29806baf]: Tcl_UtfToUniChar reads more than TCL_UTF_MAX bytes. Tcl_UtfToUniChar() now never reads more than TCL_UTF_MAX bytes any more. Since the UtfToUtf encoder/decoder now uses TclUtfToUCS4() it doesn't join 2 surrogates as 2 x 3-byte sequences any more. Actually, it shouldn't, because such sequences are invalid UTF-8. Therefore, added the ucs2 constraint to testcase encoding-15.4. Let's see how TIP #573 goes, this TIP should make this change official. Other callers of Tcl_UtfToUniChar() need to be revised for the same problem. Most callers will need to change Tcl_UtfToUniChar() -> TclUtfToUCS4() and Tcl_UtfCharComplete() -> TclUCS4Complete(), but that's not done yet. file: [016c050892] check-in: [1d9487bc7e] user: jan.nijtmans branch: core-8-branch, size: 64704
13:38
Fix first part of [ed29806baf]: Tcl_UtfToUniChar reads more than TCL_UTF_MAX bytes. Tcl_UtfToUniChar() now never reads more than TCL_UTF_MAX bytes any more. The UtfToUtf encoder/decoder is adapted to do additional checks (more tricky than in Tcl 8.7, since we want compatibility with earlier 8.6 releases). Other callers of Tcl_UtfToUniChar() need to be revised for the same problem. Most callers will need to change Tcl_UtfToUniChar() -> TclUtfToUCS4() and Tcl_UtfCharComplete() -> TclUCS4Complete(), but that's not done yet. file: [f5fdfb6efc] check-in: [5f2bb912cf] user: jan.nijtmans branch: core-8-6-branch, size: 58821
2020-04-30
21:23
merge 8.7 file: [3f6b7394c0] check-in: [25e735d05d] user: dgp branch: bug-e617e8a71a, size: 68058
19:25
merge 8.7 file: [70acfc1299] check-in: [948dd75b36] user: dgp branch: trunk, size: 64940
19:24
merge 8.6 file: [f60b307a2e] check-in: [c1648c2e63] user: dgp branch: core-8-branch, size: 64751
18:58
merge 8.5 file: [6f43f892c5] check-in: [c4bcfe94fe] user: dgp branch: core-8-6-branch, size: 58884
17:21
Work In Progress. Much trickiness about sorting out expectations. file: [74a6d3dafe] check-in: [581d6b939a] user: dgp branch: bug-e617e8a71a, size: 67780
15:21
Add comments so I'll know again later why this is here. file: [c0c5a180c4] check-in: [de47f8d8ee] user: dgp branch: core-8-5-branch, size: 52727
14:54
First, prove that bug [ed29806baf] is present in 8.7 too. Let's see what test-cases fail when we no longer check the validity of the 3rd trail byte. file: [e043fcdbc0] check-in: [5510c6045d] user: jan.nijtmans branch: bug-ed29806baf-8.7, size: 64539
14:20
Merge 8.7 file: [25e75d1724] check-in: [a65a5fe9ec] user: jan.nijtmans branch: trunk, size: 64767
14:16
Backout the quick-fix workaround in Tcl_NumUtfChars as a way to monitor progress on this issue. file: [3b2d18dec3] check-in: [b8aafb8d10] user: dgp branch: bug-ed29806baf, size: 58443
12:51
Let's not get out the src[3] check yet. file: [bb95bef2f4] check-in: [577f5c5de8] user: jan.nijtmans branch: core-8-branch, size: 64578
12:45
Merge 8.6 file: [bd04a59c24] check-in: [7ce14c8063] user: jan.nijtmans branch: core-8-branch, size: 64582
12:09
Partial fix for [ed29806ba]: Tcl_UtfToUniChar reads more than TCL_UTF_MAX bytes. file: [4907cb1886] check-in: [98f5334275] user: jan.nijtmans branch: core-8-6-branch, size: 58711
2020-04-29
22:22
merge 8.7 file: [054d4aae77] check-in: [7db8756066] user: dgp branch: bug-e617e8a71a, size: 66345
20:44
Backport many UNICODE_OUT_OF_RANGE() calls. This should fix [69634d51fb74551b] for Tcl 8.5 (with TCL_UTF_MAX=4) too. Also fix some comments which were not up to date. No change at all in behavior for TCL_UTF_MAX=3. file: [16c3f0784c] check-in: [fe05235530] user: jan.nijtmans branch: core-8-5-branch, size: 52554
19:23
Merge 8.7 file: [a4101aad1d] check-in: [9f5d7846b9] user: jan.nijtmans branch: trunk, size: 64662
19:20
Merge-mark 8.6 (Use of UNICODE_OUT_OF_RANGE() macro already was in 8.7). Quick exit from Tcl_UtfToChar16()/Tcl_UtfToUniChar() when lead-byte is 0xF5 - 0xF7. file: [f06fe486d5] check-in: [24f0963165] user: jan.nijtmans branch: core-8-branch, size: 64473
19:00
Add UNICODE_OUT_OF_RANGE() calls to UCS4ToTitle() and friends. Backported from 8.7. This fixes [69634d51fb]: handling out of range UCS4 values (at least, it's fixed in 8.6 now the same way as it's fixed in 8.7). file: [09fa94833f] check-in: [8daf5a10dd] user: jan.nijtmans branch: core-8-6-branch, size: 58601
17:39
First attempt at extending routine to deal with surrogate pairs. Exposes problems with the interface. file: [c223a30f70] check-in: [f6001c20fb] user: dgp branch: bug-e617e8a71a, size: 66436
2020-04-28
21:45
Merge 8.7 file: [0d52a887c6] check-in: [b11d56197b] user: jan.nijtmans branch: rfe-f443140a85, size: 63972
14:37
Implementation for TIP #573: Surrogates are invalid file: [9457d8b6d7] check-in: [749d917ed5] user: jan.nijtmans branch: tip-573, size: 64692
09:46
Change test expectations to what is desired. Mark failing tests with "knownBug". 10 test-cases are affected. Still work to do .... file: [aaa764e1d8] check-in: [4641000dd1] user: jan.nijtmans branch: rfe-f443140a85, size: 63975
2020-04-27
20:49
First shot at implementation for [f443140a85]. Far from correct yet, since Tcl_UtfPrev() gives strange results. Tcl_UtfNext() looks OK at first sight. Further testing needed. file: [765a2c48f8] check-in: [b287415a79] user: jan.nijtmans branch: rfe-f443140a85, size: 63920
12:59
merge 8.7 file: [33da1294ac] check-in: [94ed8d8b2b] user: dgp branch: trunk, size: 64753
12:57
merge 8.6 file: [1f829b2e0d] check-in: [60de0f0a95] user: dgp branch: core-8-branch, size: 64564
12:26
Use lossless internal routines to cover extended characters. file: [652d56928a] check-in: [ef84b2a306] user: dgp branch: bug-45ca2338cd, size: 58446
01:29
Possible fix for [string to*] writing out a high surrogate at end of string. file: [f35edc2a16] check-in: [cd98f1d62d] user: dgp branch: bug-45ca2338cd, size: 58102
2020-04-26
15:21
Merge 8.7 file: [8b59a487f5] check-in: [631a21f090] user: jan.nijtmans branch: trunk, size: 65642
15:11
Cherry-pick Tcl_UniCharAtIndex() implementation from [6596c4af31], but adapted to the needs of TIPs 389/542. file: [792691dbab] check-in: [25b3737625] user: jan.nijtmans branch: core-8-branch, size: 65453
13:25
Remove the function Tcl_UniCharAtIndex() completely from the core. Meant as a demonstration for ticket [f45d0dc1a7]. file: [30c86e86e2] check-in: [90be2283d0] user: jan.nijtmans branch: bug-f45d0dc1a7, size: 58038
2020-04-25
22:50
Merge 8.7 file: [c702e30a43] check-in: [dfd3e79a8c] user: jan.nijtmans branch: trunk, size: 66113
22:32
Merge 8.6 file: [14d53fbe21] check-in: [30a29680e1] user: jan.nijtmans branch: core-8-branch, size: 65888
22:16
encoding-12.6 only works for "ucs2" for now. Don't use (deprecated) INLINE and CONST file: [80e1a29677] check-in: [aaeafee98b] user: jan.nijtmans branch: core-8-6-branch, size: 57965
16:04
Close utf-next-regressions file: [23810d5a63] check-in: [aea91e2e26] user: dgp branch: core-8-6-branch, size: 57972
14:37
merge 8.6 file: [d53a8ca4ee] check-in: [2e6bf5602d] user: dgp branch: dgp-alternative, size: 59653
2020-04-24
22:40
Quickfix to Tcl_NumUtfChars(). Barely used in Tcl core. Still needs a better look. Mark two new tests as knownBug. Needs a further look as well. file: [55a7e499c6] check-in: [1c7c61d459] user: jan.nijtmans branch: core-8-6-branch, size: 59653
21:54
WIP merging 8.6. file: [844569e750] check-in: [c18876475f] user: dgp branch: dgp-alternative, size: 59620
21:17
merge 8.6 file: [a0bf1d8545] check-in: [a68982ebc3] user: dgp branch: mistake, size: 59385
21:08
Merge 8.6; still have to sort out origins of failing tests.

Botched this merge. Eliminated the things the branch was created to preserve in the first place. file: [40de756444] check-in: [137f33c330] user: dgp branch: mistake, size: 59156

21:01
Merge 8.5 (but this time correct) file: [4f0c01adfc] check-in: [38dbfc8a67] user: jan.nijtmans branch: core-8-6-branch, size: 59385
20:51
Merge 8.5. Failing tests need examination and adjustment. file: [f10ef7bc58] check-in: [12254b0693] user: dgp branch: core-8-6-branch, size: 59156
17:27
merge 8.5 file: [cce3cda4a4] check-in: [4285f56a71] user: dgp branch: dgp-27944a3661, size: 53299
17:18
Revert the parts of [76213b3f72] that converted callers of Tcl_UtfToUniChar into callers of Tcl_UtfNext. With this reversion, any future divergence between those two will not harm these callers. Retain the tests, and retain the new implementation of Tcl_UtfNext itself and its new macro form. file: [8d925de3c1] check-in: [870aba745a] user: dgp branch: core-8-5-branch, size: 52773
13:32
Merge 8.7 file: [3d047895f9] check-in: [33a6f75e9c] user: jan.nijtmans branch: trunk, size: 67794
13:31
Merge 8.6 file: [48cc25301d] check-in: [0b05e92ef6] user: jan.nijtmans branch: core-8-branch, size: 67576
13:15
Merge 8.5. Fix regression in Tcl_UtfComplete(), actually already present for longer time but masked by error in TclUtfNext() macro. Adapt expectations accordingly ("/xA0/xA0" should really have length 2 ....) file: [6e756078e9] check-in: [8cf6bd3ab7] user: jan.nijtmans branch: core-8-6-branch, size: 59620
10:49
Merge 8.7 file: [4ebccd3c84] check-in: [aa56ee88d9] user: jan.nijtmans branch: trunk, size: 67773
2020-04-23
22:10
Second attempt file: [3bb45d9524] check-in: [312fb13ea5] user: jan.nijtmans branch: utf-next-for-8.7, size: 67555
21:50
First attempt to merge Tcl_UtfNext()/Tcl_UtfPrev() improvements (check for invalid byte sequences) to 8.7 file: [05f74e48c9] check-in: [891db10cfb] user: jan.nijtmans branch: utf-next-for-8.7, size: 67717
20:22
Demonstrate that the failing tests on the 8.6 branch tip can equally well be solved by backing out the recent changes associated with [27944a3661]. file: [83043039a5] check-in: [2c7b1e958d] user: dgp branch: dgp-alternative, size: 59598
19:25
merge 8.5 file: [0b1e5cd94d] check-in: [92946e427a] user: dgp branch: dgp-27944a3661, size: 53214
19:07
Fix regression in Tcl_NumUtfChars, caused by this commit: [6596c4af31e29b5d]. Expectations of failing tests were adapted later, that's why this was missed. Lesson: Tcl_UtfNext() is _not_ just an optimized replacement for Tcl_UtfToUniChar(). Sorry, but this change is just too dangerous! Tcl_UniCharAtIndex() and Tcl_UtfAtIndex() most likely have the same regression when fed with invalid byte-sequences, therefore reverted those too.

HOLD ON! These regressions are equally the result of [5c322bbd51]. It takes both changes to cause the failing tests. We need to argue about which change was the wrong one. file: [6e58a843ab] check-in: [3fd0be11ba] user: jan.nijtmans branch: utf-next-regresions, size: 57833

19:04
Argument conditions for Invalid() call were not always satisfied. file: [7529c5d123] check-in: [0200ddd3d4] user: dgp branch: core-8-5-branch, size: 52688
18:22
Revised Tcl_UtfCharComplete() to be a proper safety filter for the revised needs of callers of Tcl_UtfNext(). file: [00df497402] check-in: [fecfc37392] user: dgp branch: dgp-27944a3661, size: 53186
14:51
Merge the two modes of Tcl_UtfNext into a single loop. file: [3a7cbf9366] check-in: [c6dba8e537] user: dgp branch: dgp-27944a3661, size: 52934
14:02
Revise the totalBytes array so that it stores the number of bytes in a valid byte sequence beginning with the byte value of the index. This is harmless to existing uses of the array. Trail bytes do not lead a valid byte sequence so their entries are changed to 0.

CORRECTION: That's not quite true. The change revises the results of tests utf-6.95 and utf-6.116 through changing results of Tcl_UtfCharComplete, but that's the point of this path. file: [2387b5d5bf] check-in: [4f1fbe0e8b] user: dgp branch: dgp-27944a3661, size: 52865

11:31
Fix [27944a3661]: Taming test utf-6.88 Fix [c11e0c5ce4]: Regression in Tcl_UtfCharComplete Fix [1b1f5f0b53]: Tcl_UtfNext incompatibility in Tcl 8.6.10 file: [9220af0032] check-in: [5c322bbd51] user: jan.nijtmans branch: core-8-6-branch, size: 59735
2020-04-22
22:13
(cherry-pick): Update documentation of Tcl_UtfPrev/Tcl_UtfNext back to how it was. Will be updated later, when implementation is ready and agreed upon. file: [2179ac8253] check-in: [dabf4dd28a] user: jan.nijtmans branch: core-8-6-branch, size: 60052
20:10
Adapt implementation and tests to vary behavior with TCL_UTF_MAX. file: [d32185c5a9] check-in: [de68f922b4] user: dgp branch: dgp-27944a3661, size: 52865
19:22
Place first-draft implementation of the proposed change to handling of trail bytes by Tcl_UtfNext on branch dgp-27944a3661 for examination. file: [a8ffe9df20] check-in: [853c8b207f] user: dgp branch: dgp-27944a3661, size: 52832
14:49
Merge 8.6 file: [1865dc3e3c] check-in: [43178b7bb4] user: jan.nijtmans branch: bug-c11e0c5ce4, size: 59746
08:10
Merge 8.7 file: [e41afeed1d] check-in: [e29ce0fdea] user: jan.nijtmans branch: trunk, size: 63697
07:59
Fix [27944a3661]: Taming test utf-6.88. Long-standing bug in Tcl_UtfNext(). Corner-case when the pointer doesn't advance to the start-byte of the next UTF-8 character. file: [6341278860] check-in: [aff418be86] user: jan.nijtmans branch: core-8-branch, size: 63472
07:41
Merge 8.5 file: [457a8304d3] check-in: [57148adcab] user: jan.nijtmans branch: core-8-6-branch, size: 61408
2020-04-21
10:24
Merge 8.7 file: [4f963f9768] check-in: [1d9d4a348c] user: jan.nijtmans branch: trunk, size: 63479
09:49
More test cleanup file: [519c9098ce] check-in: [926d62f948] user: jan.nijtmans branch: core-8-branch, size: 63254
07:26
Wrong indent in comment file: [7077e25dc4] check-in: [6a12cc0c59] user: jan.nijtmans branch: bug-27944a3661, size: 63474
07:18
Merge 8.7 file: [63d967a3f3] check-in: [8895c470f0] user: jan.nijtmans branch: bug-27944a3661, size: 63473
07:03
Add more test-cases for TCL_UTF_MAX>3 file: [21aa145451] check-in: [12d578a6ad] user: jan.nijtmans branch: bug-c11e0c5ce4, size: 59375
2020-04-20
22:00
Merge 8.7 file: [8563f71eb0] check-in: [e767cf6856] user: jan.nijtmans branch: trunk, size: 63477
21:54
Teach Tcl_UtfPrev() that 0xC1 is _always_ an invalid byte. Test-case utf-7.34. Make sure that Tcl_UtfCharComplete(src, TCL_UTF_MAX) always returns 1, for whatever bytes, since that's the maximum number of bytes Tcl_UtfToUniChar() can read in a single call. file: [94899e2693] check-in: [5f382f2056] user: jan.nijtmans branch: core-8-branch, size: 63252
21:07
Change a few variables from type "int" to "size_t". Always test TCL_UTF_MAX for <= 3 or > 3, because that's the only 2 flavours we really have. file: [53728707d7] check-in: [d515532940] user: jan.nijtmans branch: trunk, size: 63182
15:20
Proposed fix for [c11e0c5ce4]: Regression in Tcl_UtfCharComplete. file: [123d9496d9] check-in: [21362ee023] user: jan.nijtmans branch: bug-c11e0c5ce4, size: 59372
11:56
(cherry-pick): Proposed fix for [27944a3661]: Taming test utf-6.88. file: [8f5fe9844f] check-in: [c0d6e94d4e] user: jan.nijtmans branch: bug-c11e0c5ce4, size: 59895
05:17
Apply fix for [2738427] and adjust tests. file: [71c4cd01eb] check-in: [27cfa0775d] user: dgp branch: dgp-utf-explore, size: 52919
05:09
Apply fix for [493dccc2de] and adjust tests. file: [6454d41361] check-in: [7db642d35a] user: dgp branch: dgp-utf-explore, size: 52776
04:58
Apply fix for [5e6346a252] and adjust tests. file: [385e9d7fe5] check-in: [32bc8e9f0c] user: dgp branch: dgp-utf-explore, size: 51958
04:40
Apply first fix of [c61818e4c9] and adjust tests. file: [d71ee01f31] check-in: [4227206dc3] user: dgp branch: dgp-utf-explore, size: 48435
04:02
Apply the overlong rejection backport fix again, but this time don't attempt to include surrogate support. This was the first attempt on the 8.5 branch to leave behind the long outdated FSS-UTF approach that suggested support for 5- and 6-byte sequences. Clearly there was no useful support for a TCL_UTF_MAX=4 build before this, and a TCL_UTF_MAX=6 build could have only been nonsense. There's no history worth reclaiming compatibility with in the custom builds. file: [68e09348df] check-in: [daafc7f0aa] user: dgp branch: dgp-utf-explore, size: 46751
2020-04-19
20:49
Merge 8.5, and add the fix for [27944a3661] here too. Getting closer to what test-results we expect.

Overlong sequences not handled yet, so that's where differences are expected file: [1baf563b6a] check-in: [7c70d7b16d] user: jan.nijtmans branch: core-8-5-orig, size: 51337

09:57
Proposed fix for [27944a3661]: Taming test utf-6.88. This fix is not optimized, it still uses TclUtfToUniChar() in its implementation. But optimizing work is on its way, hopefully, coming through 8.5 .. 8.6 .. and up. file: [918a834bf7] check-in: [74d3f929c8] user: jan.nijtmans branch: bug-27944a3661, size: 63296
2020-04-18
13:53
Missing Tcl_UniChar initializations allow 4-byte Tcl_UtfToUniChar to act on uninitialized mem and get into big trouble.

This bug fix is abandoned. A better solution was adopted instead. file: [c949351d34] check-in: [517b1c87da] user: dgp branch: bug-c574e50a3b, size: 49343

13:47
Fix [c574e50a3b30e76f]: CRASH: utf-2.[89] in 8.5 built with TCL_UTF_MAX=4 file: [2aff7aacbb] check-in: [dda19888bd] user: jan.nijtmans branch: bug-c574e50a3b, size: 52667
12:46
Update documentation of Tcl_UtfPrev/Tcl_UtfNext back to how it was. Will be updated later, when implementation is ready and agreed upon. file: [b85447eb6b] check-in: [05486a901f] user: jan.nijtmans branch: core-8-5-branch, size: 55396
2020-04-17
21:07
[493dccc2de] Revise sequence validity check to reject out of range decodes too. file: [0adcc0196e] check-in: [2282b5ecbf] user: dgp branch: core-8-5-branch, size: 56758
11:02
Fix implementation of Tcl_UtfAtIndex() for TCL_UTF_MAX=6 (There's no test-case for this in the core-8-6-branch, but there is in core-8-branch). file: [5e9dae4c8a] check-in: [a14c9e965e] user: jan.nijtmans branch: core-8-6-branch, size: 61037
10:32
More test-cases. Mark test-case utf-2.11 as "knownBug", doesn't give the right answer for any TCL_UTF_MAX value. TODO: To Be Fixed! (Don ????) Fix build/testcase for TCL_UTF_MAX=6 (testcase is OK, "string length" implementation was not!) file: [59de7f446e] check-in: [d9a7cd6292] user: jan.nijtmans branch: core-8-6-branch, size: 61016
09:17
Original implementation of Tcl 8.5 before the Tcl_UtfPrev/Tcl_UtfNext reform. But: Add the (harmless) TclUtfNext/TclUtfPrev macros and add the 0xC1 byte as being invalid. Then ... add the testcases from core-8-5-branch (and a few more), with adapted expectations according to the original implementation. Let's see where we get. file: [25c60f32b5] check-in: [11eb38dbe2] user: jan.nijtmans branch: core-8-5-orig, size: 51201
05:49
Merging forward the Utf changes. Needs some repair yet. file: [ef790bca37] check-in: [36303870fc] user: dgp branch: dgp-fixme, size: 67944
05:25
merge 8.6

Abort this branch.

Rebasing a new proposed fix on top of recent reforms makes more sense now. file: [ff91171232] check-in: [73a39418d6] user: dgp branch: bug-31aa44375d, size: 61171

05:14
When supporting 4-byte sequences even with TCL_UTF_MAX = 3, need to parameterize a few things differently. (utf-4.11 failures). file: [0a6751f73d] check-in: [5b5ab20558] user: dgp branch: dgp-fixme, size: 60887
04:45
When supporting 4-byte sequences, make sure the Overlong test does too, and make sure the test results reflect it. file: [9218232ed2] check-in: [e537b566a1] user: dgp branch: dgp-fixme, size: 60981
04:08
Bring the single-byte marker for invalid lead byte \xC1 into the complete table file: [2b01547609] check-in: [8d170b2c64] user: dgp branch: core-8-6-branch, size: 60980
04:07
Merge 8.6 file: [c2a23eb5a7] check-in: [f579c619ea] user: dgp branch: bug-31aa44375d, size: 61171
03:54
Fix the bad tests utf-2.11 and utf-6.88 that expected the wrong results. Also reconcile the merge from 8.5 to the new decoupling of byte sequence counts from indexed code unit counts. Docs still need an update. file: [a744ed0e1f] check-in: [7f56d3c4b1] user: dgp branch: core-8-6-branch, size: 60980
2020-04-16
22:19
Fix build for TCL_UTF_MAX=4. Mark some failing tests with "knownBug". Those still need to be fixed! file: [c25a1a5819] check-in: [660902f443] user: jan.nijtmans branch: core-8-6-branch, size: 61248
22:06
Fix more test-cases for TCL_UTF_MAX=3 file: [7949e6dbc9] check-in: [46026a9db7] user: jan.nijtmans branch: core-8-6-branch, size: 61247
20:59
Adjust test results and implementation for Tcl 8.6 current support of 4-byte sequences in a TCL_UTF_MAX=3 build. file: [d778cb9e8c] check-in: [010ec5fc7c] user: dgp branch: core-8-6-branch, size: 61159
20:52
merge 8.6 file: [18b0ad5b14] check-in: [660bca000f] user: dgp branch: bug-31aa44375d, size: 61227
20:38
Merge 8.5. Failing tests for now. To be remedied shortly. file: [a76bc31afc] check-in: [6596c4af31] user: dgp branch: core-8-6-branch, size: 61139
20:28
merge litter file: [0d78cf74a8] check-in: [7e421dce7c] user: dgp branch: core-8-5-branch, size: 56387
20:02
merge 8.6 file: [6245021e7b] check-in: [ad63873abe] user: dgp branch: bug-31aa44375d, size: 55797
19:02
More detailed comments. file: [bd309aa713] check-in: [f6a8c5432f] user: dgp branch: dgp-utf-next, size: 56394
18:42
compiler warning file: [39c439d18f] check-in: [f4baf7e7af] user: dgp branch: dgp-utf-next, size: 55038
18:40
More tests and fix for overlong handling in revised Tcl_UtfNext. file: [9bef9c5891] check-in: [29041a6d75] user: dgp branch: dgp-utf-next, size: 55021
17:55
merge 8.5 file: [2b00aa795e] check-in: [aadfd3448c] user: dgp branch: dgp-utf-next, size: 54973
17:30
Convert Overlong() to use a lookup table. file: [2632bc61a4] check-in: [5550e9ac96] user: dgp branch: bug-5e6346a252, size: 54724
16:22
When we reject overlong sequences, \xC1 is as invalid a lead byte as \xFF. file: [cb7af25b02] check-in: [ce6e2901e2] user: dgp branch: bug-5e6346a252, size: 54597
15:48
merge 8.5 file: [61fdf895aa] check-in: [877190c2e0] user: dgp branch: dgp-utf-next, size: 51450
15:38
merge 8.5 file: [0283bd7e47] check-in: [b59c18d809] user: dgp branch: bug-5e6346a252, size: 54614
2020-04-15
22:39
Refactor the Overlong test into a utility routine. file: [ac914a5f27] check-in: [9cf8c75384] user: dgp branch: bug-5e6346a252, size: 54616
22:28
Use test existence to shorten comment. file: [0dcc7991ab] check-in: [32c044a1f8] user: dgp branch: bug-5e6346a252, size: 54087
21:48
Rework Tcl_UtfPrev so it properly handles overlong sequences. file: [465d171fd5] check-in: [bd5edfc3bc] user: dgp branch: bug-5e6346a252, size: 54308
20:03
Merge 8.7 file: [cbbad6543c] check-in: [94f0de486e] user: jan.nijtmans branch: trunk, size: 63182
19:54
Merge 8.6 file: [413cb0fda4] check-in: [9adf3a7df5] user: jan.nijtmans branch: core-8-branch, size: 63075
19:38
Merge 8.5 file: [10fc6f514a] check-in: [58b4d49d41] user: jan.nijtmans branch: core-8-6-branch, size: 55681
19:33
New test command "testutfnext", not used yet in actual test-cases. Being merged up to higher branches. (Thanks, Don!) file: [5e70ee5041] check-in: [4eb84fae1d] user: jan.nijtmans branch: core-8-5-branch, size: 51201
2020-04-14
21:32
typo file: [f5da4a5eb4] check-in: [3e47eb4000] user: dgp branch: dgp-utf-next, size: 51452
21:25
Fix the bad logic in Tcl_UtfNext(). file: [fbf644d6c2] check-in: [eb95fbb884] user: dgp branch: dgp-utf-next, size: 51453
21:16
Replace calls of TclUtfToUniChar() with TclUtfNext() when caller has no decoding need. Failing test string-22.14 indicates something is still not quite right. Now that Tcl_NumUtfChars() is not paying decoding prices, we let it spend to properly protect against overflow. [2738427] file: [289613fbcc] check-in: [d0cc6cdb7f] user: dgp branch: dgp-utf-next, size: 51277
20:03
The function of Tcl_UtfNext() is to advance a pointer. There's nothing inherent in that task that requires decoding of the characters, but the implementation does that. Let's try a simpler solution for callers that do not need the content decoded. file: [4f6160fc46] check-in: [9e87b14c18] user: dgp branch: dgp-utf-next, size: 51410
14:30
Merge 8.7 file: [9d84961fe0] check-in: [bd5cc21d39] user: jan.nijtmans branch: trunk, size: 63184
10:17
Merge 8.6 file: [68226143a0] check-in: [250fd8a281] user: jan.nijtmans branch: core-8-branch, size: 63077
09:19
Merge 8.6 file: [aecd727c2e] check-in: [76244d6280] user: jan.nijtmans branch: bug-31aa44375d, size: 55799
08:44
Fix unit-test, change expectations according to current 8.6 branch (not handling [1b1f5f0b53] yet, doing that in separate branch) file: [1770c1c337] check-in: [e64b5a4785] user: jan.nijtmans branch: bug-c61818e4c9, size: 55683
00:09
Create separate tables to serve Tcl_UtfPrev and Tcl_UtfComplete. file: [dec5b78a2e] check-in: [c47ec8e3a3] user: dgp branch: bug-c61818e4c9, size: 55751
2020-04-13
21:45
merge 8.6 file: [95b1991b1b] check-in: [9eccb6d91f] user: dgp branch: bug-31aa44375d, size: 55096
21:36
Merge 8.5. Failing tests highlight ticket [1b1f5f0b53]. file: [ddad3919a5] check-in: [50b6d532c4] user: dgp branch: core-8-6-branch, size: 55085
18:42
Make the comments describing Tcl_UtfPrev more complete and precise. file: [095e0f9527] check-in: [0db5bfbfeb] user: dgp branch: dgp-utf-prev-alt, size: 51203
2020-04-10
15:22
Merge 8.7 file: [196953aa7e] check-in: [894e2ce40e] user: jan.nijtmans branch: trunk, size: 60900
13:02
Merge 8.6 file: [c4d166b8d8] check-in: [cd1ce15d26] user: jan.nijtmans branch: core-8-branch, size: 60794
12:19
Since Tcl_UtfCharComplete() now guarantees that at least 3 more bytes are available for header bytes 0x80-0xBF, check those 3 bytes first in Tcl_UtfToUniChar() before doing other checks (that might point to uninitialized memory in non-conforming extensions) file: [5c4cc96207] check-in: [99ac71af15] user: jan.nijtmans branch: core-8-6-branch, size: 53400
2020-04-09
19:22
merge 8.5 file: [cc540eb927] check-in: [d3bf59e09a] user: dgp branch: dgp-utf-prev-alt, size: 49569
17:30
Guarantee TclNeedSpace and TclFindElement have common definition of whitespace by having both call the same routine. Create a macro form to contain performance costs and adapt callers. file: [321be2df7e] check-in: [da9a473a26] user: dgp branch: core-8-5-branch, size: 49519
2020-04-08
17:31
Apply better bug fix that does not create new bugs this time. file: [78ac68637f] check-in: [7ee618b9ca] user: dgp branch: dgp-utf-prev-alt, size: 49568
16:43
Restore the original Tcl_UtfPrev routine. Fails a different set of tests. Many fewer. file: [c14653a813] check-in: [6c34d40ef3] user: dgp branch: dgp-utf-prev-alt, size: 49572
15:37
introduces new utf-internal definition UTF_TO_UNI_MAX, as the maximal value that Tcl_UtfToUniChar could return depending on UTF_TO_UNI_MAX and tcl version (here 3 or 4), restricting the search cycle of Tcl_UtfPrev now. file: [63dd626b9c] check-in: [f10ba6b7ac] user: sebres branch: sebres-utf-prev-fix, size: 50307
13:26
partially revert [e8bfb4c2ba884b1b], a prevention added previously in string trim functions (compare char lengths) is not needed with new version of Tcl_UtfPrev (leave other optimizations as appropriate, may be important in some obscure case not covered by current or future implementation of Tcl_UtfPrev/Tcl_UtfToUniChar) file: [d84096c4c4] check-in: [57aa5dae72] user: sebres branch: sebres-utf-prev-fix, size: 50138
12:32
Rewrite of Tcl_UtfPrev that restores the correct behavior (no inconsistency between Tcl_UtfPrev/Tcl_UtfToUniChar anymore), Tcl_UtfPrev strictly follows all rules of Tcl_UtfToUniChar now, also fixing all test cases now. file: [91d7ab8dba] check-in: [cdffcbaec9] user: sebres branch: sebres-utf-prev-fix, size: 50144
2020-04-07
20:04
Rewrite of Tcl_UtfPrev that fixes most of the failing tests.

Remaining failures demonstrate that the interface of Tcl_UtfPrev cannot be satisfied robustly if we stick to the restriction of not reading (*src). file: [73e1a9bb8b] check-in: [ee25edfdb7] user: dgp branch: dgp-utf-prev-fix, size: 50382

2020-04-06
09:22
Make Tcl_UtfCharComplete() usable for both Tcl_UtfToUniChar() and Tcl_UtfToChar16(). Defect noticed by Don Porter. Thanks! Add test-cases, assuring correct handling of 4-byte UTF-8 sequences. Use "end-1", "end" and "end+1" in testcases related to Tcl_NumUtfChars(), that's more readable/maintainable than integers. file: [086b32b017] check-in: [20a619daf1] user: jan.nijtmans branch: core-8-branch, size: 60756
07:53
Revert commit [aed6634d2ccf2107], which backported part of TIP #389 (regarding internal handling of 4-byte UTF-8 sequences) to 8.6. file: [af4d711786] check-in: [48c83d55c8] user: jan.nijtmans branch: bug-31aa44375d, size: 53373
2020-04-05
20:36
Merge 8.6 file: [1953dc70c0] check-in: [b69fafa7f6] user: jan.nijtmans branch: core-8-branch, size: 60886
20:28
Partial fix for [31aa44375de2c87e]: Tcl_NumUtfChars regression in default 8.6 build. This commit brings Tcl_UtfCharComplete() into agreement with Tcl_UtfToUniChar(), whether it demands 1, 3 or 4 succeeding bytes. file: [ee669f4eb3] check-in: [166c0270e7] user: jan.nijtmans branch: core-8-6-branch, size: 53362
2020-04-03
14:31
merge 8.7 file: [4cb15b7aca] check-in: [cfeafcaff1] user: dgp branch: trunk, size: 60862
14:28
Fix broken build. file: [f7e34e9db0] check-in: [d76b30b67a] user: dgp branch: core-8-branch, size: 60638
09:14
Merge 8.7 file: [6d7160030f] check-in: [96a31010ce] user: jan.nijtmans branch: trunk, size: 60858
09:13
Simplify implementation of TclUtfToUCS4: The #undefined Tcl_UtfToUniChar() already does everything for use here (Unlike in Tcl 8.6, which has to live without TIP #542) file: [dfd49c138b] check-in: [4af836a404] user: jan.nijtmans branch: core-8-branch, size: 60634
2020-04-02
23:26
merge 8.7 file: [8d6c7ecaf2] check-in: [abac91f4a0] user: dgp branch: trunk, size: 61449
22:18
merge 8.6 file: [88d22c68cc] check-in: [f1db0ed67c] user: dgp branch: core-8-branch, size: 61225
22:18
Remove stray debug file: [252103a91a] check-in: [7fad0af377] user: dgp branch: core-8-6-branch, size: 53114
22:05
typo file: [9fcff62627] check-in: [e20f26ea10] user: dgp branch: core-8-6-branch, size: 53158
20:36
More callers. file: [ffc4318663] check-in: [b092a8788b] user: dgp branch: core-8-6-branch, size: 53156
20:18
New utility routine TclUtfToUCS4() to contain some complexity. Two callers adapted. file: [dd4033f180] check-in: [63e224f042] user: dgp branch: core-8-6-branch, size: 53667
2020-03-18
14:06
Merge 8.7 file: [611c4a948f] check-in: [1d0787097e] user: jan.nijtmans branch: trunk, size: 60180
13:54
Merge 8.6 file: [fe4cb5b540] check-in: [2b7c852b44] user: jan.nijtmans branch: core-8-branch, size: 59934
12:51
More uppercase HEX representations in source-code. file: [25e980fe33] check-in: [7d760e228e] user: jan.nijtmans branch: core-8-6-branch, size: 52183
2020-01-20
14:19
Merge 8.7 file: [feaee9d05d] check-in: [3d7b2dce60] user: jan.nijtmans branch: trunk, size: 60188
2019-12-03
16:13
Merge 8.7 file: [e638928518] check-in: [4105725743] user: jan.nijtmans branch: utf-max, size: 59942
2019-12-02
20:48
Merge 8.6 file: [fb044f8c43] check-in: [468df9021d] user: jan.nijtmans branch: core-8-branch, size: 59942
20:26
If TCL_UTF_MAX>=4, make Tcl_ParseBackslash combine two surrogates so they appear as one 4-byte UTF-8 byte sequence from the start. Add test-case for this. file: [dcaec0d214] check-in: [43032d7ba3] user: jan.nijtmans branch: core-8-6-branch, size: 52191
2019-09-25
13:42
Merge 8.7 file: [16bdd9b5d7] check-in: [be233f3e67] user: jan.nijtmans branch: utf-max, size: 59947
2019-09-16
21:41
Merge 8.7 file: [7ce80f4d77] check-in: [343db6648f] user: jan.nijtmans branch: trunk, size: 60198
21:35
Merge 8.6 file: [077dcce582] check-in: [5057c37bdf] user: jan.nijtmans branch: core-8-branch, size: 59947
21:18
Bugfix in Tcl_UtfPrev/Tcl_UtfNext: When handling 4-byte UTF-8 byte sequences, those should be able to move back/forward 4 bytes if TCL_UTF_MAX <= 4. Update comment accordingly. Bugfix in Tcl_UtfFindFirst/Tcl_UtfFindLast: Those functions should be able to find both the high surrogate (if asked for) as also the full character (combination of both surrogates) file: [83a9dd9fa9] check-in: [aed6634d2c] user: jan.nijtmans branch: core-8-6-branch, size: 52196
2019-09-14
13:11
Two places where TCL_AUTO_LENGTH should be used file: [461130e810] check-in: [94f65c8701] user: jan.nijtmans branch: trunk, size: 60147
13:07
Merge 8.7 file: [4bf0afccb9] check-in: [9ea2eea22d] user: jan.nijtmans branch: trunk, size: 60117
2019-08-15
08:59
Merge 8.7 file: [f956e30ce4] check-in: [42a10393d8] user: jan.nijtmans branch: tip-548, size: 59896
08:10
Merge 8.7 file: [e68323bfe6] check-in: [f33e2933b5] user: jan.nijtmans branch: trunk, size: 61243
2019-08-14
07:24
Merge 8.7 file: [bfb184343a] check-in: [c367ba59b1] user: jan.nijtmans branch: no-register, size: 61014
2019-08-12
20:38
Merge branch tip-548. No longer define additional stub-entries for functions that will be removed (because of deprecation) anyway file: [66aeb9ae4d] check-in: [50d822dbab] user: jan.nijtmans branch: utf-max, size: 59986
2019-08-11
21:17
Fix handling of length (size_t)-1 in tclMain.c. This should fix handling of command-line arguments with TCL_UTF_MAX=6, necessary to make tclsh run at all ... file: [2a1621d381] check-in: [33b7c8c229] user: jan.nijtmans branch: trunk, size: 61333
2019-08-05
20:15
Fix signature of TclWCharToUtfDString for TCL_UTF_MAX=6, and handling of length -1 file: [0b2a3ddc95] check-in: [1b13f758a5] user: jan.nijtmans branch: trunk, size: 61325
2019-08-02
15:03
Merge tip-548 file: [afbfcc9cdd] check-in: [d3a7842460] user: jan.nijtmans branch: utf-max, size: 60351
14:26
Merge 8.7 file: [116ccc2362] check-in: [b0fbdeb265] user: jan.nijtmans branch: trunk, size: 61304
13:35
Document that the *Backslash parsing functions output at most 4 bytes, irrespective of the TCL_UTF_MAX setting: It could be 4 for the "\Uxxxxxx" construct, but never more. Move <stddef.h> and <locale.h> to tclInt.h, so they can be removed from various other places. file: [50255cf53c] check-in: [1f393d7d01] user: jan.nijtmans branch: core-8-branch, size: 61104
09:00
Merge 8.7. Some formatting. file: [436c799582] check-in: [6305175e0c] user: jan.nijtmans branch: tip-548, size: 59986
2019-08-01
21:55
Protect Tcl_AToB() functions against NULL input file: [9f86858a88] check-in: [6e1922b861] user: jan.nijtmans branch: utf-max, size: 60358
16:03
Merge 8.7. Documentation improvements and code cleanup. Approaching finish. file: [bff0d2cb29] check-in: [57546481c1] user: jan.nijtmans branch: tip-548, size: 59996
11:55
Merge tip-548 file: [d2578eb0fc] check-in: [8819e7a6a3] user: jan.nijtmans branch: utf-max, size: 60180
08:02
Attempt to fix a179564826: Tk 8.6: prevent issues when encountering non-BMP Unicode characters file: [f55d7511cf] check-in: [f6eb4196ee] user: jan.nijtmans branch: bug-a179564826, size: 52162
2019-07-31
19:47
Merge 8.5 file: [9742291082] check-in: [98aa5b2f17] user: jan.nijtmans branch: core-8-6-branch, size: 52131
19:39
(cherry-pick from core-8-branch): Replace memcpy() calls with memmove() to avoid undefined behavior when source and destination overlap file: [c7c2794917] check-in: [382e19dea0] user: jan.nijtmans branch: core-8-5-branch, size: 49518
2019-07-17
15:38
Eliminate "register" keyword _everywhere_ in Tcl. This keyword is deprecated in C++ (removed in C++17, even), and essentially does nothing with most modern compilers. file: [5845f19aa9] check-in: [f074bda87c] user: jan.nijtmans branch: no-register, size: 61024
2019-07-11
07:18
Rename UTF-related functions to "WChar" and "Char16" variants, more intuitive because they represent wchar_t and char16_t (since C++11) types in modern compilers. file: [2893b6395e] check-in: [070bfd62cb] user: jan.nijtmans branch: tip-548, size: 60110
2019-07-07
20:38
Merge 8.7, and a few tweaks: Only provide Tcl_WinUtfToTChar on Tcl 8.x, not on 9.0 any more file: [30ac31a78d] check-in: [49e4bfa90e] user: jan.nijtmans branch: tip-548, size: 60070
2019-07-06
23:09
Fix UNIX/Mac build file: [b69a96ffb1] check-in: [dde79eb6a8] user: jan.nijtmans branch: tip-548, size: 60012
2019-07-05
09:03
Improvement: always export both 16-bit and 32-bit UTF function file: [94fa21f436] check-in: [27f2c4cf5e] user: jan.nijtmans branch: tip-548, size: 60003
2019-06-03
19:48
TIP #548: Deprecate Tcl_WinUtfToTChar() and Tcl_WinTCharToUtf() and provide more flexible replacement functions file: [e71bceebf1] check-in: [49785ba3b0] user: jan.nijtmans branch: tip-548, size: 60925
2019-05-22
21:50
More simplifications, taking deprecations into account file: [580294f8cd] check-in: [b95de9a625] user: jan.nijtmans branch: utf-max, size: 60174
07:32
More WIP: eliminate all usage of (platform-specific) Tcl_WinTCharToUtf()/Tcl_WinUtfToTChar() to its proposed portable replacements: Tcl_Utf16ToUtfDString()/Tcl_UtfToUtf16DString() This allows for Tcl_WinTCharToUtf()/Tcl_WinUtfToTChar() to be declared deprecated. file: [0cc57f9f6f] check-in: [a33e22b6ba] user: jan.nijtmans branch: utf-max, size: 64817
2019-05-10
07:50
Merge trunk file: [babc23af02] check-in: [f0009090a2] user: jan.nijtmans branch: regexp-api-64bit, size: 61314
07:46
merge 8.7 file: [1d09eea1b6] check-in: [f3302db091] user: jan.nijtmans branch: utf-max, size: 64767
2019-04-19
00:38
merge 8.7 file: [5965de89d4] check-in: [ddaa30125b] user: dkf branch: trunk, size: 61315
2019-04-17
19:23
Replace memcpy() calls with memmove() to avoid undefined behavior when source and destination overlap. file: [840b851393] check-in: [0b45548847] user: dgp branch: core-8-branch, size: 61114
2019-03-31
22:03
Enhance documentations. Move TCL_INDEX_NONE from tclInt.h to tcl.h, since it's too useful. file: [297ca17adf] check-in: [b1a506218e] user: jan.nijtmans branch: regexp-api-64bit, size: 61310
2019-03-28
22:49
Merge 8.7 file: [8450344d93] check-in: [eebb1e7ee1] user: jan.nijtmans branch: utf-max, size: 64763
2019-03-24
16:52
Merge 8.7 file: [e8c33d8e06] check-in: [c06a872943] user: jan.nijtmans branch: trunk, size: 61311
16:50
Since only bytes 0xF0 - 0xF4 can be the first byte of a valid 4-byte UTF-8 byte sequence, account for that in Tcl_UtfCharComplete(). file: [d96017f3d2] check-in: [9c09af3627] user: jan.nijtmans branch: core-8-branch, size: 61110
16:46
Since only bytes 0xF0 - 0xF4 can be the first byte of a valid 4-byte UTF-8 byte sequence, account for that in Tcl_UtfCharComplete(). Only effective when TCL_UTF_MAX>3 file: [3a932cdb85] check-in: [8e7ac039ab] user: jan.nijtmans branch: core-8-6-branch, size: 52115
16:43
Since only bytes 0xF0 - 0xF4 can be the first byte of a valid 4-byte UTF-8 byte sequence, account for that in Tcl_UtfCharComplete(). Only effective when TCL_UTF_MAX>3 file: [1afff3cd02] check-in: [12e58fc74d] user: jan.nijtmans branch: core-8-5-branch, size: 49514
13:05
Merge 8.7 file: [1a683a59b0] check-in: [15927b5ba6] user: jan.nijtmans branch: utf-max, size: 64773
2019-03-21
20:28
Merge 8.7. Also fix invalid reference to TclUtfToWChar, causing build failure file: [12b4c2e88b] check-in: [7c63883789] user: jan.nijtmans branch: utf-max, size: 64783
20:11
Merge 8.7 file: [64a2c08a05] check-in: [871076a655] user: jan.nijtmans branch: trunk, size: 61321
19:56
Remove incorrect comment. Simplify handling of last bytes in Tcl_UniCharToUtfDString(), since TclUtfToUniChar() already turns out to handle cp1252 fall-back correctly. file: [872868a1fa] check-in: [33251a211f] user: jan.nijtmans branch: core-8-branch, size: 61120
2019-03-20
22:54
Merge 8.7 file: [b9c85dc62a] check-in: [3ea5d3e8a3] user: jan.nijtmans branch: utf-max, size: 64957
22:51
Merge 8.7 file: [29dc2e6ddb] check-in: [bb9b52ab82] user: jan.nijtmans branch: trunk, size: 61496
22:45
Fix Tcl_UtfToUniCharDString() function, handling invalid byte at the end of the string: Not quite correct for bytes between 0x80-0x9F, according to TIP file: [e3072ae62e] check-in: [3e8ada19f5] user: jan.nijtmans branch: core-8-branch, size: 61295
2019-03-18
22:34
Merge 8.7 file: [45d678a779] check-in: [48c676b649] user: jan.nijtmans branch: trunk, size: 61473
22:32
Comment Tcl_UniCharToUtf() better, describing what happens when handling surrogates. Add type cast in tclUtf.c, making actual check clearer file: [8c6f2eb584] check-in: [b02df08680] user: jan.nijtmans branch: core-8-branch, size: 61272
20:07
Add 4 new encodings, and add documentation. file: [da54a774af] check-in: [0ac59eb0c6] user: jan.nijtmans branch: utf-max, size: 64937
2019-03-17
22:01
More WIP. Seems to be *almost* working. file: [7e148c56a6] check-in: [ab13cbd74c] user: jan.nijtmans branch: utf-max, size: 64779
2019-03-16
21:10
Merge 8.7. Move up some stub entries related to Tcl_UniChar. Use TCL_UTF_MAX=4 for full Unicode instead of TCL_UTF_MAX=6 (TCL_UTF_MAX: 3 is default) file: [819d977c5c] check-in: [81502a66ed] user: jan.nijtmans branch: utf-max, size: 61340
2019-03-14
20:59
Merge 8.7. Fix 2 test-cases which were failing for TCL_UTF_MAX=6 file: [da5dbc33a1] check-in: [4032e7fe99] user: jan.nijtmans branch: utf-max, size: 61340
2019-03-12
20:39
Even better support for -DTCL_UTF_MAX=6. Ongoing improvements (TIP being planned) file: [9fd3389618] check-in: [fdcb2a7323] user: jan.nijtmans branch: utf-max, size: 61291
2019-03-10
21:04
Merge 8.7 file: [9ec20eb789] check-in: [316ceb7616] user: jan.nijtmans branch: trunk, size: 61463
20:18
Re-implement changes in win/tclWinFile.c (handling -DTCL_UTF_MAX=6) using 3 new utility functions. This allows code to be re-used in more places: a cleaner, more future-proof implementation. file: [b626580e33] check-in: [9eb437a15d] user: jan.nijtmans branch: core-8-branch, size: 61262
2019-03-08
23:54
Merge trunk. Further WIP for TIP #497, far from finished .... file: [dbe3f15e63] check-in: [9fcbdde251] user: jan.nijtmans branch: tip-497, size: 53015
22:51
Merge 8.7 file: [5f3cb4c241] check-in: [fe02c8898e] user: jan.nijtmans branch: trunk, size: 56119
2019-03-07
22:02
Fixes for TCL_UTF_MAX=6, (gcc compiler warnings). Also make everything work on win32/win64. Patch adapted from Androwish (thanks, Werner!) file: [3bc47b8988] check-in: [650574e0fb] user: jan.nijtmans branch: utf-max-6, size: 55918
2019-03-05
18:23
merge 8.7 (TIP#527, New measurement facilities in TCL: New command timerate, performance test suite) file: [39861b85b0] check-in: [e41cbd042a] user: sebres branch: trunk, size: 56041
2019-03-02
17:21
Add build with -DTCL_UTF_MAX=6 to travis CI. Also fix 2 gcc compiler-warnings occurring with -DTCL_UTF_MAX=6 file: [d2bbeadb16] check-in: [9b2a385a0f] user: jan.nijtmans branch: core-8-branch, size: 55840
16:53
Merge 8.7 file: [94d7d839ce] check-in: [e766d23655] user: jan.nijtmans branch: trunk, size: 56015
16:52
Minor optimization in UTF-8 handling, and add some comments describing how Tcl_UniCharToUtf() handles surrogates. file: [a53e54f604] check-in: [6e3632ede5] user: jan.nijtmans branch: core-8-branch, size: 55814
16:04
Backport [bd94500678e837d7] from 8.7, preventing endless loops in UTF-8 conversions when handling surrogates. Only effective when compiling with -DTCL_UTF_MAX=4|6 (default: 3). Meant for benefit of Androwish. file: [3926a96e0c] check-in: [9e1984c250] user: jan.nijtmans branch: core-8-6-branch, size: 52121
2019-03-01
20:25
merge 8.7 file: [859e807c72] check-in: [bc57eb7213] user: dgp branch: trunk, size: 55336
20:24
A confusion about signed vs unsigned comparison caused Tcl_UtfToUniChar() to return the wrong answer (contents of random memory) for each single byte UTF-8 in the input. This commit fixes that bug. More commentary on https://core.tcl-lang.org/tcl/tktview/bd94500678 file: [3b7ff4047c] check-in: [81046b694f] user: dgp branch: core-8-branch, size: 55135
2019-02-27
21:58
Merge 8.7 file: [b6c50a53f1] check-in: [727e74f081] user: jan.nijtmans branch: trunk, size: 55314
2019-02-26
19:37
More use of (efficient) TclHasIntRep() macro. Also eliminate many (size_t) and (unsigned) type-casts, which don't make sense any more. file: [8b057c57be] check-in: [2c7db3fa01] user: jan.nijtmans branch: mistake, size: 54843
2019-02-25
21:10
Finish complete fix, all corner-cases correct now. Also spurious UTF-8 testcase failure (as seen on travis) fixed now. file: [5a69d8b0b8] check-in: [b3d886c84f] user: jan.nijtmans branch: bug-bd94500678, size: 55113
2019-02-19
20:17
Merge 8.7 file: [ad73841f17] check-in: [2b82daafb8] user: jan.nijtmans branch: trunk, size: 55045
20:16
Merge 8.6 file: [fc24fefce3] check-in: [1b17625b60] user: jan.nijtmans branch: core-8-branch, size: 54879
19:38
Minor optimizations file: [ada6318df4] check-in: [0e2621fc4b] user: jan.nijtmans branch: bug-bd94500678, size: 55015
2019-02-18
20:48
Proposed fix for [bd94500678]: SEGFAULT by conversion of unicode (out of BMP) to byte-array. file: [d4d424d4fc] check-in: [9f67c17d01] user: jan.nijtmans branch: bug-bd94500678, size: 55303
2019-02-04
22:45
Merge trunk file: [b86e29514a] check-in: [a6db8815ce] user: jan.nijtmans branch: tip-502-for-9, size: 55051
2018-10-27
19:33
Backport various minor issues from 8.6: - gcc compiler warning in tclDate.c - protect Tcl_UtfToUniCharDString() from ever reading more than "length" bytes from its input, not even in the case of invalid UTF-8. - update to latest tzdata - fix 2 failing test-cases on MacOSX file: [6468647908] check-in: [f339295ed5] user: jan.nijtmans branch: core-8-5-branch, size: 49520
2018-10-09
18:52
Merge trunk file: [83b59c8873] check-in: [2ec0fe1752] user: jan.nijtmans branch: tip-497, size: 58210
2018-10-08
19:00
TIP #494 implementation: More use of size_t in Tcl 9 file: [0c6eb3b4a5] check-in: [f3d49044c4] user: jan.nijtmans branch: trunk, size: 55044
18:34
Merge 8.6. Also fix startup problems on win32, when the encoding path contains invalid UTF-8 (reported by François Vogel). Various other code cleanup, e.g. remove empty.zip file, as this didn't work quite as expected. file: [f38fd5f3de] check-in: [59a19cdee3] user: jan.nijtmans branch: core-8-branch, size: 54885
2018-10-06
19:20
Use more TCL_AUTO_LENGTH, when appropriate file: [26a99cdd2c] check-in: [c643e8fe38] user: jan.nijtmans branch: memory-API, size: 55050
2018-10-04
21:07
merge trunk file: [9ca6287650] check-in: [3ca1d5cace] user: jan.nijtmans branch: memory-API, size: 55035
2018-10-03
20:02
Merge 8.6 file: [8ff5e7dab8] check-in: [252cdd0247] user: jan.nijtmans branch: core-8-branch, size: 54891
19:24
Tcl_UniCharToUtfDString: Don't allocate too much memory for this function. Tcl_UtfToUniCharDString: Don't allocate too much memory for this function. And make sure that we never access more than 'length' bytes from the string, not even when encountering invalid UTF-8. file: [5c56cf0280] check-in: [8ba821d1fd] user: jan.nijtmans branch: core-8-6-branch, size: 51782
2018-07-05
21:25
Merge trunk Handle TclCopyAndCollapse, *Trace* et al file: [5b1fa6fc07] check-in: [399b8e7649] user: jan.nijtmans branch: memory-API, size: 54784
2018-06-28
21:42
More API changes using size_t. Internal changes not complete yet (WIP) file: [e1a768ce0f] check-in: [1bfecd9172] user: jan.nijtmans branch: memory-API, size: 54788
2018-06-27
19:09
merge trunk file: [5d60dd06e5] check-in: [2cc2d71f0a] user: jan.nijtmans branch: memory-API, size: 54782
2018-06-24
20:26
Fix "string tolower" and friends for handling unpaired surrogates correctly. Also add test-cases for those situations. Various typo's in comments. file: [2f8a5b250e] check-in: [1cdc9199e9] user: jan.nijtmans branch: core-8-branch, size: 54640
2018-06-18
15:59
Merge 8.6. And add more documentation and test-cases regarding the behavior of Tcl_UniCharToUtf() file: [3e4218a8ae] check-in: [3cb0cedeb6] user: jan.nijtmans branch: core-8-branch, size: 54235
15:54
Fix [53cad613d8]: TIP 389 implementation makes Tk tests font-4.12 and font-4.15 fail. One more situation in which high surrogate causes problem file: [8a6cd287b8] check-in: [c45de6cdb4] user: jan.nijtmans branch: core-8-6-branch, size: 51617
2018-05-25
06:55
merge trunk file: [4d8ecc6ca8] check-in: [e9340634d6] user: jan.nijtmans branch: memory-API, size: 54132
2018-05-23
20:09
merge trunk file: [e99fd221b0] check-in: [1a6fcb9bdc] user: jan.nijtmans branch: memory-API, size: 54023
2018-05-11
09:20
Merge 8.5. This adds Emoji 11.0 support, when Tcl is compiled with TCL_UTF_MAX>3. Useful for Androwish, for example. file: [18caccb00b] check-in: [708287d936] user: jan.nijtmans branch: core-8-6-branch, size: 51380
09:14
Add emoji 11.0 to the set. Only active when compiled with TCL_UTF_MAX>3. Also prepare tooling for Unicode 11.0 (while being on it) file: [deb90da498] check-in: [ee9f293421] user: jan.nijtmans branch: core-8-5-branch, size: 49339
2018-05-07
07:40
Remove some tip389 restrictions in test-cases, which are no longer necessary. Eliminate gcc compiler warnings when compiling with -DTCL_UTF_MAX=6 Other code clean-up and comment improvements. No change in functionality. file: [ca1f89b62b] check-in: [385fda311b] user: jan.nijtmans branch: core-8-branch, size: 53998
2018-05-01
19:02
Start implementing TIP #497. regexp's now are >BMP-aware. WIP file: [150a0d9005] check-in: [47ace058d4] user: jan.nijtmans branch: tip-497, size: 57214
18:41
Implement special "string totitle" for Extended Georgian characters (new behavior in Unicode 11) file: [caf8cb360a] check-in: [827b7761e6] user: jan.nijtmans branch: core-8-branch, size: 54010
2018-04-23
23:32
Merge 8.6 (bug-fix and test-case for Tcl_UtfAtIndex with TCL_UTF_MAX=4) file: [5f0b67c7df] check-in: [567e61b329] user: jan.nijtmans branch: mistake, size: 53810
23:23
Bug-fix in Tcl_UtfAtIndex (for TCL_UTF_MAX=4 only). With test-case (in "string totitle") demonstrating the bug. file: [2e32d28e48] check-in: [3d8301e3c6] user: jan.nijtmans branch: core-8-6-branch, size: 51170
2018-04-20
10:16
TIP #389 implementation. file: [b4aa23654b] check-in: [e109760b1c] user: jan.nijtmans branch: core-8-branch, size: 53402
2018-04-19
22:29
Slightly improved (more fail-safe) surrogate handling for TCL_UTF_MAX>3. Backported from latest TIP 389 implementation. (to be used for androwish) file: [e8fa0b3a54] check-in: [686259e650] user: jan.nijtmans branch: core-8-6-branch, size: 50997
2018-04-17
21:49
Slightly better unmatched-surrogates handling. Unmatched High surrogates will still be silently removed, but Unmatched Low surrogates will pass through as-is now. Inspired by Kevin Kenny's remarks. Thanks! file: [1b3222a95c] check-in: [1997a15ffd] user: jan.nijtmans branch: tip-389, size: 54226
2018-01-31
12:18
Change Tcl_Token definition (int -> size_t). Many related code-changes. file: [2fdbede656] check-in: [e3a724b790] user: jan.nijtmans branch: memory-API, size: 51867
2018-01-29
11:36
merge trunk file: [81d5d091e6] check-in: [3faa71ab4f] user: jan.nijtmans branch: memory-API, size: 51864
2018-01-10
08:27
merge core-8-branch file: [feb7253c86] check-in: [b3fc2fbe3d] user: jan.nijtmans branch: tip-389, size: 53860
08:25
Fix 00a27923ee: (Tcl part, remaining is in Tk) text/entry dysfunctional when pasting an emoji on MacOSX. This changes the handling of incoming valid 4-byte UTF-8 sequences: Those are no longer split in 4 separate characters (as was done for invalid byte sequences) but replaced by a single 'replacement character'. file: [ae6a7420d8] check-in: [8481a52495] user: jan.nijtmans branch: core-8-branch, size: 51831
2018-01-09
11:15
(partial) fix for 00a27923ee: text/entry dysfunctional when pasting an emoji on MacOSX. Don't handle incoming valid 4-byte UTF-8 characters as invalid byte sequences (since they aren't), but as being the Unicode replacement character. file: [2e1cc8edda] check-in: [f0adfe7dac] user: jan.nijtmans branch: bug-00a27923ee, size: 50872
2017-12-28
18:49
Fix handling of surrogates (when TCL_UTF_MAX > 3) in Tcl_UtfNcmp()/Tcl_UtfNcasecmp()/TclUtfCasecmp(). Backported from core-8-branch, where this was fixed already. file: [06ae0e4cdb] check-in: [1ebc1bcaa5] user: jan.nijtmans branch: core-8-6-branch, size: 50632
2017-12-01
15:03
merge trunk file: [104e15d362] check-in: [e008d0adce] user: jan.nijtmans branch: memory-API, size: 51624
11:33
merge core-8-branch file: [335ddcbc61] check-in: [dabd924a87] user: jan.nijtmans branch: tip-389, size: 53620
2017-11-30
13:14
Fix [8e1e31eac0]: lsort treats NUL chars strangely. Also fix various initializations, which only make a difference when TCL_UTF_MAX == 4. Add new test-cases which demonstrate the fix. For TCL_UTF_MAX == 4, surrogates will now be handled as expected as well when sorting. file: [8c0c76d552] check-in: [b6438b69ad] user: jan.nijtmans branch: core-8-branch, size: 51591
2017-11-29
12:28
merge core-8-6-branch file: [143baa2d32] check-in: [e45dcdac38] user: jan.nijtmans branch: core-8-branch, size: 50421
12:27
Fix Tcl_UtfFindFirst()/Tcl_UtfFindLast(), which were broken by [83c0c569d6]. Not detected, because those functions aren't used anywhere in Tcl. So, added new test-cases, making sure this doesn't happen again. file: [047d99f7af] check-in: [d906b55e4b] user: jan.nijtmans branch: core-8-6-branch, size: 50495
11:49
Merge core-8-branch. Also, use a different value for TCL_STUB_MAGIC when TCL_UTF_MAX>4. file: [8697ae0536] check-in: [1916b6a72e] user: jan.nijtmans branch: tip-389, size: 53414
11:05
merge core-8-6-branch file: [32d38162fe] check-in: [8976a447aa] user: jan.nijtmans branch: core-8-branch, size: 50415
11:04
Update some functions in tclUtf.c to handle surrogate pairs when TCL_UTF_MAX == 4. Also update documentation to distinguish better between "Tcl_UniChar" and "Unicode character": Those are not necessarily the same when TCL_UTF_MAX == 4. No change when TCL_UTF_MAX == 4 or TCL_UTF_MAX == 6. file: [a8a9b0c072] check-in: [83c0c569d6] user: jan.nijtmans branch: core-8-6-branch, size: 50489
09:49
Fix [8e1e31eac0]: lsort treats NUL chars strangely file: [3634cc925c] check-in: [e2a6110884] user: jan.nijtmans branch: tip-389, size: 53032
08:59
Treat invalid UTF-8 characters in the range 0x80-0x9F as cp1252: See https://en.wikipedia.org/wiki/UTF-8. To be added to TIP #389 file: [e2b60d7d03] check-in: [b2521a4844] user: jan.nijtmans branch: tip-389, size: 51439
2017-11-20
12:58
merge core-8-branch file: [4746e08149] check-in: [1a3cef7d5c] user: jan.nijtmans branch: tip-389, size: 50879
2017-11-17
16:08
merge core-8-branch. Fix some Tcl_UniChar initialization, in case TCL_UTF_MAX == 4 file: [e1dd3d468b] check-in: [37b4aab687] user: jan.nijtmans branch: tip-389, size: 50849
2017-11-16
13:50
Handle Tcl_UtfAtIndex/Tcl_UniCharAtIndex for extended index range. More field fixes. file: [f289098ae1] check-in: [78e8c0c5f6] user: jan.nijtmans branch: memory-API, size: 49361
2017-11-07
12:15
Somewhat simplified implementation of TIP #389, in which the "string length" of characters > U+FFFF is considered to be 2, not 1. file: [dbf9778521] check-in: [d224d38a6d] user: jan.nijtmans branch: tip-389, size: 50453
2017-09-10
13:35
merge novem file: [e3b0ac4884] check-in: [4fa9cc2903] user: jan.nijtmans branch: novem-more-memory-API, size: 49345
2017-09-01
08:50
merge trunk file: [812d0a432b] check-in: [3f9db43f3e] user: jan.nijtmans branch: tip-389-impl, size: 51044
2017-08-29
09:19
Merge trunk file: [a37dead009] check-in: [f2f6504adb] user: jan.nijtmans branch: novem, size: 49347
2017-08-25
13:46
Merge trunk file: [0b140acd38] check-in: [c841475a89] user: jan.nijtmans branch: tip-389-impl, size: 51586
2017-08-18
22:06
merge core-8-6-branch file: [719d2c9f48] check-in: [75da8b29f8] user: jan.nijtmans branch: trunk, size: 49382
2017-07-03
12:32
merge core-8-6-branch file: [acc04c27ac] check-in: [4467b7768e] user: jan.nijtmans branch: rfe-6c0d7aec67, size: 49456
08:27
'inline static' -> 'static inline' and 'INLINE' -> 'inline', for consistency. file: [e3eb4abccf] check-in: [5b95f585fa] user: jan.nijtmans branch: core-8-6-branch, size: 48209
2017-06-13
12:18
merge core-8-6-branch file: [bd5a76e562] check-in: [8eec477ed9] user: jan.nijtmans branch: rfe-6c0d7aec67, size: 49449
2017-06-09
12:20
merge trunk file: [51c99fdd3f] check-in: [ee2c5cf945] user: jan.nijtmans branch: tip-389-impl, size: 50991
11:48
merge novem file: [ca0264252a] check-in: [f687ce6a36] user: jan.nijtmans branch: novem-more-memory-API, size: 48062
2017-06-08
12:59
merge trunk file: [a2c1bb7c45] check-in: [7b30d63181] user: jan.nijtmans branch: novem, size: 48100
12:38
merge core-8-6-branch file: [fbcc907f13] check-in: [9053c4a54f] user: jan.nijtmans branch: trunk, size: 48135
12:37
Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. file: [fc2a8e3d34] check-in: [6b5843fde9] user: jan.nijtmans branch: core-8-6-branch, size: 48216
12:34
Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. file: [1f59446a7f] check-in: [e376f4b734] user: jan.nijtmans branch: core-8-5-branch, size: 46889
08:26
Better UTF-8 surrogate handling, only functional when TCL_UTF_MAX>3 file: [ab16f203f3] check-in: [5ae46a0093] user: jan.nijtmans branch: rfe-6c0d7aec67, size: 49268
2017-06-06
14:46
Merge trunk file: [4ac6e3f55e] check-in: [b03462a529] user: jan.nijtmans branch: tip-389-impl, size: 50671
09:25
merge trunk file: [5c1140810c] check-in: [6c9ae4580b] user: jan.nijtmans branch: novem, size: 47957
09:24
merge core-8-6-branch file: [5eabb6b0c9] check-in: [28a2ae3999] user: jan.nijtmans branch: trunk, size: 47992
09:23
Follow-up to [67aa9a2070]: Use uppercase consistently, slight optimization in character tests, comment fixes. No change in functionality. file: [0164f065d3] check-in: [5b20178e5a] user: jan.nijtmans branch: core-8-6-branch, size: 48073
09:02
[67aa9a2070] Tcl_UtfToUniChar returns single byte for invalid UTF-8 input as documented. file: [3a98891648] check-in: [3998688718] user: jan.nijtmans branch: core-8-5-branch, size: 46746
2017-05-31
12:51
merge trunk file: [69395d50cc] check-in: [f07a85fee0] user: jan.nijtmans branch: tip-389-impl, size: 50309
12:06
merge core-8-6-branch file: [10fb9f5ff7] check-in: [c30eb96495] user: jan.nijtmans branch: trunk, size: 47983
12:05
Fix [67aa9a2070]: Security: Invalid UTF-8 can inject unexpected characters file: [f1b6c68212] check-in: [f1b9559259] user: jan.nijtmans branch: sebres-8-6-clock-speedup-cr1, size: 48064
2017-05-29
15:38
merge trunk file: [e7e769c463] check-in: [6d946ddf77] user: jan.nijtmans branch: tip-389-impl, size: 50354
2016-08-30
13:57
merge trunk file: [12de15edf6] check-in: [2a530e021b] user: jan.nijtmans branch: tip-389-impl, size: 49592
13:12
merge trunk file: [1e04544386] check-in: [11d4243035] user: jan.nijtmans branch: novem, size: 47810
13:07
Don't ever allow UTF-8 sequences of more than 4 characters to be generated or parsed, even when TCL_UTF_MAX>4: According to current Unicode standard, a byte string of >4 characters can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. file: [3ed37d3c92] check-in: [2de1551609] user: jan.nijtmans branch: trunk, size: 47845
13:00
Don't ever allow UTF-8 sequences of more than 4 characters to be generated or parsed, even when TCL_UTF_MAX>4: According to current Unicode standard, a byte string of >4 characters can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. file: [f38d3847d1] check-in: [c0a65532a7] user: jan.nijtmans branch: core-8-6-branch, size: 47926
2016-04-08
14:25
Merge trunk. Add new bitflags to tclStringRep.h (not used yet) file: [84f00ec8a7] check-in: [2d87e13575] user: jan.nijtmans branch: tip-389-impl, size: 50615
2016-04-05
12:07
merge trunk file: [24271b6f0f] check-in: [9edc83a71a] user: dgp branch: novem, size: 47851
09:32
Rename UtfCount() to TclUtfCount() and use it in more places. Suggested by pspjuth here: [e99a79a32650e7e5] file: [a677bf3b17] check-in: [9c4e4beddb] user: jan.nijtmans branch: trunk, size: 47886
2016-03-22
16:09
Update all Unicode tables to version 9.0 beta file: [63578ee9bf] check-in: [203b8dcd6c] user: jan.nijtmans branch: tip-389-impl, size: 50696
2015-10-24
02:37
merge changes from pspjuth that optimize conversion from unichar to utf and add optimized versions for reading a word from byte codes. file: [74420169a8] check-in: [e99a79a326] user: kbk branch: drh-micro-optimization, size: 47955
2015-09-22
08:20
merge trunk file: [157972ae30] check-in: [eb6c2fe41b] user: jan.nijtmans branch: novem, size: 47932
2015-09-02
08:44
merge trunk file: [35f36782ec] check-in: [50dc66790e] user: jan.nijtmans branch: tip-389-impl, size: 50698
2015-09-01
18:54
Various Unicode handling enhancements, when building with TCL_UTF_MAX > 3, inspired by androwish. No effect if TCL_UTF_MAX=3 (which is the default) file: [09f91effbd] check-in: [e2278643dc] user: jan.nijtmans branch: trunk, size: 47967
2014-05-01
09:40
Merge trunk. Update Unicode tables to Unicode 7.0 beta. file: [e3d12611ee] check-in: [13a1d81916] user: jan.nijtmans branch: tip-389-impl, size: 51750
2013-09-26
13:13
merge novem. WARNING: No checks of build-ability done yet. file: [ece5863523] check-in: [2688d65077] user: dkf branch: novem-64bit-sizes, size: 47059
2013-08-02
10:33
merge trunk file: [8df87bb7d1] check-in: [396ccb299c] user: jan.nijtmans branch: novem, size: 47028
2013-07-29
10:12
Make sure that "string is space \u202f" will continue to return "1", even if in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: http://www.unicode.org/review/pri249/. Don't hardcode "tclWinError.o" for Cygwin file: [33f59c69c1] check-in: [a72287aa7d] user: jan.nijtmans branch: trunk, size: 47063
09:29
Make sure that "string is space \u202f" will continue to return "1", even if in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: http://www.unicode.org/review/pri249/ file: [c78a847600] check-in: [334ab96e5e] user: jan.nijtmans branch: core-8-5-branch, size: 46902
2013-07-08
18:56
Unbreak MSVC6 debug build (thanks Andreas Kupries!) file: [7f126daa4a] check-in: [b259e31b93] user: jan.nijtmans branch: novem, size: 46998
18:56
Unbreak MSVC6 debug build (thanks Andreas Kupries!) file: [6df9363dfd] check-in: [d369017148] user: jan.nijtmans branch: trunk, size: 47033
18:55
Unbreak MSVC6 debug build (thanks Andreas Kupries!) file: [cdb7d01000] check-in: [728fb2f25b] user: jan.nijtmans branch: core-8-5-branch, size: 46872
2013-06-18
13:46
merge trunk file: [2c2fb3838a] check-in: [e6a93ffe18] user: jan.nijtmans branch: tip-389-impl, size: 51727
11:50
merge trunk file: [9bb2674df5] check-in: [bc4d6bb1d4] user: jan.nijtmans branch: novem, size: 46991
2013-06-17
13:29
Use more portable TclIsSpaceProc() instead of isspace(). file: [d724f20e08] check-in: [4bfe3111b1] user: jan.nijtmans branch: trunk, size: 47026
13:23
Use more portable TclIsSpaceProc() instead of isspace(). Make sure that "string is space \u180e" continues to return 1 for whatever Unicode version. file: [8bab297239] check-in: [5712054958] user: jan.nijtmans branch: core-8-5-branch, size: 46865
2013-06-10
07:36
Update Unicode tables to Unicode 6.3 beta. Merge trunk file: [928ed7508d] check-in: [9e9232f06e] user: jan.nijtmans branch: tip-389-impl, size: 51749
2013-05-28
07:25
merge trunk file: [fda007828f] check-in: [03d94f0a34] user: jan.nijtmans branch: novem, size: 47013
2013-05-22
13:07
[3613609]: Replace strcasecmp() with UTF-8-aware version. file: [4f663c3e2e] check-in: [89f027f118] user: dkf branch: trunk, size: 47048
12:55
Fixed the weird edge case. file: [27e7fb5619] check-in: [93dd8bb33b] user: dkf branch: bug-3613609, size: 46831
2013-05-21
09:38
Slight improvement: if cs = "\xC0\x80" and ct = "\x00", loop would continue after NUL-byte, this should not happen. file: [a8a8d1c5ce] check-in: [a765f37f78] user: jan.nijtmans branch: bug-3613609, size: 46535
09:27
Proposed solution for 3613609: lsort -nocase does not sort non-ASCII correctly file: [a1c47ea4d8] check-in: [66c30c4369] user: jan.nijtmans branch: bug-3613609, size: 46506
2013-05-06
09:08
Change the signatures of Tcl_UtfNcmp and friends to use size_t instead of unsigned long. This is potentially binary incompatible on win64, but not on any other platform. It eliminates the need for special stub-wrappers on Cygwin64 for those functions. "novem" doesn't promise binary compatibility anyway. file: [a57017e4db] check-in: [9bb59c6083] user: jan.nijtmans branch: novem, size: 46130
2013-02-25
16:03
Merge trunk. Unicode 6.3 does not have that many spaces..... file: [af50b567c7] check-in: [0e5e775003] user: jan.nijtmans branch: tip-389-impl, size: 50866
13:52
merge trunk. Update all unicode tables to current state of Unicode 6.3 (not released yet) file: [7dacd44f65] check-in: [2fffdb3621] user: jan.nijtmans branch: novem, size: 46165
13:38
Merge trunk. Upgrade all tables to Unicode 6.3 (not released yet) file: [9f57826203] check-in: [adf88dd045] user: jan.nijtmans branch: tip-389-impl, size: 50907
13:16
For Unicode 6.3, mongolian vowel separator (U+180e) is nominated to change character class from Space to Control character. Make sure that "string is space" will continue to return 1 for this character. See TIP #413. file: [5ce9c3041c] check-in: [b553432c31] user: jan.nijtmans branch: trunk, size: 46165
2012-11-29
21:25
Purge remnants of support for compilers ignorant of C keyword 'inline'. file: [27ec49e846] check-in: [3b2b41c66a] user: dgp branch: novem, size: 46130
2012-11-18
17:21
... and even more file: [a79e50140e] check-in: [7a9b06162b] user: dkf branch: novem-64bit-sizes, size: 46126
2012-11-16
14:15
More work done. I am still finding places where int should be size_t and the reverse. file: [95e82b324a] check-in: [d348e679f7] user: jan.nijtmans branch: novem-64bit-sizes, size: 46165
12:38
merge "novem". Everything compiles now, but it doesn't run yet. file: [c5f05186c5] check-in: [005a09e2be] user: jan.nijtmans branch: novem-64bit-sizes, size: 46172
2012-10-09
01:48
merge trunk. Don't include U+0082 and U+0083 in the Tcl space set. file: [4734351403] check-in: [227a4f0b70] user: jan.nijtmans branch: tip-318-update, size: 46130
2012-09-23
16:48
tip 318 update file: [0020a81e19] check-in: [f09c1bc377] user: jan.nijtmans branch: tip-318-update, size: 46195
2012-09-21
09:27
new Tcl_UniCharIsWhitespace function file: [bc71fd9e19] check-in: [7a11f2ec28] user: jan.nijtmans branch: tip-318-update, size: 46696
2012-07-03
19:02
Feature branch to explore making use of the Hoehrmann UTF-8 decoder. http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ file: [22aafac1f8] check-in: [8e353afbe6] user: dgp branch: dgp-hoehrmann-decoder, size: 46264
2012-02-03
21:25
merge trunk file: [8252ed9abd] check-in: [ffb46b40ba] user: jan.nijtmans branch: tip-389-impl, size: 50721
2012-01-26
21:58
merge trunk file: [b92935e908] check-in: [702fdde8e2] user: jan.nijtmans branch: tip-389-impl, size: 50711
2012-01-23
22:19
merge trunk file: [8b3efc8560] check-in: [9eab8924a0] user: jan.nijtmans branch: tip-389-impl, size: 50555
09:38
merge trunk file: [d3c3ef5c3a] check-in: [41d9eb81af] user: dkf branch: dkf-utf16-branch, size: 45980
2012-01-22
21:50
[Frq 3473670]: Various Unicode-related file: [b14d5dc7be] check-in: [d772d08f8a] user: jan.nijtmans branch: trunk, size: 45979
21:49
[Frq 3473670]: Various Unicode-related speedups/robustness file: [821f56bcd2] check-in: [2ccfd0f771] user: jan.nijtmans branch: core-8-5-branch, size: 45948
2012-01-16
13:35
merge trunk file: [6b20262117] check-in: [0d8d161fe8] user: dkf branch: dkf-utf16-branch, size: 46056
2012-01-14
09:30
rfe-3473670: Various Unicode-related speedups/robustness file: [b972e779b5] check-in: [92168a99c1] user: jan.nijtmans branch: rfe-3473670, size: 46004
2012-01-09
20:34
[Bug 3464428] string is graph \u0120 is wrong file: [f103b9e575] check-in: [e9a619e9dc] user: jan.nijtmans branch: trunk, size: 46055
20:31
[Bug 3464428] string is graph \u0120 is wrong file: [89865539a7] check-in: [14fc5c19b7] user: jan.nijtmans branch: core-8-5-branch, size: 46024
19:59
[Bug 3464428] string is graph \u0120 is wrong file: [8c9e89c5d6] check-in: [a0c0feafe9] user: jan.nijtmans branch: core-8-4-branch, size: 46080
2011-12-31
15:12
merge trunk file: [d7f87d218d] check-in: [e0cf8ae638] user: dkf branch: dkf-utf16-branch, size: 46758
2011-12-24
00:30
[Bug 3464428] string is graph \u0120 is wrong file: [0877da7330] check-in: [0c1ac83954] user: jan.nijtmans branch: trunk, size: 46757
00:15
[Bug 3464428] string is graph \u0120 is wrong file: [ff4f29c06b] check-in: [005fc77cde] user: jan.nijtmans branch: core-8-5-branch, size: 46726
2011-12-23
23:31
[Bug 3464428] string is graph \u0120 is wrong file: [e5c5f74bb0] check-in: [13071df962] user: jan.nijtmans branch: core-8-4-branch, size: 46880
2011-08-24
08:32
fix tests utf-2.8 and utf-2.9 file: [3520ee4534] check-in: [39ae4108bf] user: jan.nijtmans branch: tip-389-impl, size: 50669
07:50
Upcoming TIP implementation: Full support for Unicode 6.0 file: [e61b09f86a] check-in: [5721cf9ae6] user: jan.nijtmans branch: tip-389-impl, size: 50601
2011-07-27
10:40
Start work towards being able to work with utf8 fully and utf16 and other things outside the BMP. file: [2a2f356f38] check-in: [f9f8c8425c] user: dkf branch: dkf-utf16-branch, size: 46774
2011-04-28
16:02
More isspace() callers. file: [4469252ff0] check-in: [41acfe91ea] user: dgp branch: trunk, size: 46773
16:00
More isspace() callers. file: [11a1eed14a] check-in: [88095bbde0] user: dgp branch: core-8-5-branch, size: 46742
2011-03-02
20:27
Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them (except in zlib files). file: [b95ca708fe] check-in: [c64f310d38] user: dgp branch: trunk, size: 46789
16:06
Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. file: [f3b9f73e39] check-in: [79367df0f0] user: dgp branch: core-8-5-branch, size: 46758
2011-03-01
15:38
Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. file: [515557ce50] check-in: [90b4acd7bd] user: dgp branch: core-8-4-branch, size: 46896
2010-11-17
16:32
Next slice, big chunk of the generic IO layer plus a few miscellanea. file: [daefb09659] check-in: [40a5fd2632] user: andreask branch: activestate-nre-excised-variant-1-roll-forward, size: 46856
2009-09-07
16:39
merge updates from HEAD file: [2922d7afd9] check-in: [8c0a6a5799] user: dgp branch: dgp-refactor, size: 46860
07:28
* generic/tclExecute.c, generic/tclFCmd.c, generic/tclProc.c, generic/tclTimer.c, generic/tclUtf.c: fix potential uninitialized variable use and null dereference flagged by clang static analyzer.
* generic/tclExecute.c, generic/tclIO.c, generic/tclScan.c, generic/tclCompExpr.c: silence false positives from clang static analyzer about potential null dereference.
file: [69316ce61d] check-in: [e93f957325] user: das branch: trunk, size: 46856
2009-02-11
17:27
merge updates from HEAD file: [41d4f6bc48] check-in: [f07460d448] user: dgp branch: dgp-refactor, size: 46856
15:28
* generic/tclStringObj.c: Changed type of the 'allocated' field of the String struct from size_t to int since only int values are ever stored in it.
file: [c4aba6267e] check-in: [93efedde3f] user: dgp branch: trunk, size: 46852
2008-05-11
04:22
merge updates from HEAD file: [899279da02] check-in: [b084fd8e3a] user: dgp branch: dgp-refactor, size: 46829
2008-04-27
22:21
Get rid of pre-C89-isms (esp. CONST vs const). file: [d58ad8109d] check-in: [2d205c22fb] user: dkf branch: trunk, size: 46825
2005-11-03
17:52
merge updates from HEAD file: [7ee2da14dc] check-in: [d827b9cf1e] user: dgp branch: dgp-refactor, size: 46829
2005-10-31
15:59
Convert to using ANSI decls/definitions and using the (ANSI) assumption that NULL can be cast to any pointer type transparently. file: [231e655e04] check-in: [1e0170d2bf] user: dkf branch: trunk, size: 46825
2005-09-12
15:40
merge updates from HEAD file: [c7c9e665ca] check-in: [156f19bcaf] user: dgp branch: dgp-refactor, size: 47247
2005-09-09
18:48
[kennykb-numerics-branch] Merge updates from HEAD.
file: [b74363491e] check-in: [343239eeff] user: dgp branch: kennykb-numerics-branch, size: 47247
2005-09-07
15:31
* generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976].
file: [2830ca0d80] check-in: [c76f2a1966] user: dgp branch: trunk, size: 47243
14:35
* generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976].
file: [4d21dfc317] check-in: [8d8a47a587] user: dgp branch: core-8-4-branch, size: 46967
2005-08-02
18:14
merge updates from HEAD file: [642f661d22] check-in: [10feab7c07] user: dgp branch: kennykb-numerics-branch, size: 47152
2005-07-26
04:11
Merge updates from HEAD file: [6b3a173352] check-in: [8351a734a6] user: dgp branch: dgp-refactor, size: 47152
2005-07-21
14:38
Systematizing the formatting file: [459bdd81b3] check-in: [ac613e6b94] user: dkf branch: trunk, size: 47148
2005-06-13
01:45
*** MERGE WITH HEAD *** (tag msofer-wcodes-20050611)
file: [abb0e59dd0] check-in: [d666b09ed5] user: msofer branch: msofer-wcodes-branch, size: 47149
2005-05-10
18:33
Merged kennykb-numerics-branch back to the head; TIPs 132 and 232 file: [05ebb1f08c] check-in: [1cc2336920] user: kennykb branch: trunk, size: 47146
2005-05-05
17:55
Merged with HEAD file: [f8567f610d] check-in: [b77c9a87c6] user: kennykb branch: kennykb-numerics-branch, size: 47150
2005-05-04
17:34
merge updates from HEAD file: [4702b80d6d] check-in: [edf99c3880] user: dgp branch: dgp-refactor, size: 47146
2005-05-03
18:07
* doc/DString.3, doc/Environment.3, doc/Eval.3, doc/ExprLong.3, doc/ExprLongObj.3, doc/GetInt.3, doc/GetOpnFl.3, doc/ParseCmd.3, doc/RegExp.3, doc/SetResult.3, doc/StrMatch.3, doc/Utf.3, generic/tcl.decls, generic/tclBasic.c, generic/tclEnv.c, generic/tclGet.c, generic/tclParse.c, generic/tclParseExpr.c, generic/tclRegexp.c, generic/tclResult.c, generic/tclUtf.c, generic/tclUtil.c, unix/tclUnixChan.c: Eliminated use of identifier "string" in Tcl's public C API to avoid conflict/confusion with the std::string of C++.
* generic/tclDecls.h: `make genstubs`
file: [bfdb1027d4] check-in: [83aa957ebe] user: dgp branch: trunk, size: 47142
2003-10-16
02:28
Merged updates from HEAD file: [0a87a53c0a] check-in: [44102608b1] user: dgp branch: dgp-refactor, size: 46872
2003-10-08
14:24
Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] file: [4bafe028d7] check-in: [6b243da1f0] user: dkf branch: trunk, size: 46868
14:21
Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] file: [d09ebf9cdd] check-in: [257a93c349] user: dkf branch: core-8-4-branch, size: 46872
2003-03-06
23:27
* generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042]
file: [386fe22320] check-in: [a7fde7d55a] user: dgp branch: trunk, size: 46896
23:24
* generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042]
file: [bb62f8f756] check-in: [8003bbacd1] user: dgp branch: core-8-4-branch, size: 46900
2003-02-18
02:25
* generic/tclExecute.c (TclExecuteByteCode INST_STR_MATCH), generic/tclCmdMZ.c (Tcl_StringObjCmd STR_MATCH), generic/tclUtf.c (TclUniCharMatch), generic/tclInt.decls, generic/tclIntDecls.h, generic/tclStubInit.c, tests/string.test, tests/stringComp.test: add private TclUniCharMatch function that does string match on counted unicode strings. Tcl_UniCharCaseMatch has the failing that it can't handle strings or patterns with embedded NULLs. Added tests that actually try strings/pats with NULLs. TclUniCharMatch should be TIPed and made public in the next minor version rev.
file: [cac938b6c2] check-in: [28dcdcf39e] user: hobbs branch: trunk, size: 46835
2002-11-12
02:26
* generic/tclUtf.c: make use of TclUtfToUniChar macro throughout the functions, and add extra optimization to Tcl_NumUtfChars for one-byte/char case.
file: [78e68b4785] check-in: [af7f25d96a] user: hobbs branch: trunk, size: 42114
2002-08-20
20:25
merged with trunk at tag macosx-8-4-merge-2002-08-20-trunk file: [7443fa11a6] check-in: [354986d9c3] user: das branch: macosx-8-4-branch, size: 41944
2002-08-05
03:24
* doc/CmdCmplt.3, doc/Concat.3, doc/CrtCommand.3, doc/CrtSlave.3, doc/CrtTrace.3, doc/Eval.3, doc/ExprLong.3, doc/LinkVar.3, doc/ParseCmd.3, doc/SetVar.3, doc/TraceVar.3, doc/UpVar.3, generic/tcl.decls, generic/tcl.h, generic/tclBasic.c, generic/tclCmdMZ.c, generic/tclCompCmds.c, generic/tclCompExpr.c, generic/tclCompile.c, generic/tclCompile.h, generic/tclDecls.h, generic/tclEnv.c, generic/tclEvent.c, generic/tclInt.decls, generic/tclInt.h, generic/tclIntDecls.h, generic/tclInterp.c, generic/tclLink.c, generic/tclObj.c, generic/tclParse.c, generic/tclParseExpr.c, generic/tclProc.c, generic/tclTest.c, generic/tclUtf.c, generic/tclUtil.c, generic/tclVar.c, mac/tclMacTest.c, tests/expr-old.test, tests/parseExpr.test, unix/tclUnixTest.c, unix/tclXtTest.c, win/tclWinTest.c: Applied Patch 585105 to fully CONST-ify all remaining public interfaces of Tcl. Notably, the parser no longer writes on the string it is parsing, so it is no longer necessary for Tcl_Eval() to be given a writable string. Also, the refactoring of the Tcl_*Var* routines by Miguel Sofer is included, so that the "part1" argument for them no longer needs to be writable either. Compatibility support has been enhanced so that a #define of USE_NON_CONST will remove all possible source incompatibilities with the 8.3 version of the header file(s). The new #define of USE_COMPAT_CONST now does what USE_NON_CONST used to do -- disable only those new CONST's that introduce irreconcilable incompatibilities. Several bugs are also fixed by this patch. [Bugs 584051,580433] [Patches 585105,582429]
file: [b18e42ccdc] check-in: [e476c22fec] user: dgp branch: trunk, size: 41940
2002-07-19
12:31
Global symbols are now all either prefixed with 'tcl' (or 'Tcl' or ...) or have file-scope. file: [4126660f15] check-in: [86e27ff753] user: dkf branch: trunk, size: 44537
2002-06-10
05:33
Merging with TOT as of 06/09/2002. file: [ed60c139a6] check-in: [73b68fb238] user: wolfsuit branch: macosx-8-4-branch, size: 44540
2002-05-30
03:27
* unix/configure: regen'ed * unix/configure.in: replaced bigendian check with autoconf standard AC_C_BIG_ENDIAN, which defined WORDS_BIGENDIAN on bigendian systems. * generic/tclUtf.c (Tcl_UniCharNcmp): * generic/tclInt.h (TclUniCharNcmp): use WORDS_BIGENDIAN instead of TCL_OPTIMIZE_UNICODE_COMPARE to enable memcmp alternative.
file: [308dcb7710] check-in: [5a5c16e5a7] user: hobbs branch: trunk, size: 44533
2002-05-29
10:35
Made Tcl_UniCharNcmp faster on big-endian machines; the system memcmp() is probably optimized far in excess of anything we could do! Little-endian just use the old code... file: [384a26efe9] check-in: [b3535ea391] user: dkf branch: trunk, size: 44570
09:09
* generic/tclInt.decls: * generic/tclIntDecls.h: * generic/tclStubInit.c: * generic/tclUtf.c: added TclpUtfNcmp2 private command that mirrors Tcl_UtfNcmp, but takes n in bytes, not utf-8 chars. This provides a faster alternative for comparing utf strings internally. (Tcl_UniCharNcmp, Tcl_UniCharNcasecmp): removed the explicit end of string check as it wasn't correct for the function (by doc and logic).
* generic/tclCmdMZ.c (Tcl_StringObjCmd): reworked the string equal comparison code to use TclpUtfNcmp2 as well as short-circuit for equal objects or unequal length strings in the equal case. Removed the use of goto and streamlined the other parts.
* generic/tclExecute.c (TclExecuteByteCode): added check for object equality in the comparison instructions. Added short-circuit for != length strings in INST_EQ, INST_NEQ and INST_STR_CMP. Reworked INST_STR_CMP to use TclpUtfNcmp2 where appropriate, and only use Tcl_UniCharNcmp when at least one of the objects is a Unicode obj with no utf bytes.
file: [03dbd6b1d6] check-in: [c78da914be] user: hobbs branch: trunk, size: 44362
2002-02-08
02:52
* Partial TIP 27 rollback. Following routines restored to return (char *): Tcl_DStringAppend, Tcl_DStringAppendElement, Tcl_JoinPath, Tcl_TranslateFileName, Tcl_ExternalToUtfDString, Tcl_UtfToExternalDString, Tcl_UniCharToUtfDString, Tcl_GetCwd, Tcl_WinTCharToUtf. Also restored Tcl_WinUtfToTChar to return (TCHAR *) and Tcl_UtfToUniCharDString to return (Tcl_UniChar *). Modified some callers. This change recognizes that Tcl_DStrings are de-facto white-box objects.
* generic/tclCmdMZ.c: corrected use of C++-style comment.
file: [8496aab82e] check-in: [bb1a244cde] user: dgp branch: trunk, size: 43084
2002-02-05
02:21
Merging with the current TOT. Very few conflicts, mostly in the generated files. file: [2bd859f5d7] check-in: [f469a31a06] user: wolfsuit branch: macosx-8-4-branch, size: 43105
2002-01-26
01:10
* Sought out and eliminated instances of CONST-casting that are no longer needed after the TIP 27 effort.
file: [e7d378c3b1] check-in: [4bca1d26db] user: dgp branch: trunk, size: 43096
2002-01-17
03:03
* Updated APIs in generic/tclUtf.c and generic/tclRegexp.c according to the guidelines of TIP 27. Updated callers.
file: [73d8335019] check-in: [17ade15700] user: dgp branch: trunk, size: 43105
2002-01-02
13:52
Fixed fault with case-insensitive string matching (Bug#233257) and rewrote some tests to test what they claimed to be testing. file: [608df3bfe8] check-in: [99e550c5be] user: dkf branch: trunk, size: 43103
2001-10-16
05:31
Undo of mistaken commit. Sorry! file: [c911d35dc6] check-in: [092e282e8d] user: dgp branch: trunk, size: 43048
05:10
  • Added test to demonstrate memory corruption problems. [Bug 219393].
file: [203c69f186] check-in: [6134ad6471] user: dgp branch: trunk, size: 43038
2001-09-27
13:49
Backing out unwise changes file: [35b0daf5fe] check-in: [4c0c25f627] user: dkf branch: dkf-64bit-support-branch, size: 43052
2001-09-26
14:23
Now builds on Solaris8/SPARC with both SunPro CC *and* GCC. file: [af6d32ff69] check-in: [4850711173] user: dkf branch: dkf-64bit-support-branch, size: 43066
2001-09-13
19:33
* generic/tclUtf.c (Tcl_UtfPrev): corrected to return the proper location when the middle of a UTF-8 byte was passed in. [Bug #450504]
file: [08c7f3b96f] check-in: [5b19d3b799] user: hobbs branch: core-8-3-1-branch, size: 38202
19:31
* generic/tclUtf.c (Tcl_UtfPrev): corrected to return the proper location when the middle of a UTF-8 byte was passed in. [Bug #450504]
file: [0411922794] check-in: [5f02e9a154] user: hobbs branch: trunk, size: 43050
2001-07-16
23:14
2001-07-02 Jeff Hobbs
* tests/util.test: added util-4.6 * generic/tclUtil.c (Tcl_ConcatObj): Corrected walking backwards over utf-8 chars. [Bug #227512]

2001-06-27 Jeff Hobbs

* generic/tclUtf.c (Tcl_UtfBackslash): Corrected backslash handling of multibyte utf-8 chars. [Bug #217987]
* generic/tclCmdIL.c (InfoProcsCmd): fixed potential mem leak in info procs that created objects without using them.
* generic/tclCompCmds.c (TclCompileStringCmd): fixed mem leak when string command failed to parse the subcommand.

2001-05-22 Jeff Hobbs

* generic/tclObj.c (TclAllocateFreeObjects): simplified objSizePlusPadding to use sizeof(Tcl_Obj) (max)
file: [b1e3fb83a7] check-in: [bef6467977] user: hobbs branch: core-8-3-1-branch, size: 38256
2001-06-28
01:10
* tests/subst.test: * generic/tclUtf.c (Tcl_UtfBackslash): Corrected backslash handling of multibyte utf-8 chars. [Bug #217987]
file: [32045ec19e] check-in: [7eba85bf31] user: hobbs branch: trunk, size: 43023
2001-04-06
10:50
Fixed problem with [string compare \x00 \x01] and hopefully sped the command up in a few cases too (notably byte arrays and UNICODE objects.) [Bug #219201] file: [ca14319b76] check-in: [6677432d73] user: dkf branch: trunk, size: 42633
2000-06-05
23:36
Comment typo correction. file: [7d4c2af295] check-in: [68d103692c] user: ericm branch: trunk, size: 42731
2000-05-08
22:06
removed unreferenced var file: [60f04b3f80] check-in: [5482dfcb56] user: hobbs branch: trunk, size: 42731
21:59
* doc/Utf.3: * generic/tclStubInit.c: * generic/tcl.decls: * generic/tclDecls.h: * generic/tclUtf.c: Added new functions Tcl_UniCharNcasecmp and Tcl_UniCharCaseMatch (unicode parallel to Tcl_StringCaseMatch) * generic/tclUtil.c: rewrote Tcl_StringCaseMatch algorithm for optimization and made Tcl_StringMatch just call Tcl_StringCaseMatch
file: [7a3ece8e99] check-in: [52c8e2d16d] user: hobbs branch: trunk, size: 42750
2000-01-11
22:08
* generic/tclUtf.c: changed Tcl_UtfBackslash to not allow non-octal digits (8,9) in \ooo substs. [Bug: 3975]
* generic/tcl.h: noted need to change win/tcl.m4 and tools/tclSplash.bmp for minor version changes
file: [072d41fd6b] check-in: [f664655d33] user: hobbs branch: trunk, size: 37858
1999-07-22
01:08
* doc/Utf.3: * generic/tcl.decls: * generic/tclInt.decls: * generic/tclDecls.h: * generic/tclIntDecls.h: * generic/tclUtf.c: * compat/strftime.c: * unix/tclUnixTime.c: Changed function declarations in non-platform-specific APIs to use "unsigned long" instead of "size_t", which may not be defined on certain compilers (rather than include sys/types.h, which may not exist). file: [1063baf081] check-in: [22b143003b] user: redman branch: trunk, size: 37736
1999-06-24
03:27
* unix/Makefile.in: Changed install-doc to install-man.

* tools/uniParse.tcl: * tools/uniClass.tcl: * tools/README: * tests/string.test: * generic/regc_locale.c: * generic/tclUniData.c: * generic/tclUtf.c: * doc/string.n: Updated Unicode character tables to reflect latest Unicode 2.1 data. Also rationalized "regexp" and "string is" definitions of character classes. file: [c48c2fa19e] check-in: [9d26b83359] user: stanton branch: trunk, size: 37717

1999-06-02
20:21
* generic/tclUtf.c (Tcl_UtfNcasecmp): Fixed incorrect computation of relative ordering. [Bug: 2135] file: [5cabcf8d5a] check-in: [58260de12d] user: stanton branch: trunk, size: 37568
1999-05-22
01:20
Merged changes from scriptics-tclpro-1-3-b2 branch file: [f55a65074e] check-in: [f692388d07] user: stanton branch: trunk, size: 37532
1999-05-20
23:40
lint in comments file: [10c8f0fe13] check-in: [9c2ceb7303] user: hershey branch: trunk, size: 34755
00:03
Merged in various changes submitted by Jeff Hobbs:

* generic/tcl.decls: * generic/tclUtf.c: Added Tcl_UniCharIs* functions for control, graph, print, and punct classes.

* generic/tclUtil.c: * doc/StrMatch.3: Added Tcl_StringCaseMatch() implementation to support case-insensitive globbing.

* doc/string.n: * unix/mkLinks: * tests/string.test: * generic/tclCmdMZ.c: Added additional character class tests, added -nocase switch to "string match", changed string first/last to use offsets. file: [7706b36834] check-in: [61e23396c3] user: stanton branch: scriptics-tclpro-1-3-b2-branch, size: 37579

1999-05-06
18:46
* doc/Utf.3: * generic/tclStubInit.c: * generic/tclDecls.h: * generic/tclUtf.c: * generic/tcl.decls: Added Tcl_UtfNcmp and Tcl_UtfNcasecmp. file: [fbdeeb90bf] check-in: [d9e7608a90] user: stanton branch: trunk, size: 34798
1999-04-30
16:22
Tcl_UtfToUpper and Tcl_UtfToTitle now work on badly formed Utf strings. file: [2f760fc34e] check-in: [8cc1ac5095] user: hershey branch: trunk, size: 32513
1999-04-29
00:04
fixed part of bug 1791: Tcl_UtfToUpper and Tcl_UtfToLower now work on badly formed Utf strings. file: [5737c04290] check-in: [a9c361524c] user: hershey branch: trunk, size: 32180
1999-04-21
00:42
Deleted: Added comments for 4/19 and 4/20 check-in: [6c6fc5d7b2] user: redman branch: scriptics-tclpro-1-2-old, size: 0
1999-04-16
00:46
Added: merged tcl 8.1 branch back into the main trunk file: [3f7d7e0b3e] check-in: [f3b32fb71c] user: stanton branch: trunk, size: 30544
1999-04-02
23:44
* generic/regc_locale.c: * generic/regcustom.h: * generic/tcl.decls: * generic/tclCmdIL.c: * generic/tclCmdMZ.c: * generic/tclInt.h: * generic/tclRegexp.c: * generic/tclScan.c: * generic/tclTest.c: * generic/tclUtf.c: * win/tclWinFCmd.c: * win/tclWinFile.c: Made various Unicode utility functions public. The following functions were made public and added to the stubs table: Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar file: [f80e00437b] check-in: [de06484e63] user: stanton branch: core-8-1-branch-old, size: 30548
1998-11-04
04:39
Merged Henry's latest changes to add support for character ranges in cvec data type

Added support for Unicode character classes in regular expressions We now support the following character classes:

alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit

These all follow the example set by the GNU regular expression package for Java except that "digit" only matches the ASCII '0'-'9' characters.

Renamed tclUtf.h to tclUniData.c file: [e9daa1c432] check-in: [6a4b8e2ee0] user: stanton branch: core-8-1-branch-old, size: 30526

1998-10-21
20:57
fixed typo in include file: [c75a37c3d7] check-in: [078afb3c89] user: stanton branch: core-8-1-branch-old, size: 30447
1998-10-16
01:16
Added Unicode character table support: added TclUniCharIsWordChar
tclCmdMZ.c (Tcl_StringObjCmd): added "totitle" subcommand, changed "wordend" and "wordstart" to properly handle Unicode word characters and connector punctuation
file: [e7c8a7e302] check-in: [a448adffae] user: stanton branch: core-8-1-branch-old, size: 30447
1998-10-03
01:56
replaced SCCS with RCS strings file: [f4f9e2686e] check-in: [c65ae5da0d] user: stanton branch: core-8-1-branch-old, size: 30133
1998-09-21
23:39
Added: Added contents of Tcl 8.1a2 file: [988d81327c] check-in: [8c56dc8868] user: stanton branch: core-8-1-branch-old, size: 30108