Tcl Source Code

Check-in [6d59f33bbb]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Merge trunk, but remove experimental tip-389-impl merge.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | androwish
Files: files | file ages | folders
SHA1: 6d59f33bbb2e30f58ed9a4a7bf232d0e465ee0f5
User & Date: jan.nijtmans 2015-08-31 10:22:29
Context
2015-08-31
17:30
Dependancies from androwish upstream check-in: bc0f534cb0 user: jan.nijtmans tags: androwish
10:22
Merge trunk, but remove experimental tip-389-impl merge. check-in: 6d59f33bbb user: jan.nijtmans tags: androwish
10:18
Some Unicode encoding fixes, only having effect if TCL_UTF_MAX > 4. Backported from androwish check-in: d8764f73dd user: jan.nijtmans tags: trunk
2015-08-24
19:39
Merge tip-389-impl (experimental) check-in: 84c4345b29 user: jan.nijtmans tags: androwish
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to doc/library.n.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
'\"
'\" Copyright (c) 1991-1993 The Regents of the University of California.
'\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\" 
.TH library n "8.0" Tcl "Tcl Built-In Commands"
.so man.macros
.BS
.SH NAME
auto_execok, auto_import, auto_load, auto_mkindex, auto_qualify, auto_reset, tcl_findLibrary, parray, tcl_endOfWord, tcl_startOfNextWord, tcl_startOfPreviousWord, tcl_wordBreakAfter, tcl_wordBreakBefore \- standard library of Tcl procedures
.SH SYNOPSIS
.nf
\fBauto_execok \fIcmd\fR
\fBauto_import \fIpattern\fR
\fBauto_load \fIcmd\fR
\fBauto_mkindex \fIdir pattern pattern ...\fR
\fBauto_qualify \fIcommand namespace\fR
\fBauto_reset\fR
\fBtcl_findLibrary \fIbasename version patch initScript enVarName varName\fR
\fBparray \fIarrayName\fR
\fBtcl_endOfWord \fIstr start\fR
\fBtcl_startOfNextWord \fIstr start\fR
\fBtcl_startOfPreviousWord \fIstr start\fR
\fBtcl_wordBreakAfter \fIstr start\fR
\fBtcl_wordBreakBefore \fIstr start\fR
.BE
.SH INTRODUCTION






|














|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
'\"
'\" Copyright (c) 1991-1993 The Regents of the University of California.
'\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.TH library n "8.0" Tcl "Tcl Built-In Commands"
.so man.macros
.BS
.SH NAME
auto_execok, auto_import, auto_load, auto_mkindex, auto_qualify, auto_reset, tcl_findLibrary, parray, tcl_endOfWord, tcl_startOfNextWord, tcl_startOfPreviousWord, tcl_wordBreakAfter, tcl_wordBreakBefore \- standard library of Tcl procedures
.SH SYNOPSIS
.nf
\fBauto_execok \fIcmd\fR
\fBauto_import \fIpattern\fR
\fBauto_load \fIcmd\fR
\fBauto_mkindex \fIdir pattern pattern ...\fR
\fBauto_qualify \fIcommand namespace\fR
\fBauto_reset\fR
\fBtcl_findLibrary \fIbasename version patch initScript enVarName varName\fR
\fBparray \fIarrayName\fR ?\fIpattern\fR?
\fBtcl_endOfWord \fIstr start\fR
\fBtcl_startOfNextWord \fIstr start\fR
\fBtcl_startOfPreviousWord \fIstr start\fR
\fBtcl_wordBreakAfter \fIstr start\fR
\fBtcl_wordBreakBefore \fIstr start\fR
.BE
.SH INTRODUCTION
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
parses the Tcl scripts in a relatively
unsophisticated way:  if any line contains the word
.QW \fBproc\fR
as its first characters then it is assumed to be a procedure
definition and the next word of the line is taken as the
procedure's name.
Procedure definitions that do not appear in this way (e.g.\ they
have spaces before the \fBproc\fR) will not be indexed.  If your 
script contains
.QW dangerous
code, such as global initialization
code or procedure names with special characters like \fB$\fR,
\fB*\fR, \fB[\fR or \fB]\fR, you are safer using \fBauto_mkindex_old\fR.
.RE
.TP







|







135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
parses the Tcl scripts in a relatively
unsophisticated way:  if any line contains the word
.QW \fBproc\fR
as its first characters then it is assumed to be a procedure
definition and the next word of the line is taken as the
procedure's name.
Procedure definitions that do not appear in this way (e.g.\ they
have spaces before the \fBproc\fR) will not be indexed.  If your
script contains
.QW dangerous
code, such as global initialization
code or procedure names with special characters like \fB$\fR,
\fB*\fR, \fB[\fR or \fB]\fR, you are safer using \fBauto_mkindex_old\fR.
.RE
.TP
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201


202
203
204
205
206
207
208
performing the actual auto-loading of functions at runtime.
.RE
.TP
\fBtcl_findLibrary \fIbasename version patch initScript enVarName varName\fR
This is a standard search procedure for use by extensions during
their initialization.  They call this procedure to look for their
script library in several standard directories.
The last component of the name of the library directory is 
normally \fIbasenameversion\fR
(e.g., tk8.0), but it might be
.QW library
when in the build hierarchies.
The \fIinitScript\fR file will be sourced into the interpreter
once it is found.  The directory in which this file is found is
stored into the global variable \fIvarName\fR.
If this variable is already defined (e.g., by C code during
application initialization) then no searching is done.
Otherwise the search looks in these directories:
the directory named by the environment variable \fIenVarName\fR;
relative to the Tcl library directory;
relative to the executable file in the standard installation
bin or bin/\fIarch\fR directory;
relative to the executable file in the current build tree;
relative to the executable file in a parallel build tree.
.TP
\fBparray \fIarrayName\fR
Prints on standard output the names and values of all the elements
in the array \fIarrayName\fR.


\fIArrayName\fR must be an array accessible to the caller of \fBparray\fR.
It may be either local or global.
.TP
\fBtcl_endOfWord \fIstr start\fR
Returns the index of the first end-of-word location that occurs after
a starting index \fIstart\fR in the string \fIstr\fR.  An end-of-word
location is defined to be the first non-word character following the







|

















|
|
|
>
>







174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
performing the actual auto-loading of functions at runtime.
.RE
.TP
\fBtcl_findLibrary \fIbasename version patch initScript enVarName varName\fR
This is a standard search procedure for use by extensions during
their initialization.  They call this procedure to look for their
script library in several standard directories.
The last component of the name of the library directory is
normally \fIbasenameversion\fR
(e.g., tk8.0), but it might be
.QW library
when in the build hierarchies.
The \fIinitScript\fR file will be sourced into the interpreter
once it is found.  The directory in which this file is found is
stored into the global variable \fIvarName\fR.
If this variable is already defined (e.g., by C code during
application initialization) then no searching is done.
Otherwise the search looks in these directories:
the directory named by the environment variable \fIenVarName\fR;
relative to the Tcl library directory;
relative to the executable file in the standard installation
bin or bin/\fIarch\fR directory;
relative to the executable file in the current build tree;
relative to the executable file in a parallel build tree.
.TP
\fBparray \fIarrayName\fR ?\fIpattern\fR?
Prints on standard output the names and values of all the elements in the
array \fIarrayName\fR, or just the names that match \fIpattern\fR (using the
matching rules of \fBstring match\fR) and their values if \fIpattern\fR is
given.
\fIArrayName\fR must be an array accessible to the caller of \fBparray\fR.
It may be either local or global.
.TP
\fBtcl_endOfWord \fIstr start\fR
Returns the index of the first end-of-word location that occurs after
a starting index \fIstart\fR in the string \fIstr\fR.  An end-of-word
location is defined to be the first non-word character following the
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
library scripts (the value of this variable will be
assigned to the \fBtcl_library\fR variable and therefore returned by
the command \fBinfo library\fR).  If this variable is not set then
a default value is used.
.TP
\fBenv(TCLLIBPATH)\fR
If set, then it must contain a valid Tcl list giving directories to
search during auto-load operations.  Directories must be specified in 
Tcl format, using
.QW /
as the path separator, regardless of platform.
This variable is only used when initializing the \fBauto_path\fR variable.
.SS "WORD BOUNDARY DETERMINATION VARIABLES"
These variables are only used in the \fBtcl_endOfWord\fR,
\fBtcl_startOfNextWord\fR, \fBtcl_startOfPreviousWord\fR,







|







281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
library scripts (the value of this variable will be
assigned to the \fBtcl_library\fR variable and therefore returned by
the command \fBinfo library\fR).  If this variable is not set then
a default value is used.
.TP
\fBenv(TCLLIBPATH)\fR
If set, then it must contain a valid Tcl list giving directories to
search during auto-load operations.  Directories must be specified in
Tcl format, using
.QW /
as the path separator, regardless of platform.
This variable is only used when initializing the \fBauto_path\fR variable.
.SS "WORD BOUNDARY DETERMINATION VARIABLES"
These variables are only used in the \fBtcl_endOfWord\fR,
\fBtcl_startOfNextWord\fR, \fBtcl_startOfPreviousWord\fR,
308
309
310
311
312
313
314
315
316
317
318
word or not.  If the pattern matches a character, the character is
considered to be a word character.  On Windows platforms, words are
comprised of any character that is not a space, tab, or newline.  Under
Unix, words are comprised of numbers, letters or underscores.
.SH "SEE ALSO"
env(n), info(n), re_syntax(n)
.SH KEYWORDS
auto-exec, auto-load, library, unknown, word, whitespace 
'\"Local Variables:
'\"mode: nroff
'\"End:







|



310
311
312
313
314
315
316
317
318
319
320
word or not.  If the pattern matches a character, the character is
considered to be a word character.  On Windows platforms, words are
comprised of any character that is not a space, tab, or newline.  Under
Unix, words are comprised of numbers, letters or underscores.
.SH "SEE ALSO"
env(n), info(n), re_syntax(n)
.SH KEYWORDS
auto-exec, auto-load, library, unknown, word, whitespace
'\"Local Variables:
'\"mode: nroff
'\"End:

Changes to generic/tcl.decls.

1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
    void Tcl_ThreadAlert(Tcl_ThreadId threadId)
}
declare 319 {
    void Tcl_ThreadQueueEvent(Tcl_ThreadId threadId, Tcl_Event *evPtr,
	    Tcl_QueuePosition position)
}
declare 320 {
    int Tcl_UniCharAtIndex(const char *src, int index)
}
declare 321 {
    int Tcl_UniCharToLower(int ch)
}
declare 322 {
    int Tcl_UniCharToTitle(int ch)
}
declare 323 {
    int Tcl_UniCharToUpper(int ch)
}
declare 324 {
    int Tcl_UniCharToUtf(int ch, char *buf)
}
declare 325 {
    CONST84_RETURN char *Tcl_UtfAtIndex(const char *src, int index)
}







|


|


|


|







1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
    void Tcl_ThreadAlert(Tcl_ThreadId threadId)
}
declare 319 {
    void Tcl_ThreadQueueEvent(Tcl_ThreadId threadId, Tcl_Event *evPtr,
	    Tcl_QueuePosition position)
}
declare 320 {
    Tcl_UniChar Tcl_UniCharAtIndex(const char *src, int index)
}
declare 321 {
    Tcl_UniChar Tcl_UniCharToLower(int ch)
}
declare 322 {
    Tcl_UniChar Tcl_UniCharToTitle(int ch)
}
declare 323 {
    Tcl_UniChar Tcl_UniCharToUpper(int ch)
}
declare 324 {
    int Tcl_UniCharToUtf(int ch, char *buf)
}
declare 325 {
    CONST84_RETURN char *Tcl_UtfAtIndex(const char *src, int index)
}
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
    void Tcl_SetUnicodeObj(Tcl_Obj *objPtr, const Tcl_UniChar *unicode,
	    int numChars)
}
declare 380 {
    int Tcl_GetCharLength(Tcl_Obj *objPtr)
}
declare 381 {
    int Tcl_GetUniChar(Tcl_Obj *objPtr, int index)
}
declare 382 {
    Tcl_UniChar *Tcl_GetUnicode(Tcl_Obj *objPtr)
}
declare 383 {
    Tcl_Obj *Tcl_GetRange(Tcl_Obj *objPtr, int first, int last)
}







|







1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
    void Tcl_SetUnicodeObj(Tcl_Obj *objPtr, const Tcl_UniChar *unicode,
	    int numChars)
}
declare 380 {
    int Tcl_GetCharLength(Tcl_Obj *objPtr)
}
declare 381 {
    Tcl_UniChar Tcl_GetUniChar(Tcl_Obj *objPtr, int index)
}
declare 382 {
    Tcl_UniChar *Tcl_GetUnicode(Tcl_Obj *objPtr)
}
declare 383 {
    Tcl_Obj *Tcl_GetRange(Tcl_Obj *objPtr, int first, int last)
}

Changes to generic/tcl.h.

2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204
2205
2206
2207
2208
 * 4, then Tcl_UniChar must be 2-bytes in size (UCS-2) (the default). If 6,
 * then Tcl_UniChar must be 4-bytes in size (UCS-4). At this time UCS-2 mode
 * is the default and recommended mode. UCS-4 is experimental and not
 * recommended. It works for the core, but most extensions expect UCS-2.
 */

#ifndef TCL_UTF_MAX
#define TCL_UTF_MAX		4
#endif

/*
 * This represents a Unicode character. Any changes to this should also be
 * reflected in regcustom.h.
 */








|







2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204
2205
2206
2207
2208
 * 4, then Tcl_UniChar must be 2-bytes in size (UCS-2) (the default). If 6,
 * then Tcl_UniChar must be 4-bytes in size (UCS-4). At this time UCS-2 mode
 * is the default and recommended mode. UCS-4 is experimental and not
 * recommended. It works for the core, but most extensions expect UCS-2.
 */

#ifndef TCL_UTF_MAX
#define TCL_UTF_MAX		3
#endif

/*
 * This represents a Unicode character. Any changes to this should also be
 * reflected in regcustom.h.
 */

Changes to generic/tclBinary.c.

450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
    Tcl_Interp *interp,		/* Not used. */
    Tcl_Obj *objPtr)		/* The object to convert to type ByteArray. */
{
    int length;
    const char *src, *srcEnd;
    unsigned char *dst;
    ByteArray *byteArrayPtr;
    Tcl_UniChar ch = 0;

    if (objPtr->typePtr != &tclByteArrayType) {
	src = TclGetStringFromObj(objPtr, &length);
	srcEnd = src + length;

	byteArrayPtr = ckalloc(BYTEARRAY_SIZE(length));
	for (dst = byteArrayPtr->bytes; src < srcEnd; ) {
	    int n = Tcl_UtfToUniChar(src, &ch);
	    if (n) {
		src += n;
		*dst++ = UCHAR(ch);
	    }
	}

	byteArrayPtr->used = dst - byteArrayPtr->bytes;
	byteArrayPtr->allocated = length;

	TclFreeIntRep(objPtr);
	objPtr->typePtr = &tclByteArrayType;







|







|
<
<
|
<







450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465


466

467
468
469
470
471
472
473
    Tcl_Interp *interp,		/* Not used. */
    Tcl_Obj *objPtr)		/* The object to convert to type ByteArray. */
{
    int length;
    const char *src, *srcEnd;
    unsigned char *dst;
    ByteArray *byteArrayPtr;
    Tcl_UniChar ch;

    if (objPtr->typePtr != &tclByteArrayType) {
	src = TclGetStringFromObj(objPtr, &length);
	srcEnd = src + length;

	byteArrayPtr = ckalloc(BYTEARRAY_SIZE(length));
	for (dst = byteArrayPtr->bytes; src < srcEnd; ) {
	    src += Tcl_UtfToUniChar(src, &ch);


	    *dst++ = UCHAR(ch);

	}

	byteArrayPtr->used = dst - byteArrayPtr->bytes;
	byteArrayPtr->allocated = length;

	TclFreeIntRep(objPtr);
	objPtr->typePtr = &tclByteArrayType;
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223

 badIndex:
    errorString = "not enough arguments for all format specifiers";
    goto error;

 badField:
    {
	Tcl_UniChar ch = 0;
	char buf[TCL_UTF_MAX + 1];

	Tcl_UtfToUniChar(errorString, &ch);
	buf[Tcl_UniCharToUtf(ch, buf)] = '\0';
	Tcl_SetObjResult(interp, Tcl_ObjPrintf(
		"bad field specifier \"%s\"", buf));
	return TCL_ERROR;







|







1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220

 badIndex:
    errorString = "not enough arguments for all format specifiers";
    goto error;

 badField:
    {
	Tcl_UniChar ch;
	char buf[TCL_UTF_MAX + 1];

	Tcl_UtfToUniChar(errorString, &ch);
	buf[Tcl_UniCharToUtf(ch, buf)] = '\0';
	Tcl_SetObjResult(interp, Tcl_ObjPrintf(
		"bad field specifier \"%s\"", buf));
	return TCL_ERROR;
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593

 badIndex:
    errorString = "not enough arguments for all format specifiers";
    goto error;

 badField:
    {
	Tcl_UniChar ch = 0;
	char buf[TCL_UTF_MAX + 1];

	Tcl_UtfToUniChar(errorString, &ch);
	buf[Tcl_UniCharToUtf(ch, buf)] = '\0';
	Tcl_SetObjResult(interp, Tcl_ObjPrintf(
		"bad field specifier \"%s\"", buf));
	return TCL_ERROR;







|







1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590

 badIndex:
    errorString = "not enough arguments for all format specifiers";
    goto error;

 badField:
    {
	Tcl_UniChar ch;
	char buf[TCL_UTF_MAX + 1];

	Tcl_UtfToUniChar(errorString, &ch);
	buf[Tcl_UniCharToUtf(ch, buf)] = '\0';
	Tcl_SetObjResult(interp, Tcl_ObjPrintf(
		"bad field specifier \"%s\"", buf));
	return TCL_ERROR;

Changes to generic/tclCmdMZ.c.

309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
	 * start of the string unless the previous character is a newline.
	 */

	if (offset == 0) {
	    eflags = 0;
	} else if (offset > stringLength) {
	    eflags = TCL_REG_NOTBOL;
	} else if (Tcl_GetUniChar(objPtr, offset-1) == '\n') {
	    eflags = 0;
	} else {
	    eflags = TCL_REG_NOTBOL;
	}

	match = Tcl_RegExpExecObj(interp, regExpr, objPtr, offset,
		numMatchesSaved, eflags);







|







309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
	 * start of the string unless the previous character is a newline.
	 */

	if (offset == 0) {
	    eflags = 0;
	} else if (offset > stringLength) {
	    eflags = TCL_REG_NOTBOL;
	} else if (Tcl_GetUniChar(objPtr, offset-1) == (Tcl_UniChar)'\n') {
	    eflags = 0;
	} else {
	    eflags = TCL_REG_NOTBOL;
	}

	match = Tcl_RegExpExecObj(interp, regExpr, objPtr, offset,
		numMatchesSaved, eflags);
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    int idx, result, cflags, all, wlen, wsublen, numMatches, offset;
    int start, end, subStart, subEnd, match;
    Tcl_RegExp regExpr;
    Tcl_RegExpInfo info;
    Tcl_Obj *resultPtr, *subPtr, *objPtr, *startIndex = NULL;
    Tcl_UniChar ch = 0, *wsrc, *wfirstChar, *wstring, *wsubspec, *wend;

    static const char *const options[] = {
	"-all",		"-nocase",	"-expanded",
	"-line",	"-linestop",	"-lineanchor",	"-start",
	"--",		NULL
    };
    enum options {







|







491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    int idx, result, cflags, all, wlen, wsublen, numMatches, offset;
    int start, end, subStart, subEnd, match;
    Tcl_RegExp regExpr;
    Tcl_RegExpInfo info;
    Tcl_Obj *resultPtr, *subPtr, *objPtr, *startIndex = NULL;
    Tcl_UniChar ch, *wsrc, *wfirstChar, *wstring, *wsubspec, *wend;

    static const char *const options[] = {
	"-all",		"-nocase",	"-expanded",
	"-line",	"-linestop",	"-lineanchor",	"-start",
	"--",		NULL
    };
    enum options {
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
int
Tcl_SplitObjCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_UniChar ch = 0;
    int len;
    const char *splitChars;
    const char *stringPtr;
    const char *end;
    int splitCharLen, stringLen;
    Tcl_Obj *listPtr, *objPtr;








|







1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
int
Tcl_SplitObjCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_UniChar ch;
    int len;
    const char *splitChars;
    const char *stringPtr;
    const char *end;
    int splitCharLen, stringLen;
    Tcl_Obj *listPtr, *objPtr;

1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
	 * megabyte range!) - DKF
	 */

	Tcl_InitHashTable(&charReuseTable, TCL_ONE_WORD_KEYS);

	for ( ; stringPtr < end; stringPtr += len) {
	    len = TclUtfToUniChar(stringPtr, &ch);
	    
	    if (!len) {
	        continue;
	    }

	    /*
	     * Assume Tcl_UniChar is an integral type...
	     */

	    hPtr = Tcl_CreateHashEntry(&charReuseTable, INT2PTR((int) ch),
		    &isNew);







<
<
<
<







1081
1082
1083
1084
1085
1086
1087




1088
1089
1090
1091
1092
1093
1094
	 * megabyte range!) - DKF
	 */

	Tcl_InitHashTable(&charReuseTable, TCL_ONE_WORD_KEYS);

	for ( ; stringPtr < end; stringPtr += len) {
	    len = TclUtfToUniChar(stringPtr, &ch);





	    /*
	     * Assume Tcl_UniChar is an integral type...
	     */

	    hPtr = Tcl_CreateHashEntry(&charReuseTable, INT2PTR((int) ch),
		    &isNew);
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
	    stringPtr = p + 1;
	}
	TclNewStringObj(objPtr, stringPtr, end - stringPtr);
	Tcl_ListObjAppendElement(NULL, listPtr, objPtr);
    } else {
	const char *element, *p, *splitEnd;
	int splitLen;
	Tcl_UniChar splitChar = 0;

	/*
	 * Normal case: split on any of a given set of characters. Discard
	 * instances of the split characters.
	 */

	splitEnd = splitChars + splitCharLen;







|







1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
	    stringPtr = p + 1;
	}
	TclNewStringObj(objPtr, stringPtr, end - stringPtr);
	Tcl_ListObjAppendElement(NULL, listPtr, objPtr);
    } else {
	const char *element, *p, *splitEnd;
	int splitLen;
	Tcl_UniChar splitChar;

	/*
	 * Normal case: split on any of a given set of characters. Discard
	 * instances of the split characters.
	 */

	splitEnd = splitChars + splitCharLen;
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
StringIsCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    const char *string1, *end, *stop;
    Tcl_UniChar ch = 0;
    int (*chcomp)(int) = NULL;	/* The UniChar comparison function. */
    int i, failat = 0, result = 1, strict = 0, index, length1, length2;
    Tcl_Obj *objPtr, *failVarObj = NULL;
    Tcl_WideInt w;

    static const char *const isClasses[] = {
	"alnum",	"alpha",	"ascii",	"control",







|







1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
StringIsCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    const char *string1, *end, *stop;
    Tcl_UniChar ch;
    int (*chcomp)(int) = NULL;	/* The UniChar comparison function. */
    int i, failat = 0, result = 1, strict = 0, index, length1, length2;
    Tcl_Obj *objPtr, *failVarObj = NULL;
    Tcl_WideInt w;

    static const char *const isClasses[] = {
	"alnum",	"alpha",	"ascii",	"control",
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
	    if (strict) {
		result = 0;
	    }
	    goto str_is_done;
	}
	end = string1 + length1;
	for (; string1 < end; string1 += length2, failat++) {
	    int fullchar;
	    length2 = TclUtfToUniChar(string1, &ch);
	    fullchar = ch;
	    if (!length2) {
	    	length2 = TclUtfToUniChar(string1, &ch);
	    	fullchar = (((fullchar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	    }
	    if (!chcomp(fullchar)) {
		result = 0;
		break;
	    }
	}
    }

    /*







<

<
<
<
<
<
|







1783
1784
1785
1786
1787
1788
1789

1790





1791
1792
1793
1794
1795
1796
1797
1798
	    if (strict) {
		result = 0;
	    }
	    goto str_is_done;
	}
	end = string1 + length1;
	for (; string1 < end; string1 += length2, failat++) {

	    length2 = TclUtfToUniChar(string1, &ch);





	    if (!chcomp(ch)) {
		result = 0;
		break;
	    }
	}
    }

    /*
2446
2447
2448
2449
2450
2451
2452
2453
2454
2455
2456
2457
2458
2459
2460
static int
StringStartCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_UniChar ch = 0;
    const char *p, *string;
    int cur, index, length, numChars;

    if (objc != 3) {
	Tcl_WrongNumArgs(interp, 1, objv, "string index");
	return TCL_ERROR;
    }







|







2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
2448
2449
2450
static int
StringStartCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_UniChar ch;
    const char *p, *string;
    int cur, index, length, numChars;

    if (objc != 3) {
	Tcl_WrongNumArgs(interp, 1, objv, "string index");
	return TCL_ERROR;
    }
2507
2508
2509
2510
2511
2512
2513
2514
2515
2516
2517
2518
2519
2520
2521
static int
StringEndCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_UniChar ch = 0;
    const char *p, *end, *string;
    int cur, index, length, numChars;

    if (objc != 3) {
	Tcl_WrongNumArgs(interp, 1, objv, "string index");
	return TCL_ERROR;
    }







|







2497
2498
2499
2500
2501
2502
2503
2504
2505
2506
2507
2508
2509
2510
2511
static int
StringEndCmd(
    ClientData dummy,		/* Not used. */
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_UniChar ch;
    const char *p, *end, *string;
    int cur, index, length, numChars;

    if (objc != 3) {
	Tcl_WrongNumArgs(interp, 1, objv, "string index");
	return TCL_ERROR;
    }

Changes to generic/tclDecls.h.

945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
				int flags);
/* 318 */
EXTERN void		Tcl_ThreadAlert(Tcl_ThreadId threadId);
/* 319 */
EXTERN void		Tcl_ThreadQueueEvent(Tcl_ThreadId threadId,
				Tcl_Event *evPtr, Tcl_QueuePosition position);
/* 320 */
EXTERN int		Tcl_UniCharAtIndex(const char *src, int index);
/* 321 */
EXTERN int		Tcl_UniCharToLower(int ch);
/* 322 */
EXTERN int		Tcl_UniCharToTitle(int ch);
/* 323 */
EXTERN int		Tcl_UniCharToUpper(int ch);
/* 324 */
EXTERN int		Tcl_UniCharToUtf(int ch, char *buf);
/* 325 */
EXTERN CONST84_RETURN char * Tcl_UtfAtIndex(const char *src, int index);
/* 326 */
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 327 */







|

|

|

|







945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
				int flags);
/* 318 */
EXTERN void		Tcl_ThreadAlert(Tcl_ThreadId threadId);
/* 319 */
EXTERN void		Tcl_ThreadQueueEvent(Tcl_ThreadId threadId,
				Tcl_Event *evPtr, Tcl_QueuePosition position);
/* 320 */
EXTERN Tcl_UniChar	Tcl_UniCharAtIndex(const char *src, int index);
/* 321 */
EXTERN Tcl_UniChar	Tcl_UniCharToLower(int ch);
/* 322 */
EXTERN Tcl_UniChar	Tcl_UniCharToTitle(int ch);
/* 323 */
EXTERN Tcl_UniChar	Tcl_UniCharToUpper(int ch);
/* 324 */
EXTERN int		Tcl_UniCharToUtf(int ch, char *buf);
/* 325 */
EXTERN CONST84_RETURN char * Tcl_UtfAtIndex(const char *src, int index);
/* 326 */
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 327 */
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
				int numChars);
/* 379 */
EXTERN void		Tcl_SetUnicodeObj(Tcl_Obj *objPtr,
				const Tcl_UniChar *unicode, int numChars);
/* 380 */
EXTERN int		Tcl_GetCharLength(Tcl_Obj *objPtr);
/* 381 */
EXTERN int		Tcl_GetUniChar(Tcl_Obj *objPtr, int index);
/* 382 */
EXTERN Tcl_UniChar *	Tcl_GetUnicode(Tcl_Obj *objPtr);
/* 383 */
EXTERN Tcl_Obj *	Tcl_GetRange(Tcl_Obj *objPtr, int first, int last);
/* 384 */
EXTERN void		Tcl_AppendUnicodeToObj(Tcl_Obj *objPtr,
				const Tcl_UniChar *unicode, int length);







|







1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
				int numChars);
/* 379 */
EXTERN void		Tcl_SetUnicodeObj(Tcl_Obj *objPtr,
				const Tcl_UniChar *unicode, int numChars);
/* 380 */
EXTERN int		Tcl_GetCharLength(Tcl_Obj *objPtr);
/* 381 */
EXTERN Tcl_UniChar	Tcl_GetUniChar(Tcl_Obj *objPtr, int index);
/* 382 */
EXTERN Tcl_UniChar *	Tcl_GetUnicode(Tcl_Obj *objPtr);
/* 383 */
EXTERN Tcl_Obj *	Tcl_GetRange(Tcl_Obj *objPtr, int first, int last);
/* 384 */
EXTERN void		Tcl_AppendUnicodeToObj(Tcl_Obj *objPtr,
				const Tcl_UniChar *unicode, int length);
2166
2167
2168
2169
2170
2171
2172
2173
2174
2175
2176
2177
2178
2179
2180
2181
2182
2183
    int (*tcl_ReadChars) (Tcl_Channel channel, Tcl_Obj *objPtr, int charsToRead, int appendFlag); /* 313 */
    void (*tcl_RestoreResult) (Tcl_Interp *interp, Tcl_SavedResult *statePtr); /* 314 */
    void (*tcl_SaveResult) (Tcl_Interp *interp, Tcl_SavedResult *statePtr); /* 315 */
    int (*tcl_SetSystemEncoding) (Tcl_Interp *interp, const char *name); /* 316 */
    Tcl_Obj * (*tcl_SetVar2Ex) (Tcl_Interp *interp, const char *part1, const char *part2, Tcl_Obj *newValuePtr, int flags); /* 317 */
    void (*tcl_ThreadAlert) (Tcl_ThreadId threadId); /* 318 */
    void (*tcl_ThreadQueueEvent) (Tcl_ThreadId threadId, Tcl_Event *evPtr, Tcl_QueuePosition position); /* 319 */
    int (*tcl_UniCharAtIndex) (const char *src, int index); /* 320 */
    int (*tcl_UniCharToLower) (int ch); /* 321 */
    int (*tcl_UniCharToTitle) (int ch); /* 322 */
    int (*tcl_UniCharToUpper) (int ch); /* 323 */
    int (*tcl_UniCharToUtf) (int ch, char *buf); /* 324 */
    CONST84_RETURN char * (*tcl_UtfAtIndex) (const char *src, int index); /* 325 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 326 */
    int (*tcl_UtfBackslash) (const char *src, int *readPtr, char *dst); /* 327 */
    CONST84_RETURN char * (*tcl_UtfFindFirst) (const char *src, int ch); /* 328 */
    CONST84_RETURN char * (*tcl_UtfFindLast) (const char *src, int ch); /* 329 */
    CONST84_RETURN char * (*tcl_UtfNext) (const char *src); /* 330 */







|
|
|
|







2166
2167
2168
2169
2170
2171
2172
2173
2174
2175
2176
2177
2178
2179
2180
2181
2182
2183
    int (*tcl_ReadChars) (Tcl_Channel channel, Tcl_Obj *objPtr, int charsToRead, int appendFlag); /* 313 */
    void (*tcl_RestoreResult) (Tcl_Interp *interp, Tcl_SavedResult *statePtr); /* 314 */
    void (*tcl_SaveResult) (Tcl_Interp *interp, Tcl_SavedResult *statePtr); /* 315 */
    int (*tcl_SetSystemEncoding) (Tcl_Interp *interp, const char *name); /* 316 */
    Tcl_Obj * (*tcl_SetVar2Ex) (Tcl_Interp *interp, const char *part1, const char *part2, Tcl_Obj *newValuePtr, int flags); /* 317 */
    void (*tcl_ThreadAlert) (Tcl_ThreadId threadId); /* 318 */
    void (*tcl_ThreadQueueEvent) (Tcl_ThreadId threadId, Tcl_Event *evPtr, Tcl_QueuePosition position); /* 319 */
    Tcl_UniChar (*tcl_UniCharAtIndex) (const char *src, int index); /* 320 */
    Tcl_UniChar (*tcl_UniCharToLower) (int ch); /* 321 */
    Tcl_UniChar (*tcl_UniCharToTitle) (int ch); /* 322 */
    Tcl_UniChar (*tcl_UniCharToUpper) (int ch); /* 323 */
    int (*tcl_UniCharToUtf) (int ch, char *buf); /* 324 */
    CONST84_RETURN char * (*tcl_UtfAtIndex) (const char *src, int index); /* 325 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 326 */
    int (*tcl_UtfBackslash) (const char *src, int *readPtr, char *dst); /* 327 */
    CONST84_RETURN char * (*tcl_UtfFindFirst) (const char *src, int ch); /* 328 */
    CONST84_RETURN char * (*tcl_UtfFindLast) (const char *src, int ch); /* 329 */
    CONST84_RETURN char * (*tcl_UtfNext) (const char *src); /* 330 */
2227
2228
2229
2230
2231
2232
2233
2234
2235
2236
2237
2238
2239
2240
2241
    int (*tcl_UniCharIsPrint) (int ch); /* 374 */
    int (*tcl_UniCharIsPunct) (int ch); /* 375 */
    int (*tcl_RegExpExecObj) (Tcl_Interp *interp, Tcl_RegExp regexp, Tcl_Obj *textObj, int offset, int nmatches, int flags); /* 376 */
    void (*tcl_RegExpGetInfo) (Tcl_RegExp regexp, Tcl_RegExpInfo *infoPtr); /* 377 */
    Tcl_Obj * (*tcl_NewUnicodeObj) (const Tcl_UniChar *unicode, int numChars); /* 378 */
    void (*tcl_SetUnicodeObj) (Tcl_Obj *objPtr, const Tcl_UniChar *unicode, int numChars); /* 379 */
    int (*tcl_GetCharLength) (Tcl_Obj *objPtr); /* 380 */
    int (*tcl_GetUniChar) (Tcl_Obj *objPtr, int index); /* 381 */
    Tcl_UniChar * (*tcl_GetUnicode) (Tcl_Obj *objPtr); /* 382 */
    Tcl_Obj * (*tcl_GetRange) (Tcl_Obj *objPtr, int first, int last); /* 383 */
    void (*tcl_AppendUnicodeToObj) (Tcl_Obj *objPtr, const Tcl_UniChar *unicode, int length); /* 384 */
    int (*tcl_RegExpMatchObj) (Tcl_Interp *interp, Tcl_Obj *textObj, Tcl_Obj *patternObj); /* 385 */
    void (*tcl_SetNotifier) (Tcl_NotifierProcs *notifierProcPtr); /* 386 */
    Tcl_Mutex * (*tcl_GetAllocMutex) (void); /* 387 */
    int (*tcl_GetChannelNames) (Tcl_Interp *interp); /* 388 */







|







2227
2228
2229
2230
2231
2232
2233
2234
2235
2236
2237
2238
2239
2240
2241
    int (*tcl_UniCharIsPrint) (int ch); /* 374 */
    int (*tcl_UniCharIsPunct) (int ch); /* 375 */
    int (*tcl_RegExpExecObj) (Tcl_Interp *interp, Tcl_RegExp regexp, Tcl_Obj *textObj, int offset, int nmatches, int flags); /* 376 */
    void (*tcl_RegExpGetInfo) (Tcl_RegExp regexp, Tcl_RegExpInfo *infoPtr); /* 377 */
    Tcl_Obj * (*tcl_NewUnicodeObj) (const Tcl_UniChar *unicode, int numChars); /* 378 */
    void (*tcl_SetUnicodeObj) (Tcl_Obj *objPtr, const Tcl_UniChar *unicode, int numChars); /* 379 */
    int (*tcl_GetCharLength) (Tcl_Obj *objPtr); /* 380 */
    Tcl_UniChar (*tcl_GetUniChar) (Tcl_Obj *objPtr, int index); /* 381 */
    Tcl_UniChar * (*tcl_GetUnicode) (Tcl_Obj *objPtr); /* 382 */
    Tcl_Obj * (*tcl_GetRange) (Tcl_Obj *objPtr, int first, int last); /* 383 */
    void (*tcl_AppendUnicodeToObj) (Tcl_Obj *objPtr, const Tcl_UniChar *unicode, int length); /* 384 */
    int (*tcl_RegExpMatchObj) (Tcl_Interp *interp, Tcl_Obj *textObj, Tcl_Obj *patternObj); /* 385 */
    void (*tcl_SetNotifier) (Tcl_NotifierProcs *notifierProcPtr); /* 386 */
    Tcl_Mutex * (*tcl_GetAllocMutex) (void); /* 387 */
    int (*tcl_GetChannelNames) (Tcl_Interp *interp); /* 388 */

Changes to generic/tclDisassemble.c.

790
791
792
793
794
795
796

797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
PrintSourceToObj(
    Tcl_Obj *appendObj,		/* The object to print the source to. */
    const char *stringPtr,	/* The string to print. */
    int maxChars)		/* Maximum number of chars to print. */
{
    register const char *p;
    register int i = 0, len;


    if (stringPtr == NULL) {
	Tcl_AppendToObj(appendObj, "\"\"", -1);
	return;
    }

    Tcl_AppendToObj(appendObj, "\"", -1);
    p = stringPtr;
    for (;  (*p != '\0') && (i < maxChars);  p+=len) {
	Tcl_UniChar ch;

	len = TclUtfToUniChar(p, &ch);
	switch (ch) {
	case '"':
	    Tcl_AppendToObj(appendObj, "\\\"", -1);
	    i += 2;
	    continue;







>









<







790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806

807
808
809
810
811
812
813
PrintSourceToObj(
    Tcl_Obj *appendObj,		/* The object to print the source to. */
    const char *stringPtr,	/* The string to print. */
    int maxChars)		/* Maximum number of chars to print. */
{
    register const char *p;
    register int i = 0, len;
    Tcl_UniChar ch = 0;

    if (stringPtr == NULL) {
	Tcl_AppendToObj(appendObj, "\"\"", -1);
	return;
    }

    Tcl_AppendToObj(appendObj, "\"", -1);
    p = stringPtr;
    for (;  (*p != '\0') && (i < maxChars);  p+=len) {


	len = TclUtfToUniChar(p, &ch);
	switch (ch) {
	case '"':
	    Tcl_AppendToObj(appendObj, "\\\"", -1);
	    i += 2;
	    continue;
828
829
830
831
832
833
834






835
836
837
838
839
840
841
	    i += 2;
	    continue;
	case '\v':
	    Tcl_AppendToObj(appendObj, "\\v", -1);
	    i += 2;
	    continue;
	default:






	    if (ch < 0x20 || ch >= 0x7f) {
		Tcl_AppendPrintfToObj(appendObj, "\\u%04x", ch);
		i += 6;
	    } else {
		Tcl_AppendPrintfToObj(appendObj, "%c", ch);
		i++;
	    }







>
>
>
>
>
>







828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
	    i += 2;
	    continue;
	case '\v':
	    Tcl_AppendToObj(appendObj, "\\v", -1);
	    i += 2;
	    continue;
	default:
#if TCL_UTF_MAX > 4
	    if ((int) ch > 0xffff) {
		Tcl_AppendPrintfToObj(appendObj, "\\U%08x", (int) ch);
		i += 10;
	    } else
#endif
	    if (ch < 0x20 || ch >= 0x7f) {
		Tcl_AppendPrintfToObj(appendObj, "\\u%04x", ch);
		i += 6;
	    } else {
		Tcl_AppendPrintfToObj(appendObj, "%c", ch);
		i++;
	    }

Changes to generic/tclEncoding.c.

2291
2292
2293
2294
2295
2296
2297
2298
2299
2300
2301
2302
2303
2304
2305
2306
2307
2308
2309
    int pureNullMode)		/* Convert embedded nulls from internal
				 * representation to real null-bytes or vice
				 * versa. */
{
    const char *srcStart, *srcEnd, *srcClose;
    const char *dstStart, *dstEnd;
    int result, numChars, charLimit = INT_MAX;
    Tcl_UniChar *chPtr = (Tcl_UniChar *) statePtr;

    if (flags & TCL_ENCODING_START) {
    	*chPtr = 0;
    }
    result = TCL_OK;

    srcStart = src;
    srcEnd = src + srcLen;
    srcClose = srcEnd;
    if ((flags & TCL_ENCODING_END) == 0) {
	srcClose -= TCL_UTF_MAX;







|

<
<
<







2291
2292
2293
2294
2295
2296
2297
2298
2299



2300
2301
2302
2303
2304
2305
2306
    int pureNullMode)		/* Convert embedded nulls from internal
				 * representation to real null-bytes or vice
				 * versa. */
{
    const char *srcStart, *srcEnd, *srcClose;
    const char *dstStart, *dstEnd;
    int result, numChars, charLimit = INT_MAX;
    Tcl_UniChar ch;




    result = TCL_OK;

    srcStart = src;
    srcEnd = src + srcLen;
    srcClose = srcEnd;
    if ((flags & TCL_ENCODING_END) == 0) {
	srcClose -= TCL_UTF_MAX;
2347
2348
2349
2350
2351
2352
2353
2354
2355
2356
2357
2358
2359
2360
2361
2362
2363
2364
2365
2366
2367
2368
	} else if (!Tcl_UtfCharComplete(src, srcEnd - src)) {
	    /*
	     * Always check before using Tcl_UtfToUniChar. Not doing can so
	     * cause it run beyond the endof the buffer! If we happen such an
	     * incomplete char its byts are made to represent themselves.
	     */

		*chPtr = (unsigned char) *src;
	    src += 1;
	    dst += Tcl_UniCharToUtf(*chPtr, dst);
	} else {
	    int n = Tcl_UtfToUniChar(src, chPtr);
	    src += n;
	    if (!n) numChars--;
	    dst += Tcl_UniCharToUtf(*chPtr, dst);
	}
    }

    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
    *dstCharsPtr = numChars;
    return result;







|

|

|
<
<
|







2344
2345
2346
2347
2348
2349
2350
2351
2352
2353
2354
2355


2356
2357
2358
2359
2360
2361
2362
2363
	} else if (!Tcl_UtfCharComplete(src, srcEnd - src)) {
	    /*
	     * Always check before using Tcl_UtfToUniChar. Not doing can so
	     * cause it run beyond the endof the buffer! If we happen such an
	     * incomplete char its byts are made to represent themselves.
	     */

	    ch = (unsigned char) *src;
	    src += 1;
	    dst += Tcl_UniCharToUtf(ch, dst);
	} else {
	    src += Tcl_UtfToUniChar(src, &ch);


	    dst += Tcl_UniCharToUtf(ch, dst);
	}
    }

    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
    *dstCharsPtr = numChars;
    return result;
2410
2411
2412
2413
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd;
    const char *dstEnd, *dstStart;
    int result, numChars, charLimit = INT_MAX;
    Tcl_UniChar *chPtr = (Tcl_UniChar *) statePtr;

    if (flags & TCL_ENCODING_START) {
    	*chPtr = 0;
    }
    if (flags & TCL_ENCODING_CHAR_LIMIT) {
	charLimit = *dstCharsPtr;
    }
    result = TCL_OK;
    if ((srcLen % sizeof(Tcl_UniChar)) != 0) {
	result = TCL_CONVERT_MULTIBYTE;
	srcLen /= sizeof(Tcl_UniChar);







|

<
<
<







2405
2406
2407
2408
2409
2410
2411
2412
2413



2414
2415
2416
2417
2418
2419
2420
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd;
    const char *dstEnd, *dstStart;
    int result, numChars, charLimit = INT_MAX;
    Tcl_UniChar ch;




    if (flags & TCL_ENCODING_CHAR_LIMIT) {
	charLimit = *dstCharsPtr;
    }
    result = TCL_OK;
    if ((srcLen % sizeof(Tcl_UniChar)) != 0) {
	result = TCL_CONVERT_MULTIBYTE;
	srcLen /= sizeof(Tcl_UniChar);
2442
2443
2444
2445
2446
2447
2448
2449
2450
2451
2452
2453
2454
2455
2456
2457
2458
2459
2460
2461
2462
	}

	/*
	 * Special case for 1-byte utf chars for speed. Make sure we work with
	 * Tcl_UniChar-size data.
	 */

	*chPtr = *(Tcl_UniChar *)src;
	if (*chPtr && *chPtr < 0x80) {
	    *dst++ = (*chPtr & 0xFF);
	} else {
	    int n = Tcl_UniCharToUtf(*chPtr, dst);
	    dst += n;
	    if (!n) --numChars;/* Don't count high surrogates */
	}
	src += sizeof(Tcl_UniChar);
    }

    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
    *dstCharsPtr = numChars;







|
|
|

|
<
<







2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445


2446
2447
2448
2449
2450
2451
2452
	}

	/*
	 * Special case for 1-byte utf chars for speed. Make sure we work with
	 * Tcl_UniChar-size data.
	 */

	ch = *(Tcl_UniChar *)src;
	if (ch && ch < 0x80) {
	    *dst++ = (ch & 0xFF);
	} else {
	    dst += Tcl_UniCharToUtf(ch, dst);


	}
	src += sizeof(Tcl_UniChar);
    }

    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
    *dstCharsPtr = numChars;
2505
2506
2507
2508
2509
2510
2511
2512
2513
2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
				 * the conversion. */
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd, *srcClose, *dstStart, *dstEnd;
    int result, numChars;
    Tcl_UniChar *chPtr = (Tcl_UniChar *) statePtr;

    if (flags & TCL_ENCODING_START) {
    	*chPtr = 0;
    }
    srcStart = src;
    srcEnd = src + srcLen;
    srcClose = srcEnd;
    if ((flags & TCL_ENCODING_END) == 0) {
	srcClose -= TCL_UTF_MAX;
    }








|

<
<
<







2495
2496
2497
2498
2499
2500
2501
2502
2503



2504
2505
2506
2507
2508
2509
2510
				 * the conversion. */
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd, *srcClose, *dstStart, *dstEnd;
    int result, numChars;
    Tcl_UniChar ch;




    srcStart = src;
    srcEnd = src + srcLen;
    srcClose = srcEnd;
    if ((flags & TCL_ENCODING_END) == 0) {
	srcClose -= TCL_UTF_MAX;
    }

2534
2535
2536
2537
2538
2539
2540
2541
2542
2543
2544
2545
2546
2547
2548
2549
2550

2551




2552


2553




2554


2555

2556
2557
2558
2559
2560
2561
2562

	    result = TCL_CONVERT_MULTIBYTE;
	    break;
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
        }
	src += TclUtfToUniChar(src, chPtr);

	/*
	 * Need to handle this in a way that won't cause misalignment by
	 * casting dst to a Tcl_UniChar. [Bug 1122671]
	 * XXX: This hard-codes the assumed size of Tcl_UniChar as 2.
	 */

#ifdef WORDS_BIGENDIAN

	*dst++ = (*chPtr >> 8);




	*dst++ = (*chPtr & 0xFF);


#else




	*dst++ = (*chPtr & 0xFF);


	*dst++ = (*chPtr >> 8);

#endif
    }
    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
    *dstCharsPtr = numChars;
    return result;
}







|
|




<



>
|
>
>
>
>
|
>
>

>
>
>
>
|
>
>
|
>







2521
2522
2523
2524
2525
2526
2527
2528
2529
2530
2531
2532
2533

2534
2535
2536
2537
2538
2539
2540
2541
2542
2543
2544
2545
2546
2547
2548
2549
2550
2551
2552
2553
2554
2555
2556
2557
2558
2559
2560
2561
2562

	    result = TCL_CONVERT_MULTIBYTE;
	    break;
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	src += TclUtfToUniChar(src, &ch);

	/*
	 * Need to handle this in a way that won't cause misalignment by
	 * casting dst to a Tcl_UniChar. [Bug 1122671]

	 */

#ifdef WORDS_BIGENDIAN
#if TCL_UTF_MAX > 4
	*dst++ = (ch >> 24);
	*dst++ = ((ch >> 16) & 0xFF);
	*dst++ = ((ch >> 8) & 0xFF);
	*dst++ = (ch & 0xFF);
#else
	*dst++ = (ch >> 8);
	*dst++ = (ch & 0xFF);
#endif
#else
#if TCL_UTF_MAX > 4
	*dst++ = (ch & 0xFF);
	*dst++ = ((ch >> 8) & 0xFF);
	*dst++ = ((ch >> 16) & 0xFF);
	*dst++ = (ch >> 24);
#else
	*dst++ = (ch & 0xFF);
	*dst++ = (ch >> 8);
#endif
#endif
    }
    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
    *dstCharsPtr = numChars;
    return result;
}
2605
2606
2607
2608
2609
2610
2611
2612
2613
2614
2615
2616
2617
2618
2619
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd;
    const char *dstEnd, *dstStart, *prefixBytes;
    int result, byte, numChars, charLimit = INT_MAX;
    Tcl_UniChar ch = 0;
    const unsigned short *const *toUnicode;
    const unsigned short *pageZero;
    TableEncodingData *dataPtr = clientData;

    if (flags & TCL_ENCODING_CHAR_LIMIT) {
	charLimit = *dstCharsPtr;
    }







|







2605
2606
2607
2608
2609
2610
2611
2612
2613
2614
2615
2616
2617
2618
2619
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd;
    const char *dstEnd, *dstStart, *prefixBytes;
    int result, byte, numChars, charLimit = INT_MAX;
    Tcl_UniChar ch;
    const unsigned short *const *toUnicode;
    const unsigned short *pageZero;
    TableEncodingData *dataPtr = clientData;

    if (flags & TCL_ENCODING_CHAR_LIMIT) {
	charLimit = *dstCharsPtr;
    }
2717
2718
2719
2720
2721
2722
2723
2724
2725
2726
2727
2728
2729
2730
2731
				 * the conversion. */
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd, *srcClose;
    const char *dstStart, *dstEnd, *prefixBytes;
    Tcl_UniChar ch = 0;
    int result, len, word, numChars;
    TableEncodingData *dataPtr = clientData;
    const unsigned short *const *fromUnicode;

    result = TCL_OK;

    prefixBytes = dataPtr->prefixBytes;







|







2717
2718
2719
2720
2721
2722
2723
2724
2725
2726
2727
2728
2729
2730
2731
				 * the conversion. */
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd, *srcClose;
    const char *dstStart, *dstEnd, *prefixBytes;
    Tcl_UniChar ch;
    int result, len, word, numChars;
    TableEncodingData *dataPtr = clientData;
    const unsigned short *const *fromUnicode;

    result = TCL_OK;

    prefixBytes = dataPtr->prefixBytes;

Changes to generic/tclExecute.c.

5594
5595
5596
5597
5598
5599
5600
5601
5602
5603
5604
5605
5606
5607
5608
	    objResultPtr = Tcl_NewByteArrayObj(
		    Tcl_GetByteArrayFromObj(valuePtr, NULL)+index, 1);
	} else if (valuePtr->bytes && length == valuePtr->length) {
	    objResultPtr = Tcl_NewStringObj((const char *)
		    valuePtr->bytes+index, 1);
	} else {
	    char buf[TCL_UTF_MAX];
	    int ch = Tcl_GetUniChar(valuePtr, index);

	    /*
	     * This could be: Tcl_NewUnicodeObj((const Tcl_UniChar *)&ch, 1)
	     * but creating the object as a string seems to be faster in
	     * practical use.
	     */








|







5594
5595
5596
5597
5598
5599
5600
5601
5602
5603
5604
5605
5606
5607
5608
	    objResultPtr = Tcl_NewByteArrayObj(
		    Tcl_GetByteArrayFromObj(valuePtr, NULL)+index, 1);
	} else if (valuePtr->bytes && length == valuePtr->length) {
	    objResultPtr = Tcl_NewStringObj((const char *)
		    valuePtr->bytes+index, 1);
	} else {
	    char buf[TCL_UTF_MAX];
	    Tcl_UniChar ch = Tcl_GetUniChar(valuePtr, index);

	    /*
	     * This could be: Tcl_NewUnicodeObj((const Tcl_UniChar *)&ch, 1)
	     * but creating the object as a string seems to be faster in
	     * practical use.
	     */

Changes to generic/tclParse.c.

837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
				 * of bytes scanned should be written. */
    char *dst)			/* NULL, or points to buffer where the UTF-8
				 * encoding of the backslash sequence is to be
				 * written. At most TCL_UTF_MAX bytes will be
				 * written there. */
{
    register const char *p = src+1;
    Tcl_UniChar unichar = 0;
    int result;
    int count;
    char buf[TCL_UTF_MAX];

    if (numBytes == 0) {
	if (readPtr != NULL) {
	    *readPtr = 0;







|







837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
				 * of bytes scanned should be written. */
    char *dst)			/* NULL, or points to buffer where the UTF-8
				 * encoding of the backslash sequence is to be
				 * written. At most TCL_UTF_MAX bytes will be
				 * written there. */
{
    register const char *p = src+1;
    Tcl_UniChar unichar;
    int result;
    int count;
    char buf[TCL_UTF_MAX];

    if (numBytes == 0) {
	if (readPtr != NULL) {
	    *readPtr = 0;
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
	break;
    }

  done:
    if (readPtr != NULL) {
	*readPtr = count;
    }
    if ((result & 0xF800) == 0xD800) {
	/* If result is a surrogate, Tcl_UniCharToUtf will try to
	 * handle that especially, but we don't want that here.
	 */
	dst[2] = (char) ((result | 0x80) & 0xBF);
	dst[1] = (char) (((result >> 6) | 0x80) & 0xBF);
	dst[0] = (char) ((result >> 12) | 0xE0);
	return 3;
    }
    return Tcl_UniCharToUtf(result, dst);
}

/*
 *----------------------------------------------------------------------
 *
 * ParseComment --







<
<
<
<
<
<
<
<
<







987
988
989
990
991
992
993









994
995
996
997
998
999
1000
	break;
    }

  done:
    if (readPtr != NULL) {
	*readPtr = count;
    }









    return Tcl_UniCharToUtf(result, dst);
}

/*
 *----------------------------------------------------------------------
 *
 * ParseComment --

Changes to generic/tclScan.c.

68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
 */

static const char *
BuildCharSet(
    CharSet *cset,
    const char *format)		/* Points to first char of set. */
{
    Tcl_UniChar ch = 0, start;
    int offset, nranges;
    const char *end;

    memset(cset, 0, sizeof(CharSet));

    offset = Tcl_UtfToUniChar(format, &ch);
    if (ch == '^') {







|







68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
 */

static const char *
BuildCharSet(
    CharSet *cset,
    const char *format)		/* Points to first char of set. */
{
    Tcl_UniChar ch, start;
    int offset, nranges;
    const char *end;

    memset(cset, 0, sizeof(CharSet));

    offset = Tcl_UtfToUniChar(format, &ch);
    if (ch == '^') {
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
    int numVars,		/* The number of variables passed to the scan
				 * command. */
    int *totalSubs)		/* The number of variables that will be
				 * required. */
{
    int gotXpg, gotSequential, value, i, flags;
    char *end;
    Tcl_UniChar ch = 0;
    int objIndex, xpgSize, nspace = numVars;
    int *nassign = TclStackAlloc(interp, nspace * sizeof(int));
    char buf[TCL_UTF_MAX+1];
    Tcl_Obj *errorMsg;		/* Place to build an error messages. Note that
				 * these are messy operations because we do
				 * not want to use the formatting engine;
				 * we're inside there! */







|







253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
    int numVars,		/* The number of variables passed to the scan
				 * command. */
    int *totalSubs)		/* The number of variables that will be
				 * required. */
{
    int gotXpg, gotSequential, value, i, flags;
    char *end;
    Tcl_UniChar ch;
    int objIndex, xpgSize, nspace = numVars;
    int *nassign = TclStackAlloc(interp, nspace * sizeof(int));
    char buf[TCL_UTF_MAX+1];
    Tcl_Obj *errorMsg;		/* Place to build an error messages. Note that
				 * these are messy operations because we do
				 * not want to use the formatting engine;
				 * we're inside there! */
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
    int numVars, nconversions, totalVars = -1;
    int objIndex, offset, i, result, code;
    long value;
    const char *string, *end, *baseString;
    char op = 0;
    int width, underflow = 0;
    Tcl_WideInt wideValue;
    Tcl_UniChar ch = 0, sch = 0;
    Tcl_Obj **objs = NULL, *objPtr = NULL;
    int flags;
    char buf[513];		/* Temporary buffer to hold scanned number
				 * strings before they are passed to
				 * strtoul. */

    if (objc < 3) {







|







578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
    int numVars, nconversions, totalVars = -1;
    int objIndex, offset, i, result, code;
    long value;
    const char *string, *end, *baseString;
    char op = 0;
    int width, underflow = 0;
    Tcl_WideInt wideValue;
    Tcl_UniChar ch, sch;
    Tcl_Obj **objs = NULL, *objPtr = NULL;
    int flags;
    char buf[513];		/* Temporary buffer to hold scanned number
				 * strings before they are passed to
				 * strtoul. */

    if (objc < 3) {
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
	    break;
	}
	case 'c':
	    /*
	     * Scan a single Unicode character.
	     */

	    offset = Tcl_UtfToUniChar(string, &sch);
	    i = (int)sch;
	    if (!offset) {
		offset = Tcl_UtfToUniChar(string, &sch);
		i = (((i<<10) & 0x0FFC00) + 0x10000) + (sch & 0x3FF);
	    }
	    string += offset;
	    if (!(flags & SCAN_SUPPRESS)) {
		objPtr = Tcl_NewIntObj(i);
		Tcl_IncrRefCount(objPtr);
		CLANG_ASSERT(objs);
		objs[objIndex++] = objPtr;
	    }
	    break;

	case 'i':







|
<
<
<
<
<
<

|







881
882
883
884
885
886
887
888






889
890
891
892
893
894
895
896
897
	    break;
	}
	case 'c':
	    /*
	     * Scan a single Unicode character.
	     */

	    string += Tcl_UtfToUniChar(string, &sch);






	    if (!(flags & SCAN_SUPPRESS)) {
		objPtr = Tcl_NewIntObj((int)sch);
		Tcl_IncrRefCount(objPtr);
		CLANG_ASSERT(objs);
		objs[objIndex++] = objPtr;
	    }
	    break;

	case 'i':

Changes to generic/tclStringObj.c.

527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
 *
 * Side effects:
 *	Fills unichar with the index'th Unicode character.
 *
 *----------------------------------------------------------------------
 */

int
Tcl_GetUniChar(
    Tcl_Obj *objPtr,		/* The object to get the Unicode charater
				 * from. */
    int index)			/* Get the index'th Unicode character. */
{
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the indexing operation.
     */

    if (TclIsPureByteArray(objPtr)) {
	unsigned char *bytes = Tcl_GetByteArrayFromObj(objPtr, NULL);

	return (int) bytes[index];
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);







|
















|







527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
 *
 * Side effects:
 *	Fills unichar with the index'th Unicode character.
 *
 *----------------------------------------------------------------------
 */

Tcl_UniChar
Tcl_GetUniChar(
    Tcl_Obj *objPtr,		/* The object to get the Unicode charater
				 * from. */
    int index)			/* Get the index'th Unicode character. */
{
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the indexing operation.
     */

    if (TclIsPureByteArray(objPtr)) {
	unsigned char *bytes = Tcl_GetByteArrayFromObj(objPtr, NULL);

	return (Tcl_UniChar) bytes[index];
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
	}
	if (stringPtr->numChars == objPtr->length) {
	    return (Tcl_UniChar) objPtr->bytes[index];
	}
	FillUnicodeRep(objPtr);
	stringPtr = GET_STRING(objPtr);
    }
    return (int) stringPtr->unicode[index];
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_GetUnicode --
 *







|







568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
	}
	if (stringPtr->numChars == objPtr->length) {
	    return (Tcl_UniChar) objPtr->bytes[index];
	}
	FillUnicodeRep(objPtr);
	stringPtr = GET_STRING(objPtr);
    }
    return stringPtr->unicode[index];
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_GetUnicode --
 *
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
Tcl_GetRange(
    Tcl_Obj *objPtr,		/* The Tcl object to find the range of. */
    int first,			/* First index of the range. */
    int last)			/* Last index of the range. */
{
    Tcl_Obj *newObjPtr;		/* The Tcl object to find the range of. */
    String *stringPtr;
    int i, firstoffset = 0, lastoffset = 0;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the substring operation.
     */








<







668
669
670
671
672
673
674

675
676
677
678
679
680
681
Tcl_GetRange(
    Tcl_Obj *objPtr,		/* The Tcl object to find the range of. */
    int first,			/* First index of the range. */
    int last)			/* Last index of the range. */
{
    Tcl_Obj *newObjPtr;		/* The Tcl object to find the range of. */
    String *stringPtr;


    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the substring operation.
     */

713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
	    stringPtr->numChars = newObjPtr->length;
	    return newObjPtr;
	}
	FillUnicodeRep(objPtr);
	stringPtr = GET_STRING(objPtr);
    }

    for (i = 0; i <= last + lastoffset + firstoffset; i++) {
        if ((stringPtr->unicode[i] & 0xfc00) == 0xd800) {
            if (i < first + firstoffset) {
                firstoffset++;
            } else {
                lastoffset++;
            }
        }
    }

    return Tcl_NewUnicodeObj(stringPtr->unicode + first + firstoffset, last-first+1 + lastoffset + firstoffset);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_SetStringObj --
 *







<
<
<
<
<
<
<
<
<
<
|







712
713
714
715
716
717
718










719
720
721
722
723
724
725
726
	    stringPtr->numChars = newObjPtr->length;
	    return newObjPtr;
	}
	FillUnicodeRep(objPtr);
	stringPtr = GET_STRING(objPtr);
    }











    return Tcl_NewUnicodeObj(stringPtr->unicode + first, last-first+1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_SetStringObj --
 *
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
    const char *format,
    int objc,
    Tcl_Obj *const objv[])
{
    const char *span = format, *msg, *errCode;
    int numBytes = 0, objIndex = 0, gotXpg = 0, gotSequential = 0;
    int originalLength, limit;
    Tcl_UniChar ch = 0;
    static const char *mixedXPG =
	    "cannot mix \"%\" and \"%n$\" conversion specifiers";
    static const char *const badIndex[2] = {
	"not enough arguments for all format specifiers",
	"\"%n$\" argument index out of range"
    };
    static const char *overflow = "max size for a Tcl value exceeded";







<







1729
1730
1731
1732
1733
1734
1735

1736
1737
1738
1739
1740
1741
1742
    const char *format,
    int objc,
    Tcl_Obj *const objv[])
{
    const char *span = format, *msg, *errCode;
    int numBytes = 0, objIndex = 0, gotXpg = 0, gotSequential = 0;
    int originalLength, limit;

    static const char *mixedXPG =
	    "cannot mix \"%\" and \"%n$\" conversion specifiers";
    static const char *const badIndex[2] = {
	"not enough arguments for all format specifiers",
	"\"%n$\" argument index out of range"
    };
    static const char *overflow = "max size for a Tcl value exceeded";
1765
1766
1767
1768
1769
1770
1771

1772
1773
1774
1775
1776
1777
1778

    while (*format != '\0') {
	char *end;
	int gotMinus, gotHash, gotZero, gotSpace, gotPlus, sawFlag;
	int width, gotPrecision, precision, useShort, useWide, useBig;
	int newXpg, numChars, allocSegment = 0, segmentLimit, segmentNumBytes;
	Tcl_Obj *segment;

	int step = Tcl_UtfToUniChar(format, &ch);

	format += step;
	if (ch != '%') {
	    numBytes += step;
	    continue;
	}







>







1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767

    while (*format != '\0') {
	char *end;
	int gotMinus, gotHash, gotZero, gotSpace, gotPlus, sawFlag;
	int width, gotPrecision, precision, useShort, useWide, useBig;
	int newXpg, numChars, allocSegment = 0, segmentLimit, segmentNumBytes;
	Tcl_Obj *segment;
	Tcl_UniChar ch;
	int step = Tcl_UtfToUniChar(format, &ch);

	format += step;
	if (ch != '%') {
	    numBytes += step;
	    continue;
	}
2744
2745
2746
2747
2748
2749
2750
2751
2752
2753
2754
2755
2756
2757
2758
}

Tcl_Obj *
TclStringObjReverse(
    Tcl_Obj *objPtr)
{
    String *stringPtr;
    Tcl_UniChar ch = 0;

    if (TclIsPureByteArray(objPtr)) {
	int numBytes;
	unsigned char *from = Tcl_GetByteArrayFromObj(objPtr, &numBytes);

	if (Tcl_IsShared(objPtr)) {
	    objPtr = Tcl_NewByteArrayObj(NULL, numBytes);







|







2733
2734
2735
2736
2737
2738
2739
2740
2741
2742
2743
2744
2745
2746
2747
}

Tcl_Obj *
TclStringObjReverse(
    Tcl_Obj *objPtr)
{
    String *stringPtr;
    Tcl_UniChar ch;

    if (TclIsPureByteArray(objPtr)) {
	int numBytes;
	unsigned char *from = Tcl_GetByteArrayFromObj(objPtr, &numBytes);

	if (Tcl_IsShared(objPtr)) {
	    objPtr = Tcl_NewByteArrayObj(NULL, numBytes);
2772
2773
2774
2775
2776
2777
2778

2779
2780
2781
2782
2783
2784
2785
	    Tcl_UniChar *to;

	    /*
	     * Create a non-empty, pure unicode value, so we can coax
	     * Tcl_SetObjLength into growing the unicode rep buffer.
	     */


	    objPtr = Tcl_NewUnicodeObj(&ch, 1);
	    Tcl_SetObjLength(objPtr, stringPtr->numChars);
	    to = Tcl_GetUnicode(objPtr);
	    while (--src >= from) {
		*to++ = *src;
	    }
	} else {







>







2761
2762
2763
2764
2765
2766
2767
2768
2769
2770
2771
2772
2773
2774
2775
	    Tcl_UniChar *to;

	    /*
	     * Create a non-empty, pure unicode value, so we can coax
	     * Tcl_SetObjLength into growing the unicode rep buffer.
	     */

	    ch = 0;
	    objPtr = Tcl_NewUnicodeObj(&ch, 1);
	    Tcl_SetObjLength(objPtr, stringPtr->numChars);
	    to = Tcl_GetUnicode(objPtr);
	    while (--src >= from) {
		*to++ = *src;
	    }
	} else {
2873
2874
2875
2876
2877
2878
2879
2880
2881
2882
2883
2884
2885
2886
2887
2888
ExtendUnicodeRepWithString(
    Tcl_Obj *objPtr,
    const char *bytes,
    int numBytes,
    int numAppendChars)
{
    String *stringPtr = GET_STRING(objPtr);
    int incr, needed, numOrigChars = 0;
    Tcl_UniChar *dst, unichar = 0;

    if (stringPtr->hasUnicode) {
	numOrigChars = stringPtr->numChars;
    }
    if (numAppendChars == -1) {
	TclNumUtfChars(numAppendChars, bytes, numBytes);
    }







|
|







2863
2864
2865
2866
2867
2868
2869
2870
2871
2872
2873
2874
2875
2876
2877
2878
ExtendUnicodeRepWithString(
    Tcl_Obj *objPtr,
    const char *bytes,
    int numBytes,
    int numAppendChars)
{
    String *stringPtr = GET_STRING(objPtr);
    int needed, numOrigChars = 0;
    Tcl_UniChar *dst;

    if (stringPtr->hasUnicode) {
	numOrigChars = stringPtr->numChars;
    }
    if (numAppendChars == -1) {
	TclNumUtfChars(numAppendChars, bytes, numBytes);
    }
2897
2898
2899
2900
2901
2902
2903
2904
2905
2906
2907
2908
2909
2910
2911
2912
2913
2914
2915
2916
    stringPtr->hasUnicode = 1;
    if (bytes) {
	stringPtr->numChars = needed;
    } else {
	numAppendChars = 0;
    }
    for (dst=stringPtr->unicode + numOrigChars; numAppendChars-- > 0; dst++) {
	bytes += (incr = TclUtfToUniChar(bytes, &unichar));
	*dst = unichar;
	if (!incr) {
	    bytes += TclUtfToUniChar(bytes, &unichar);
	    *++dst = unichar;
	}
    }
    *dst = 0;
}

/*
 *----------------------------------------------------------------------
 *







<
<
<
|
<
<







2887
2888
2889
2890
2891
2892
2893



2894


2895
2896
2897
2898
2899
2900
2901
    stringPtr->hasUnicode = 1;
    if (bytes) {
	stringPtr->numChars = needed;
    } else {
	numAppendChars = 0;
    }
    for (dst=stringPtr->unicode + numOrigChars; numAppendChars-- > 0; dst++) {



	bytes += TclUtfToUniChar(bytes, dst);


    }
    *dst = 0;
}

/*
 *----------------------------------------------------------------------
 *
3107
3108
3109
3110
3111
3112
3113
3114
3115
3116
3117
3118
3119
3120
3121
    const Tcl_UniChar *unicode,
    int numChars)
{
    /*
     * Pre-condition: this is the "string" Tcl_ObjType.
     */

    int incr, i, origLength, size = 0, offset = 0;
    char *dst, buf[TCL_UTF_MAX];
    String *stringPtr = GET_STRING(objPtr);

    if (numChars < 0) {
	numChars = UnicodeLength(unicode);
    }








|







3092
3093
3094
3095
3096
3097
3098
3099
3100
3101
3102
3103
3104
3105
3106
    const Tcl_UniChar *unicode,
    int numChars)
{
    /*
     * Pre-condition: this is the "string" Tcl_ObjType.
     */

    int i, origLength, size = 0;	
    char *dst, buf[TCL_UTF_MAX];
    String *stringPtr = GET_STRING(objPtr);

    if (numChars < 0) {
	numChars = UnicodeLength(unicode);
    }

3133
3134
3135
3136
3137
3138
3139
3140
3141
3142
3143
3144
3145
3146
3147
3148
3149
     */

    if (numChars <= (INT_MAX - size)/TCL_UTF_MAX 
	    && stringPtr->allocated >= size + numChars * TCL_UTF_MAX) {
	goto copyBytes;
    }

    for (i = 0; i < numChars + offset && size >= 0; i++) {
	size += (incr = Tcl_UniCharToUtf((int) unicode[i], buf));
	if (!incr) offset++;
    }
    if (size < 0) {
	Tcl_Panic("max size for a Tcl value (%d bytes) exceeded", INT_MAX);
    }

    /*
     * Grow space if needed.







|
|
<







3118
3119
3120
3121
3122
3123
3124
3125
3126

3127
3128
3129
3130
3131
3132
3133
     */

    if (numChars <= (INT_MAX - size)/TCL_UTF_MAX 
	    && stringPtr->allocated >= size + numChars * TCL_UTF_MAX) {
	goto copyBytes;
    }

    for (i = 0; i < numChars && size >= 0; i++) {
	size += Tcl_UniCharToUtf((int) unicode[i], buf);

    }
    if (size < 0) {
	Tcl_Panic("max size for a Tcl value (%d bytes) exceeded", INT_MAX);
    }

    /*
     * Grow space if needed.

Changes to generic/tclUniData.c.

1552
1553
1554
1555
1556
1557
1558
1559
#define GetDelta(info) ((info) >> 8)

/*
 * This macro extracts the information about a character from the
 * Unicode character tables.
 */

#define GetUniCharInfo(ch) (groups[groupMap[pageMap[((ch) & 0x1fffff) >> OFFSET_BITS] | ((ch) & ((1 << OFFSET_BITS)-1))]])







|
1552
1553
1554
1555
1556
1557
1558
1559
#define GetDelta(info) ((info) >> 8)

/*
 * This macro extracts the information about a character from the
 * Unicode character tables.
 */

#define GetUniCharInfo(ch) (groups[groupMap[pageMap[((ch) & 0xffff) >> OFFSET_BITS] | ((ch) & ((1 << OFFSET_BITS)-1))]])

Changes to generic/tclUtf.c.

117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
    if (ch <= 0x7FF) {
	return 2;
    }
    if (ch <= 0xFFFF) {
	return 3;
    }
#if TCL_UTF_MAX > 3
#if TCL_UTF_MAX == 4
    if (ch <= 0x10FFFF) {
	return 4;
    }
#else
    if (ch <= 0x1FFFFF) {
	return 4;
    }
    if (ch <= 0x3FFFFFF) {
	return 5;
    }
    if (ch <= 0x7FFFFFFF) {
	return 6;
    }
#endif
#endif
    return 3;
}

/*
 *---------------------------------------------------------------------------
 *







<
<
<
<
<









<







117
118
119
120
121
122
123





124
125
126
127
128
129
130
131
132

133
134
135
136
137
138
139
    if (ch <= 0x7FF) {
	return 2;
    }
    if (ch <= 0xFFFF) {
	return 3;
    }
#if TCL_UTF_MAX > 3





    if (ch <= 0x1FFFFF) {
	return 4;
    }
    if (ch <= 0x3FFFFFF) {
	return 5;
    }
    if (ch <= 0x7FFFFFFF) {
	return 6;
    }

#endif
    return 3;
}

/*
 *---------------------------------------------------------------------------
 *
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
    if (ch >= 0) {
	if (ch <= 0x7FF) {
	    buf[1] = (char) ((ch | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 6) | 0xC0);
	    return 2;
	}
	if (ch <= 0xFFFF) {
#if TCL_UTF_MAX == 4
	    if ((ch & 0xF800) == 0xD800) {
		if (ch & 0x0400) {
		    /* Low surrogate */
		    buf[3] = (char) ((ch | 0x80) & 0xBF);
		    buf[2] |= (char) (((ch >> 6) | 0x80) & 0x8F);
		    return 4;
		} else {
		    /* High surrogate */
		    ch += 0x40;
		    buf[2] = (char) (((ch << 4) | 0x80) & 0xB0);
		    buf[1] = (char) (((ch >> 2) | 0x80) & 0xBF);
		    buf[0] = (char) (((ch >> 8) | 0xF0) & 0xF7);
		    return 0;
		}
	    }
#endif
	three:
	    buf[2] = (char) ((ch | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 12) | 0xE0);
	    return 3;
	}

#if TCL_UTF_MAX > 3
#if TCL_UTF_MAX == 4
	if (ch <= 0x10FFFF) {
	    buf[3] = (char) ((ch | 0x80) & 0xBF);
	    buf[2] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 12) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 18) | 0xF0);
	    return 4;
	}
#else
	if (ch <= 0x1FFFFF) {
	    buf[3] = (char) ((ch | 0x80) & 0xBF);
	    buf[2] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 12) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 18) | 0xF0);
	    return 4;
	}







<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<








<
<
<
<
<
<
<
<
<







168
169
170
171
172
173
174

















175
176
177
178
179
180
181
182









183
184
185
186
187
188
189
    if (ch >= 0) {
	if (ch <= 0x7FF) {
	    buf[1] = (char) ((ch | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 6) | 0xC0);
	    return 2;
	}
	if (ch <= 0xFFFF) {

















	three:
	    buf[2] = (char) ((ch | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 12) | 0xE0);
	    return 3;
	}

#if TCL_UTF_MAX > 3









	if (ch <= 0x1FFFFF) {
	    buf[3] = (char) ((ch | 0x80) & 0xBF);
	    buf[2] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 12) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 18) | 0xF0);
	    return 4;
	}
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
	    buf[4] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[3] = (char) (((ch >> 12) | 0x80) & 0xBF);
	    buf[2] = (char) (((ch >> 18) | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 24) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 30) | 0xFC);
	    return 6;
	}
#endif
#endif
    }

    ch = 0xFFFD;
    goto three;
}








<







200
201
202
203
204
205
206

207
208
209
210
211
212
213
	    buf[4] = (char) (((ch >> 6) | 0x80) & 0xBF);
	    buf[3] = (char) (((ch >> 12) | 0x80) & 0xBF);
	    buf[2] = (char) (((ch >> 18) | 0x80) & 0xBF);
	    buf[1] = (char) (((ch >> 24) | 0x80) & 0xBF);
	    buf[0] = (char) ((ch >> 30) | 0xFC);
	    return 6;
	}

#endif
    }

    ch = 0xFFFD;
    goto three;
}

310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
 *	Tcl_UtfCharComplete() before calling this routine to ensure that
 *	enough bytes remain in the string.
 *
 * Results:
 *	*chPtr is filled with the Tcl_UniChar, and the return value is the
 *	number of bytes from the UTF-8 string that were consumed.
 *
 *  If TCL_UTF_MAX == 4, special handling of Surrogate pairs is done:
 *
 *  If the UTF-8 string represents a character outside of the BMP, the
 *  first call to this function will fill *chPtr with the high surrogate
 *  and generate a return value of 0. Calling Tcl_UtfToUniChar again
 *  will produce the low surrogate and a return value of 4. Because *chPtr
 *  is used to remember whether the high surrogate is already produced, it
 *  is recommended to initialize the variable it points to as 0 before
 *  the first call to Tcl_UtfToUniChar is done.
 *
 * Side effects:
 *	None.
 *
 *---------------------------------------------------------------------------
 */

int







<
<
<
<
<
<
<
<
<
<







277
278
279
280
281
282
283










284
285
286
287
288
289
290
 *	Tcl_UtfCharComplete() before calling this routine to ensure that
 *	enough bytes remain in the string.
 *
 * Results:
 *	*chPtr is filled with the Tcl_UniChar, and the return value is the
 *	number of bytes from the UTF-8 string that were consumed.
 *










 * Side effects:
 *	None.
 *
 *---------------------------------------------------------------------------
 */

int
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
	/*
	 * A three-byte-character lead-byte not followed by two trail-bytes
	 * represents itself.
	 */

	*chPtr = (Tcl_UniChar) byte;
	return 1;
#if TCL_UTF_MAX == 4
    } else if (byte < 0xF8) {
	if (((src[1] & 0xC0) == 0x80) && ((src[2] & 0xC0) == 0x80) && ((src[3] & 0xC0) == 0x80)) {
	    Tcl_UniChar surrogate;
	    /*
	     * Four-byte-character lead byte followed by three trail bytes.
	     */

	    byte = (((byte & 0x07) << 18) | ((src[1] & 0x3F) << 12)
		    | ((src[2] & 0x3F) << 6) | (src[3] & 0x3F)) - 0x10000;
	    surrogate = 0xD800 + (byte >> 10);
	    if (byte & 0x100000) {
		/* out of range, < 0x10000 or > 0x10ffff */
	    } else if (*chPtr != surrogate) {
		/* produce high surrogate, but don't advance source pointer */	
		*chPtr = surrogate;
		return 0;
	    } else {
		/* produce low surrogate, and advance source pointer */	
		*chPtr = (Tcl_UniChar) (0xDC00 | (byte & 0x3FF));
		return 4;
	    }
	}

	/*
	 * A four-byte-character lead-byte not followed by three trail-bytes
	 * or representing a character < 0x10000 or > 0x10ffff represents itself.
	 */

	*chPtr = (Tcl_UniChar) byte;
	return 1;
#endif
    }
#if TCL_UTF_MAX > 4
    {
	int ch, total, trail;

	total = totalBytes[byte];
	trail = total - 1;
	if (trail > 0) {
	    ch = byte & (0x3F >> trail);







<
<
<
<
<
<
<
|
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
|







340
341
342
343
344
345
346







347

























348
349
350
351
352
353
354
355
	/*
	 * A three-byte-character lead-byte not followed by two trail-bytes
	 * represents itself.
	 */

	*chPtr = (Tcl_UniChar) byte;
	return 1;







    }

























#if TCL_UTF_MAX > 3
    {
	int ch, total, trail;

	total = totalBytes[byte];
	trail = total - 1;
	if (trail > 0) {
	    ch = byte & (0x3F >> trail);
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
    const char *src,		/* UTF-8 string to convert to Unicode. */
    int length,			/* Length of UTF-8 string in bytes, or -1 for
				 * strlen(). */
    Tcl_DString *dsPtr)		/* Unicode representation of string is
				 * appended to this previously initialized
				 * DString. */
{
    Tcl_UniChar ch, *w, *wString;
    const char *p, *end;
    int oldLength;

    if (length < 0) {
	length = strlen(src);
    }








|







396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
    const char *src,		/* UTF-8 string to convert to Unicode. */
    int length,			/* Length of UTF-8 string in bytes, or -1 for
				 * strlen(). */
    Tcl_DString *dsPtr)		/* Unicode representation of string is
				 * appended to this previously initialized
				 * DString. */
{
    Tcl_UniChar *w, *wString;
    const char *p, *end;
    int oldLength;

    if (length < 0) {
	length = strlen(src);
    }

493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
    Tcl_DStringSetLength(dsPtr,
	    (int) ((oldLength + length + 1) * sizeof(Tcl_UniChar)));
    wString = (Tcl_UniChar *) (Tcl_DStringValue(dsPtr) + oldLength);

    w = wString;
    end = src + length;
    for (p = src; p < end; ) {
	p += TclUtfToUniChar(p, &ch);
	*w++ = ch;
    }
    *w = '\0';
    Tcl_DStringSetLength(dsPtr,
	    (oldLength + ((char *) w - (char *) wString)));

    return wString;
}







|
|







418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
    Tcl_DStringSetLength(dsPtr,
	    (int) ((oldLength + length + 1) * sizeof(Tcl_UniChar)));
    wString = (Tcl_UniChar *) (Tcl_DStringValue(dsPtr) + oldLength);

    w = wString;
    end = src + length;
    for (p = src; p < end; ) {
	p += TclUtfToUniChar(p, w);
	w++;
    }
    *w = '\0';
    Tcl_DStringSetLength(dsPtr,
	    (oldLength + ((char *) w - (char *) wString)));

    return wString;
}
558
559
560
561
562
563
564
565

566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585


586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601

int
Tcl_NumUtfChars(
    register const char *src,	/* The UTF-8 string to measure. */
    int length)			/* The length of the string in bytes, or -1
				 * for strlen(string). */
{
    Tcl_UniChar ch = 0;

    register int i, n;

    /*
     * The separate implementations are faster.
     *
     * Since this is a time-sensitive function, we also do the check for the
     * single-byte char case specially.
     */

    i = 0;
    if (length < 0) {
	while (*src != '\0') {
	    n = TclUtfToUniChar(src, &ch);
	    if (!n) {
	        n = Tcl_UtfToUniChar(src, &ch);
	    }
	    src += n;
	    i++;
	}
    } else {


	while (length > 0) {
	    if (UCHAR(*src) < 0xC0) {
		length--;
		src++;
	    } else {
		n = Tcl_UtfToUniChar(src, &ch);
		if (!n) {
		    n = Tcl_UtfToUniChar(src, &ch);
		}
		length -= n;
		src += n;
	    }
	    i++;
	}
    }
    return i;







|
>
|











|
<
<
<
<



>
>





|
<
<
<







483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504




505
506
507
508
509
510
511
512
513
514
515



516
517
518
519
520
521
522

int
Tcl_NumUtfChars(
    register const char *src,	/* The UTF-8 string to measure. */
    int length)			/* The length of the string in bytes, or -1
				 * for strlen(string). */
{
    Tcl_UniChar ch;
    register Tcl_UniChar *chPtr = &ch;
    register int i;

    /*
     * The separate implementations are faster.
     *
     * Since this is a time-sensitive function, we also do the check for the
     * single-byte char case specially.
     */

    i = 0;
    if (length < 0) {
	while (*src != '\0') {
	    src += TclUtfToUniChar(src, chPtr);




	    i++;
	}
    } else {
	register int n;

	while (length > 0) {
	    if (UCHAR(*src) < 0xC0) {
		length--;
		src++;
	    } else {
		n = Tcl_UtfToUniChar(src, chPtr);



		length -= n;
		src += n;
	    }
	    i++;
	}
    }
    return i;
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636

const char *
Tcl_UtfFindFirst(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Tcl_UniChar to search for. */
{
    int len;
    Tcl_UniChar find = 0;

    while (1) {
	len = TclUtfToUniChar(src, &find);
	if (find == ch) {
	    return src;
	}
	if (*src == '\0') {







|







543
544
545
546
547
548
549
550
551
552
553
554
555
556
557

const char *
Tcl_UtfFindFirst(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Tcl_UniChar to search for. */
{
    int len;
    Tcl_UniChar find;

    while (1) {
	len = TclUtfToUniChar(src, &find);
	if (find == ch) {
	    return src;
	}
	if (*src == '\0') {
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675

const char *
Tcl_UtfFindLast(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Tcl_UniChar to search for. */
{
    int len;
    Tcl_UniChar find = 0;
    const char *last;

    last = NULL;
    while (1) {
	len = TclUtfToUniChar(src, &find);
	if (find == ch) {
	    last = src;







|







582
583
584
585
586
587
588
589
590
591
592
593
594
595
596

const char *
Tcl_UtfFindLast(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Tcl_UniChar to search for. */
{
    int len;
    Tcl_UniChar find;
    const char *last;

    last = NULL;
    while (1) {
	len = TclUtfToUniChar(src, &find);
	if (find == ch) {
	    last = src;
701
702
703
704
705
706
707
708

709
710
711
712
713
714
715
 *---------------------------------------------------------------------------
 */

const char *
Tcl_UtfNext(
    const char *src)		/* The current location in the string. */
{
    Tcl_UniChar ch = 0;

    return src + TclUtfToUniChar(src, &ch);
}

/*
 *---------------------------------------------------------------------------
 *
 * Tcl_UtfPrev --







|
>







622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
 *---------------------------------------------------------------------------
 */

const char *
Tcl_UtfNext(
    const char *src)		/* The current location in the string. */
{
    Tcl_UniChar ch;

    return src + TclUtfToUniChar(src, &ch);
}

/*
 *---------------------------------------------------------------------------
 *
 * Tcl_UtfPrev --
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
 *
 * Side effects:
 *	None.
 *
 *---------------------------------------------------------------------------
 */

int
Tcl_UniCharAtIndex(
    register const char *src,	/* The UTF-8 string to dereference. */
    register int index)		/* The position of the desired character. */
{
    Tcl_UniChar unichar = 0;
    int bytes;
    int ch = 0;

    while (index-- >= 0) {
	bytes = TclUtfToUniChar(src, &unichar);
	ch = unichar;
	if (!bytes) {
	    /* TclUtfToUniChar only returns 0 for chars > 0xffff ! */
	    bytes = TclUtfToUniChar(src, &unichar);
	    /* Combine surrogates */
	    ch = (((ch & 0x3ff) << 10) | (unichar & 0x3ff)) + 0x10000;
	}
	src += bytes;
    }
    return ch;
}

/*
 *---------------------------------------------------------------------------
 *







|




|
<
<

|
<
|
<
<
|
<
<
<
<







695
696
697
698
699
700
701
702
703
704
705
706
707


708
709

710


711




712
713
714
715
716
717
718
 *
 * Side effects:
 *	None.
 *
 *---------------------------------------------------------------------------
 */

Tcl_UniChar
Tcl_UniCharAtIndex(
    register const char *src,	/* The UTF-8 string to dereference. */
    register int index)		/* The position of the desired character. */
{
    Tcl_UniChar ch = 0;



    while (index >= 0) {

	index--;


	src += TclUtfToUniChar(src, &ch);




    }
    return ch;
}

/*
 *---------------------------------------------------------------------------
 *
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
 */

const char *
Tcl_UtfAtIndex(
    register const char *src,	/* The UTF-8 string. */
    register int index)		/* The position of the desired character. */
{
    Tcl_UniChar ch = 0;
    int len;

    while (index > 0) {
	index--;
	len = TclUtfToUniChar(src, &ch);
	if (!len) {
	    len = TclUtfToUniChar(src, &ch);
	}
	src += len;
    }
    return src;
}

/*
 *---------------------------------------------------------------------------
 *







|
<



|
<
<
<
<







731
732
733
734
735
736
737
738

739
740
741
742




743
744
745
746
747
748
749
 */

const char *
Tcl_UtfAtIndex(
    register const char *src,	/* The UTF-8 string. */
    register int index)		/* The position of the desired character. */
{
    Tcl_UniChar ch;


    while (index > 0) {
	index--;
	src += TclUtfToUniChar(src, &ch);




    }
    return src;
}

/*
 *---------------------------------------------------------------------------
 *
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
 *----------------------------------------------------------------------
 */

int
Tcl_UtfToUpper(
    char *str)			/* String to convert in place. */
{
    Tcl_UniChar ch = 0;
    int upChar;
    char *src, *dst;
    int bytes;

    /*
     * Iterate over the string until we hit the terminating null.
     */

    src = dst = str;
    while (*src) {
	bytes = TclUtfToUniChar(src, &ch);
	upChar = ch;
	if (!bytes) {
	    /* TclUtfToUniChar only returns 0 for chars > 0xffff ! */
	    bytes = TclUtfToUniChar(src, &ch);
	    /* Combine surrogates */
	    upChar = (((upChar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	}
	upChar = Tcl_UniCharToUpper(upChar);

	/*
	 * To keep badly formed Utf strings from getting inflated by the
	 * conversion (thereby causing a segfault), only copy the upper case
	 * char to dst if its size is <= the original char.
	 */








|
<










<
<
<
<
<
<
<
|







815
816
817
818
819
820
821
822

823
824
825
826
827
828
829
830
831
832







833
834
835
836
837
838
839
840
 *----------------------------------------------------------------------
 */

int
Tcl_UtfToUpper(
    char *str)			/* String to convert in place. */
{
    Tcl_UniChar ch, upChar;

    char *src, *dst;
    int bytes;

    /*
     * Iterate over the string until we hit the terminating null.
     */

    src = dst = str;
    while (*src) {
	bytes = TclUtfToUniChar(src, &ch);







	upChar = Tcl_UniCharToUpper(ch);

	/*
	 * To keep badly formed Utf strings from getting inflated by the
	 * conversion (thereby causing a segfault), only copy the upper case
	 * char to dst if its size is <= the original char.
	 */

968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
 *----------------------------------------------------------------------
 */

int
Tcl_UtfToLower(
    char *str)			/* String to convert in place. */
{
    Tcl_UniChar ch = 0;
    int lowChar;
    char *src, *dst;
    int bytes;

    /*
     * Iterate over the string until we hit the terminating null.
     */

    src = dst = str;
    while (*src) {
	bytes = TclUtfToUniChar(src, &ch);
	lowChar = ch;
	if (!bytes) {
	    /* TclUtfToUniChar only returns 0 for chars > 0xffff ! */
	    bytes = TclUtfToUniChar(src, &ch);
	    /* Combine surrogates */
	    lowChar = (((lowChar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	}
	lowChar = Tcl_UniCharToLower(lowChar);

	/*
	 * To keep badly formed Utf strings from getting inflated by the
	 * conversion (thereby causing a segfault), only copy the lower case
	 * char to dst if its size is <= the original char.
	 */








|
<










<
<
<
<
<
<
<
|







868
869
870
871
872
873
874
875

876
877
878
879
880
881
882
883
884
885







886
887
888
889
890
891
892
893
 *----------------------------------------------------------------------
 */

int
Tcl_UtfToLower(
    char *str)			/* String to convert in place. */
{
    Tcl_UniChar ch, lowChar;

    char *src, *dst;
    int bytes;

    /*
     * Iterate over the string until we hit the terminating null.
     */

    src = dst = str;
    while (*src) {
	bytes = TclUtfToUniChar(src, &ch);







	lowChar = Tcl_UniCharToLower(ch);

	/*
	 * To keep badly formed Utf strings from getting inflated by the
	 * conversion (thereby causing a segfault), only copy the lower case
	 * char to dst if its size is <= the original char.
	 */

1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
 *----------------------------------------------------------------------
 */

int
Tcl_UtfToTitle(
    char *str)			/* String to convert in place. */
{
    Tcl_UniChar ch = 0;
    int titleChar, lowChar;
    char *src, *dst;
    int bytes;

    /*
     * Capitalize the first character and then lowercase the rest of the
     * characters until we get to a null.
     */

    src = dst = str;

    if (*src) {
	bytes = TclUtfToUniChar(src, &ch);
	titleChar = ch;
	if (!bytes) {
	    /* TclUtfToUniChar only returns 0 for chars > 0xffff ! */
	    bytes = TclUtfToUniChar(src, &ch);
	    /* Combine surrogates */
	    titleChar = (((titleChar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	}
	titleChar = Tcl_UniCharToTitle(titleChar);

	if (bytes < UtfCount(titleChar)) {
	    memcpy(dst, src, (size_t) bytes);
	    dst += bytes;
	} else {
	    dst += Tcl_UniCharToUtf(titleChar, dst);
	}
	src += bytes;
    }
    while (*src) {
	bytes = TclUtfToUniChar(src, &ch);
	lowChar = ch;
	if (!bytes) {
	    /* TclUtfToUniChar only returns 0 for chars > 0xffff ! */
	    bytes = TclUtfToUniChar(src, &ch);
	    /* Combine surrogates */
	    lowChar = (((lowChar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	}
	lowChar = Tcl_UniCharToLower(lowChar);

	if (bytes < UtfCount(lowChar)) {
	    memcpy(dst, src, (size_t) bytes);
	    dst += bytes;
	} else {
	    dst += Tcl_UniCharToUtf(lowChar, dst);
	}







|
<












<
<
<
<
<
<
<
|











<
<
<
<
<
<
<
|







922
923
924
925
926
927
928
929

930
931
932
933
934
935
936
937
938
939
940
941







942
943
944
945
946
947
948
949
950
951
952
953







954
955
956
957
958
959
960
961
 *----------------------------------------------------------------------
 */

int
Tcl_UtfToTitle(
    char *str)			/* String to convert in place. */
{
    Tcl_UniChar ch, titleChar, lowChar;

    char *src, *dst;
    int bytes;

    /*
     * Capitalize the first character and then lowercase the rest of the
     * characters until we get to a null.
     */

    src = dst = str;

    if (*src) {
	bytes = TclUtfToUniChar(src, &ch);







	titleChar = Tcl_UniCharToTitle(ch);

	if (bytes < UtfCount(titleChar)) {
	    memcpy(dst, src, (size_t) bytes);
	    dst += bytes;
	} else {
	    dst += Tcl_UniCharToUtf(titleChar, dst);
	}
	src += bytes;
    }
    while (*src) {
	bytes = TclUtfToUniChar(src, &ch);







	lowChar = Tcl_UniCharToLower(ch);

	if (bytes < UtfCount(lowChar)) {
	    memcpy(dst, src, (size_t) bytes);
	    dst += bytes;
	} else {
	    dst += Tcl_UniCharToUtf(lowChar, dst);
	}
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168

int
Tcl_UtfNcmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct,		/* UTF string cs is compared to. */
    unsigned long numChars)	/* Number of UTF chars to compare. */
{
    Tcl_UniChar ch1 = 0, ch2 = 0;

    /*
     * Cannot use 'memcmp(cs, ct, n);' as byte representation of \u0000 (the
     * pair of bytes 0xc0,0x80) is larger than byte representation of \u0001
     * (the byte 0x01.)
     */








|







1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045

int
Tcl_UtfNcmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct,		/* UTF string cs is compared to. */
    unsigned long numChars)	/* Number of UTF chars to compare. */
{
    Tcl_UniChar ch1, ch2;

    /*
     * Cannot use 'memcmp(cs, ct, n);' as byte representation of \u0000 (the
     * pair of bytes 0xc0,0x80) is larger than byte representation of \u0001
     * (the byte 0x01.)
     */

1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216

int
Tcl_UtfNcasecmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct,		/* UTF string cs is compared to. */
    unsigned long numChars)	/* Number of UTF chars to compare. */
{
    Tcl_UniChar ch1 = 0, ch2 = 0;
    while (numChars-- > 0) {
	/*
	 * n must be interpreted as chars, not bytes.
	 * This should be called only when both strings are of
	 * at least n chars long (no need for \0 check)
	 */
	cs += TclUtfToUniChar(cs, &ch1);







|







1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093

int
Tcl_UtfNcasecmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct,		/* UTF string cs is compared to. */
    unsigned long numChars)	/* Number of UTF chars to compare. */
{
    Tcl_UniChar ch1, ch2;
    while (numChars-- > 0) {
	/*
	 * n must be interpreted as chars, not bytes.
	 * This should be called only when both strings are of
	 * at least n chars long (no need for \0 check)
	 */
	cs += TclUtfToUniChar(cs, &ch1);
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharToUpper(
    int ch)			/* Unicode character to convert. */
{
    if (!UNICODE_OUT_OF_RANGE(ch)) {
	int info = GetUniCharInfo(ch);

	if (GetCaseType(info) & 0x04) {
	    ch -= GetDelta(info);
	}
    }
    return ch & 0x1fffff;
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharToLower --
 *
 *	Compute the lowercase equivalent of the given Unicode character.
 *
 * Results:
 *	Returns the lowercase Unicode character.
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharToLower(
    int ch)			/* Unicode character to convert. */
{
    if (!UNICODE_OUT_OF_RANGE(ch)) {
	int info = GetUniCharInfo(ch);

	if (GetCaseType(info) & 0x02) {
	    ch += GetDelta(info);
	}
    }
    return ch & 0x1fffff;
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharToTitle --
 *
 *	Compute the titlecase equivalent of the given Unicode character.
 *
 * Results:
 *	Returns the titlecase Unicode character.
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharToTitle(
    int ch)			/* Unicode character to convert. */
{
    if (!UNICODE_OUT_OF_RANGE(ch)) {
	int info = GetUniCharInfo(ch);
	int mode = GetCaseType(info);

	if (mode & 0x1) {
	    /*
	     * Subtract or add one depending on the original case.
	     */

	    ch += ((mode & 0x4) ? -1 : 1);
	} else if (mode == 0x4) {
	    ch -= GetDelta(info);
	}
    }
    return ch & 0x1fffff;
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharLen --
 *







|



<
|

|
|
|
<
|


















|



<
|

|
|
|
<
|


















|



<
|
|

|
|
|
|

|
|
|
|
<
|







1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165

1166
1167
1168
1169
1170

1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193

1194
1195
1196
1197
1198

1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221

1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233

1234
1235
1236
1237
1238
1239
1240
1241
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

Tcl_UniChar
Tcl_UniCharToUpper(
    int ch)			/* Unicode character to convert. */
{

    int info = GetUniCharInfo(ch);

    if (GetCaseType(info) & 0x04) {
	ch -= GetDelta(info);
    }

    return (Tcl_UniChar) ch;
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharToLower --
 *
 *	Compute the lowercase equivalent of the given Unicode character.
 *
 * Results:
 *	Returns the lowercase Unicode character.
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

Tcl_UniChar
Tcl_UniCharToLower(
    int ch)			/* Unicode character to convert. */
{

    int info = GetUniCharInfo(ch);

    if (GetCaseType(info) & 0x02) {
	ch += GetDelta(info);
    }

    return (Tcl_UniChar) ch;
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharToTitle --
 *
 *	Compute the titlecase equivalent of the given Unicode character.
 *
 * Results:
 *	Returns the titlecase Unicode character.
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

Tcl_UniChar
Tcl_UniCharToTitle(
    int ch)			/* Unicode character to convert. */
{

    int info = GetUniCharInfo(ch);
    int mode = GetCaseType(info);

    if (mode & 0x1) {
	/*
	 * Subtract or add one depending on the original case.
	 */

	ch += ((mode & 0x4) ? -1 : 1);
    } else if (mode == 0x4) {
	ch -= GetDelta(info);
    }

    return (Tcl_UniChar) ch;
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharLen --
 *
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsAlnum(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return (((ALPHA_BITS | DIGIT_BITS) >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsAlpha --







<
<
<







1361
1362
1363
1364
1365
1366
1367



1368
1369
1370
1371
1372
1373
1374
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsAlnum(
    int ch)			/* Unicode character to test. */
{



    return (((ALPHA_BITS | DIGIT_BITS) >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsAlpha --
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsAlpha(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return ((ALPHA_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsControl --







<
<
<







1384
1385
1386
1387
1388
1389
1390



1391
1392
1393
1394
1395
1396
1397
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsAlpha(
    int ch)			/* Unicode character to test. */
{



    return ((ALPHA_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsControl --
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsControl(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	ch &= 0x1fffff;
	if ((ch == 0xe0001) || ((ch >= 0xe0020) && (ch <= 0xe007f))) {
	    return 1;
	}
	if ((ch >= 0xf0000) && ((ch & 0xffff) <= 0xfffd)) {
	    return 1;
	}
	return 0;
    }
    return ((CONTROL_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsDigit --







<
<
<
<
<
<
<
<
<
<







1407
1408
1409
1410
1411
1412
1413










1414
1415
1416
1417
1418
1419
1420
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsControl(
    int ch)			/* Unicode character to test. */
{










    return ((CONTROL_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsDigit --
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsDigit(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return (GetCategory(ch) == DECIMAL_DIGIT_NUMBER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsGraph --







<
<
<







1430
1431
1432
1433
1434
1435
1436



1437
1438
1439
1440
1441
1442
1443
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsDigit(
    int ch)			/* Unicode character to test. */
{



    return (GetCategory(ch) == DECIMAL_DIGIT_NUMBER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsGraph --
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsGraph(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	ch &= 0x1fffff;
	return (ch >= 0xe0100) && (ch <= 0xe01ef);
    }
    return ((GRAPH_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsLower --







<
<
<
<







1453
1454
1455
1456
1457
1458
1459




1460
1461
1462
1463
1464
1465
1466
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsGraph(
    int ch)			/* Unicode character to test. */
{




    return ((GRAPH_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsLower --
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsLower(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return (GetCategory(ch) == LOWERCASE_LETTER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsPrint --







<
<
<







1476
1477
1478
1479
1480
1481
1482



1483
1484
1485
1486
1487
1488
1489
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsLower(
    int ch)			/* Unicode character to test. */
{



    return (GetCategory(ch) == LOWERCASE_LETTER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsPrint --
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsPrint(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	ch &= 0x1fffff;
	return (ch >= 0xe0100) && (ch <= 0xe01ef);
    }
    return (((GRAPH_BITS|SPACE_BITS) >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsPunct --







<
<
<
<







1499
1500
1501
1502
1503
1504
1505




1506
1507
1508
1509
1510
1511
1512
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsPrint(
    int ch)			/* Unicode character to test. */
{




    return (((GRAPH_BITS|SPACE_BITS) >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsPunct --
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsPunct(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return ((PUNCT_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsSpace --







<
<
<







1522
1523
1524
1525
1526
1527
1528



1529
1530
1531
1532
1533
1534
1535
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsPunct(
    int ch)			/* Unicode character to test. */
{



    return ((PUNCT_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsSpace --
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsSpace(
    int ch)			/* Unicode character to test. */
{
	/* Ignore upper 11 bits. */
	ch &= 0x1fffff;

    /*
     * If the character is within the first 127 characters, just use the
     * standard C function, otherwise consult the Unicode table.
     */

    if (ch < 0x80) {
	return TclIsSpaceProc((char) ch);
    } else if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    } else if (ch == 0x0085 || ch == 0x180e || ch == 0x200b
	    || ch == 0x202f || ch == 0x2060 || ch == 0xfeff) {
	return 1;
    } else {
	return ((SPACE_BITS >> GetCategory(ch)) & 1);
    }
}

/*







<
<
<





|

|
<
|
|







1545
1546
1547
1548
1549
1550
1551



1552
1553
1554
1555
1556
1557
1558
1559

1560
1561
1562
1563
1564
1565
1566
1567
1568
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsSpace(
    int ch)			/* Unicode character to test. */
{



    /*
     * If the character is within the first 127 characters, just use the
     * standard C function, otherwise consult the Unicode table.
     */

    if (((Tcl_UniChar) ch) < ((Tcl_UniChar) 0x80)) {
	return TclIsSpaceProc((char) ch);
    } else if ((Tcl_UniChar) ch == 0x0085 || (Tcl_UniChar) ch == 0x180e

	    || (Tcl_UniChar) ch == 0x200b || (Tcl_UniChar) ch == 0x2060
	    || (Tcl_UniChar) ch == 0x202f || (Tcl_UniChar) ch == 0xfeff) {
	return 1;
    } else {
	return ((SPACE_BITS >> GetCategory(ch)) & 1);
    }
}

/*
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsUpper(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return (GetCategory(ch) == UPPERCASE_LETTER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsWordChar --







<
<
<







1581
1582
1583
1584
1585
1586
1587



1588
1589
1590
1591
1592
1593
1594
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsUpper(
    int ch)			/* Unicode character to test. */
{



    return (GetCategory(ch) == UPPERCASE_LETTER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsWordChar --
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsWordChar(
    int ch)			/* Unicode character to test. */
{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return ((WORD_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharCaseMatch --







<
<
<







1604
1605
1606
1607
1608
1609
1610



1611
1612
1613
1614
1615
1616
1617
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsWordChar(
    int ch)			/* Unicode character to test. */
{



    return ((WORD_BITS >> GetCategory(ch)) & 1);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharCaseMatch --
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
Tcl_UniCharCaseMatch(
    const Tcl_UniChar *uniStr,	/* Unicode String. */
    const Tcl_UniChar *uniPattern,
				/* Pattern, which may contain special
				 * characters. */
    int nocase)			/* 0 for case sensitive, 1 for insensitive */
{
    Tcl_UniChar ch1 = 0, p;

    while (1) {
	p = *uniPattern;

	/*
	 * See if we're at the end of both the pattern and the string. If so,
	 * we succeeded. If we're at the end of the pattern but not at the end







|







1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
Tcl_UniCharCaseMatch(
    const Tcl_UniChar *uniStr,	/* Unicode String. */
    const Tcl_UniChar *uniPattern,
				/* Pattern, which may contain special
				 * characters. */
    int nocase)			/* 0 for case sensitive, 1 for insensitive */
{
    Tcl_UniChar ch1, p;

    while (1) {
	p = *uniPattern;

	/*
	 * See if we're at the end of both the pattern and the string. If so,
	 * we succeeded. If we're at the end of the pattern but not at the end

Changes to tests/encoding.test.

328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
    list [string bytelength $x] [string bytelength $y] $z
} {1 2 c080}

test encoding-16.1 {UnicodeToUtfProc} {
    set val [encoding convertfrom unicode NN]
    list $val [format %x [scan $val %c]]
} "\u4e4e 4e4e"
test encoding-16.2 {UnicodeToUtfProc} -constraints utf16 -body {
    set val [encoding convertfrom unicode "\xd8\xd8\xdc\xdc"]
    list $val [format %x [scan $val %c]]
} -result "\U460dc 460dc"

test encoding-17.1 {UtfToUnicodeProc} -constraints utf16 -body {
    encoding convertto unicode "\U460dc"
} -result "\xd8\xd8\xdc\xdc"

test encoding-18.1 {TableToUtfProc} {
} {}

test encoding-19.1 {TableFromUtfProc} {
} {}








<
<
<
<

|
<
|







328
329
330
331
332
333
334




335
336

337
338
339
340
341
342
343
344
    list [string bytelength $x] [string bytelength $y] $z
} {1 2 c080}

test encoding-16.1 {UnicodeToUtfProc} {
    set val [encoding convertfrom unicode NN]
    list $val [format %x [scan $val %c]]
} "\u4e4e 4e4e"





test encoding-17.1 {UtfToUnicodeProc} {

} {}

test encoding-18.1 {TableToUtfProc} {
} {}

test encoding-19.1 {TableFromUtfProc} {
} {}

Changes to tests/string.test.

572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
    list [string is control -fail var \x00\x01\x10\x1F\x7F\x80\x9F\x60] $var
} {0 7}
test string-6.85 {string is control} {
    string is control \u0100
} 0
test string-6.86 {string is graph} {
    ## graph is any print char, except space
    list [string is gra -fail var "0123abc!@#\$\u0100\UE0100\UE01EF "] $var
} {0 14}
test string-6.87 {string is print} {
    ## basically any printable char
    list [string is print -fail var "0123abc!@#\$\u0100 \UE0100\UE01EF\u0010"] $var
} {0 15}
test string-6.88 {string is punct} {
    ## any graph char that isn't alnum
    list [string is punct -fail var "_!@#\u00beq0"] $var
} {0 4}
test string-6.89 {string is xdigit} {
    list [string is xdigit -fail var 0123456789\u0061bcdefABCDEFg] $var
} {0 22}







|
|


|
|







572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
    list [string is control -fail var \x00\x01\x10\x1F\x7F\x80\x9F\x60] $var
} {0 7}
test string-6.85 {string is control} {
    string is control \u0100
} 0
test string-6.86 {string is graph} {
    ## graph is any print char, except space
    list [string is gra -fail var "0123abc!@#\$\u0100 "] $var
} {0 12}
test string-6.87 {string is print} {
    ## basically any printable char
    list [string is print -fail var "0123abc!@#\$\u0100 \u0010"] $var
} {0 13}
test string-6.88 {string is punct} {
    ## any graph char that isn't alnum
    list [string is punct -fail var "_!@#\u00beq0"] $var
} {0 4}
test string-6.89 {string is xdigit} {
    list [string is xdigit -fail var 0123456789\u0061bcdefABCDEFg] $var
} {0 22}

Changes to tests/utf.test.

16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
::tcltest::loadTestedCommands
catch [list package require -exact Tcltest [info patchlevel]]

testConstraint testbytestring [llength [info commands testbytestring]]

catch {unset x}

# Some tests require support for utf16

if {[testConstraint testbytestring]} {
    testConstraint utf16 [expr {[format %c 0x010000] != [testbytestring "\xef\xbf\xbd"]}]
}

test utf-1.1 {Tcl_UniCharToUtf: 1 byte sequences} testbytestring {
    expr {"\x01" eq [testbytestring "\x01"]}
} 1
test utf-1.2 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring {
    expr {"\x00" eq [testbytestring "\xc0\x80"]}
} 1
test utf-1.3 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring {
    expr {"\xe0" eq [testbytestring "\xc3\xa0"]}
} 1
test utf-1.4 {Tcl_UniCharToUtf: 3 byte sequences} testbytestring {
    expr {"\u4e4e" eq [testbytestring "\xe4\xb9\x8e"]}
} 1
test utf-1.5 {Tcl_UniCharToUtf: overflowed Tcl_UniChar} testbytestring {
    expr {[format %c 0x110000] eq [testbytestring "\xef\xbf\xbd"]}
} 1
test utf-1.6 {Tcl_UniCharToUtf: negative Tcl_UniChar} testbytestring {
    expr {[format %c -1] eq [testbytestring "\xef\xbf\xbd"]}
} 1
test utf-1.7 {Tcl_UniCharToUtf: 4 byte sequences} -constraints {utf16 testbytestring} -body {
    expr {"\U014e4e" eq [testbytestring "\xf0\x94\xb9\x8e"]}
} -result 1

test utf-2.1 {Tcl_UtfToUniChar: low ascii} {
    string length "abc"
} {3}
test utf-2.2 {Tcl_UtfToUniChar: naked trail bytes} testbytestring {
    string length [testbytestring "\x82\x83\x84"]
} {3}







<
<
<
<
<
<


















<
<
<







16
17
18
19
20
21
22






23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40



41
42
43
44
45
46
47
::tcltest::loadTestedCommands
catch [list package require -exact Tcltest [info patchlevel]]

testConstraint testbytestring [llength [info commands testbytestring]]

catch {unset x}







test utf-1.1 {Tcl_UniCharToUtf: 1 byte sequences} testbytestring {
    expr {"\x01" eq [testbytestring "\x01"]}
} 1
test utf-1.2 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring {
    expr {"\x00" eq [testbytestring "\xc0\x80"]}
} 1
test utf-1.3 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring {
    expr {"\xe0" eq [testbytestring "\xc3\xa0"]}
} 1
test utf-1.4 {Tcl_UniCharToUtf: 3 byte sequences} testbytestring {
    expr {"\u4e4e" eq [testbytestring "\xe4\xb9\x8e"]}
} 1
test utf-1.5 {Tcl_UniCharToUtf: overflowed Tcl_UniChar} testbytestring {
    expr {[format %c 0x110000] eq [testbytestring "\xef\xbf\xbd"]}
} 1
test utf-1.6 {Tcl_UniCharToUtf: negative Tcl_UniChar} testbytestring {
    expr {[format %c -1] eq [testbytestring "\xef\xbf\xbd"]}
} 1




test utf-2.1 {Tcl_UtfToUniChar: low ascii} {
    string length "abc"
} {3}
test utf-2.2 {Tcl_UtfToUniChar: naked trail bytes} testbytestring {
    string length [testbytestring "\x82\x83\x84"]
} {3}
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
} {1}
test utf-2.6 {Tcl_UtfToUniChar: lead (3-byte) followed by 1 trail} testbytestring {
    string length [testbytestring "\xE2\xA2"]
} {2}
test utf-2.7 {Tcl_UtfToUniChar: lead (3-byte) followed by 2 trail} testbytestring {
    string length [testbytestring "\xE4\xb9\x8e"]
} {1}
test utf-2.8 {Tcl_UtfToUniChar: lead (4-byte) followed by 3 trail} -constraints {utf16 testbytestring} -body {
    string length [testbytestring "\xF0\x90\x80\x80"]
} -result {1}
test utf-2.9 {Tcl_UtfToUniChar: lead (4-byte) followed by 3 trail} -constraints {utf16 testbytestring} -body {
    string length [testbytestring "\xF4\x8F\xBF\xBF"]
} -result {1}
test utf-2.10 {Tcl_UtfToUniChar: lead (4-byte) followed by 3 trail, underflow} testbytestring {
    string length [testbytestring "\xF0\x8F\xBF\xBF"]
} {4}
test utf-2.11 {Tcl_UtfToUniChar: lead (4-byte) followed by 3 trail, overflow} testbytestring {
    string length [testbytestring "\xF4\x90\x80\x80"]
} {4}
test utf-2.12 {Tcl_UtfToUniChar: longer UTF sequences not supported} testbytestring {
    string length [testbytestring "\xF8\xA2\xA2\xA2\xA2"]
} {5}

test utf-3.1 {Tcl_UtfCharComplete} {
} {}

testConstraint testnumutfchars [llength [info commands testnumutfchars]]
test utf-4.1 {Tcl_NumUtfChars: zero length} testnumutfchars {
    testnumutfchars ""







<
<
<
<
<
<
<
<
<
<
<
<
|
|
|







56
57
58
59
60
61
62












63
64
65
66
67
68
69
70
71
72
} {1}
test utf-2.6 {Tcl_UtfToUniChar: lead (3-byte) followed by 1 trail} testbytestring {
    string length [testbytestring "\xE2\xA2"]
} {2}
test utf-2.7 {Tcl_UtfToUniChar: lead (3-byte) followed by 2 trail} testbytestring {
    string length [testbytestring "\xE4\xb9\x8e"]
} {1}












test utf-2.8 {Tcl_UtfToUniChar: longer UTF sequences not supported} testbytestring {
    string length [testbytestring "\xF4\xA2\xA2\xA2"]
} {4}

test utf-3.1 {Tcl_UtfCharComplete} {
} {}

testConstraint testnumutfchars [llength [info commands testnumutfchars]]
test utf-4.1 {Tcl_NumUtfChars: zero length} testnumutfchars {
    testnumutfchars ""
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
bsCheck \U41	65
bsCheck \Ua	10
bsCheck \UA	10
bsCheck \Ua1	161
bsCheck \U4e21	20001
bsCheck \U004e21	20001
bsCheck \U00004e21	20001
bsCheck \U0000004e21	78
if {[testConstraint utf16]} {
    bsCheck \U00110000	69632
    bsCheck \U01100000	69632
    bsCheck \U11000000	69632
    bsCheck \U0010FFFF	1114111
    bsCheck \U010FFFF0	1114111
    bsCheck \U10FFFF00	1114111
    bsCheck \UFFFFFFFF	1048575
}

test utf-11.1 {Tcl_UtfToUpper} {
    string toupper {}
} {}
test utf-11.2 {Tcl_UtfToUpper} {
    string toupper abc
} ABC







<
<
|
|
<
<
<
<
<
<







191
192
193
194
195
196
197


198
199






200
201
202
203
204
205
206
bsCheck \U41	65
bsCheck \Ua	10
bsCheck \UA	10
bsCheck \Ua1	161
bsCheck \U4e21	20001
bsCheck \U004e21	20001
bsCheck \U00004e21	20001


bsCheck \U00110000	65533
bsCheck \Uffffffff	65533







test utf-11.1 {Tcl_UtfToUpper} {
    string toupper {}
} {}
test utf-11.2 {Tcl_UtfToUpper} {
    string toupper abc
} ABC
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
    string toupper !
} !

test utf-16.1 {Tcl_UniCharToLower, negative delta} {
    string tolower aA
} aa
test utf-16.2 {Tcl_UniCharToLower, positive delta} {
    string tolower \u0178\u00ff\uA78D\u01c5\U10400
} \u00ff\u00ff\u0265\u01c6\U10428

test utf-17.1 {Tcl_UniCharToLower, no delta} {
    string tolower !
} !

test utf-18.1 {Tcl_UniCharToTitle, add one for title} {
    string totitle \u01c4







|
|







260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
    string toupper !
} !

test utf-16.1 {Tcl_UniCharToLower, negative delta} {
    string tolower aA
} aa
test utf-16.2 {Tcl_UniCharToLower, positive delta} {
    string tolower \u0178\u00ff\uA78D\u01c5
} \u00ff\u00ff\u0265\u01c6

test utf-17.1 {Tcl_UniCharToLower, no delta} {
    string tolower !
} !

test utf-18.1 {Tcl_UniCharToTitle, add one for title} {
    string totitle \u01c4

Changes to tools/uniParse.tcl.

392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
#define GetDelta(info) ((info) >> 8)

/*
 * This macro extracts the information about a character from the
 * Unicode character tables.
 */

#define GetUniCharInfo(ch) (groups\[groupMap\[pageMap\[((ch) & 0x1fffff) >> OFFSET_BITS\] | ((ch) & ((1 << OFFSET_BITS)-1))\]\])
"

    close $f
}

uni::main








|







392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
#define GetDelta(info) ((info) >> 8)

/*
 * This macro extracts the information about a character from the
 * Unicode character tables.
 */

#define GetUniCharInfo(ch) (groups\[groupMap\[pageMap\[((ch) & 0xffff) >> OFFSET_BITS\] | ((ch) & ((1 << OFFSET_BITS)-1))\]\])
"

    close $f
}

uni::main

Changes to unix/tclUnixFile.c.

150
151
152
153
154
155
156





157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
	encoding = Tcl_GetEncoding(NULL, NULL);
	Tcl_ExternalToUtfDString(encoding, name, -1, &utfName);
	TclSetObjNameOfExecutable(
		Tcl_NewStringObj(Tcl_DStringValue(&utfName), -1), encoding);
	Tcl_DStringFree(&utfName);
	goto done;
    }






    /*
     * The name is relative to the current working directory. First strip off
     * a leading "./", if any, then add the full path name of the current
     * working directory.
     */

    if ((name[0] == '.') && (name[1] == '/')) {
	name += 2;
    }

    Tcl_DStringInit(&nameString);
    Tcl_DStringAppend(&nameString, name, -1);

    TclpGetCwd(NULL, &cwd);

    Tcl_DStringFree(&buffer);
    Tcl_UtfToExternalDString(NULL, Tcl_DStringValue(&cwd),
	    Tcl_DStringLength(&cwd), &buffer);
    if (Tcl_DStringValue(&cwd)[Tcl_DStringLength(&cwd) -1] != '/') {
	TclDStringAppendLiteral(&buffer, "/");
    }
    Tcl_DStringFree(&cwd);







>
>
>
>
>














<
<







150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175


176
177
178
179
180
181
182
	encoding = Tcl_GetEncoding(NULL, NULL);
	Tcl_ExternalToUtfDString(encoding, name, -1, &utfName);
	TclSetObjNameOfExecutable(
		Tcl_NewStringObj(Tcl_DStringValue(&utfName), -1), encoding);
	Tcl_DStringFree(&utfName);
	goto done;
    }

    if (TclpGetCwd(NULL, &cwd) == NULL) {
	TclSetObjNameOfExecutable(Tcl_NewObj(), NULL);
	goto done;
    }

    /*
     * The name is relative to the current working directory. First strip off
     * a leading "./", if any, then add the full path name of the current
     * working directory.
     */

    if ((name[0] == '.') && (name[1] == '/')) {
	name += 2;
    }

    Tcl_DStringInit(&nameString);
    Tcl_DStringAppend(&nameString, name, -1);



    Tcl_DStringFree(&buffer);
    Tcl_UtfToExternalDString(NULL, Tcl_DStringValue(&cwd),
	    Tcl_DStringLength(&cwd), &buffer);
    if (Tcl_DStringValue(&cwd)[Tcl_DStringLength(&cwd) -1] != '/') {
	TclDStringAppendLiteral(&buffer, "/");
    }
    Tcl_DStringFree(&cwd);

Changes to win/configure.

3753
3754
3755
3756
3757
3758
3759







3760
3761
3762
3763
3764
3765
3766
	    # dynamic
            echo "$as_me:$LINENO: result: using shared flags" >&5
echo "${ECHO_T}using shared flags" >&6
	    runtime=-MD
	    # Add SHLIB_LD_LIBS to the Make rule, not here.
	    LIBRARIES="\${SHARED_LIBRARIES}"
	    EXESUFFIX="\${DBGX}.exe"







	fi
	MAKE_DLL="\${SHLIB_LD} \$(LDFLAGS) -out:\$@"
	# DLLSUFFIX is separate because it is the building block for
	# users of tclConfig.sh that may build shared or static.
	DLLSUFFIX="\${DBGX}.dll"
	LIBSUFFIX="\${DBGX}.lib"
	LIBFLAGSUFFIX="\${DBGX}"







>
>
>
>
>
>
>







3753
3754
3755
3756
3757
3758
3759
3760
3761
3762
3763
3764
3765
3766
3767
3768
3769
3770
3771
3772
3773
	    # dynamic
            echo "$as_me:$LINENO: result: using shared flags" >&5
echo "${ECHO_T}using shared flags" >&6
	    runtime=-MD
	    # Add SHLIB_LD_LIBS to the Make rule, not here.
	    LIBRARIES="\${SHARED_LIBRARIES}"
	    EXESUFFIX="\${DBGX}.exe"
	    case "x`echo \${VisualStudioVersion}`" in
		x14*)
		    lflags="${lflags} -nodefaultlib:libucrt.lib"
		    ;;
		*)
		    ;;
	    esac
	fi
	MAKE_DLL="\${SHLIB_LD} \$(LDFLAGS) -out:\$@"
	# DLLSUFFIX is separate because it is the building block for
	# users of tclConfig.sh that may build shared or static.
	DLLSUFFIX="\${DBGX}.dll"
	LIBSUFFIX="\${DBGX}.lib"
	LIBFLAGSUFFIX="\${DBGX}"
3793
3794
3795
3796
3797
3798
3799









3800
3801
3802
3803
3804
3805
3806
	    else
		echo "$as_me:$LINENO: result:    Using 64-bit $MACHINE mode" >&5
echo "${ECHO_T}   Using 64-bit $MACHINE mode" >&6
	    fi
	fi

	LIBS="netapi32.lib kernel32.lib user32.lib advapi32.lib ws2_32.lib"









	if test "$do64bit" != "no" ; then
	    # The space-based-path will work for the Makefile, but will
	    # not work if AC_TRY_COMPILE is called.  TEA has the
	    # TEA_PATH_NOSPACE to avoid this issue.
	    # Check if _WIN64 is already recognized, and if so we don't
	    # need to modify CC.
	    echo "$as_me:$LINENO: checking whether _WIN64 is declared" >&5







>
>
>
>
>
>
>
>
>







3800
3801
3802
3803
3804
3805
3806
3807
3808
3809
3810
3811
3812
3813
3814
3815
3816
3817
3818
3819
3820
3821
3822
	    else
		echo "$as_me:$LINENO: result:    Using 64-bit $MACHINE mode" >&5
echo "${ECHO_T}   Using 64-bit $MACHINE mode" >&6
	    fi
	fi

	LIBS="netapi32.lib kernel32.lib user32.lib advapi32.lib ws2_32.lib"

	case "x`echo \${VisualStudioVersion}`" in
		x14*)
		    LIBS="$LIBS ucrt.lib"
		    ;;
		*)
		    ;;
	esac

	if test "$do64bit" != "no" ; then
	    # The space-based-path will work for the Makefile, but will
	    # not work if AC_TRY_COMPILE is called.  TEA has the
	    # TEA_PATH_NOSPACE to avoid this issue.
	    # Check if _WIN64 is already recognized, and if so we don't
	    # need to modify CC.
	    echo "$as_me:$LINENO: checking whether _WIN64 is declared" >&5
3867
3868
3869
3870
3871
3872
3873
3874
3875
3876
3877
3878
3879
3880
3881
3882
3883
3884
3885
3886
3887
3888
3889
3890
3891
3892
3893
			 -I\"${MSSDK}/Include/crt/sys\""
fi

	    RC="\"${MSSDK}/bin/rc.exe\""
	    CFLAGS_DEBUG="-nologo -Zi -Od ${runtime}d"
	    # Do not use -O2 for Win64 - this has proved buggy in code gen.
	    CFLAGS_OPTIMIZE="-nologo -O1 ${runtime}"
	    lflags="-nologo -MACHINE:${MACHINE} -LIBPATH:\"${MSSDK}/Lib/${MACHINE}\""
	    LINKBIN="\"${PATH64}/link.exe\""
	    # Avoid 'unresolved external symbol __security_cookie' errors.
	    # c.f. http://support.microsoft.com/?id=894573
	    LIBS="$LIBS bufferoverflowU.lib"
	else
	    RC="rc"
	    # -Od - no optimization
	    # -WX - warnings as errors
	    CFLAGS_DEBUG="-nologo -Z7 -Od -WX ${runtime}d"
	    # -O2 - create fast code (/Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy)
	    CFLAGS_OPTIMIZE="-nologo -O2 ${runtime}"
	    lflags="-nologo"
	    LINKBIN="link"
	fi

	if test "$doWince" != "no" ; then
	    # Set defaults for common evc4/PPC2003 setup
	    # Currently Tcl requires 300+, possibly 420+ for sockets
	    CEVERSION=420; 		# could be 211 300 301 400 420 ...







|











|







3883
3884
3885
3886
3887
3888
3889
3890
3891
3892
3893
3894
3895
3896
3897
3898
3899
3900
3901
3902
3903
3904
3905
3906
3907
3908
3909
			 -I\"${MSSDK}/Include/crt/sys\""
fi

	    RC="\"${MSSDK}/bin/rc.exe\""
	    CFLAGS_DEBUG="-nologo -Zi -Od ${runtime}d"
	    # Do not use -O2 for Win64 - this has proved buggy in code gen.
	    CFLAGS_OPTIMIZE="-nologo -O1 ${runtime}"
	    lflags="${lflags} -nologo -MACHINE:${MACHINE} -LIBPATH:\"${MSSDK}/Lib/${MACHINE}\""
	    LINKBIN="\"${PATH64}/link.exe\""
	    # Avoid 'unresolved external symbol __security_cookie' errors.
	    # c.f. http://support.microsoft.com/?id=894573
	    LIBS="$LIBS bufferoverflowU.lib"
	else
	    RC="rc"
	    # -Od - no optimization
	    # -WX - warnings as errors
	    CFLAGS_DEBUG="-nologo -Z7 -Od -WX ${runtime}d"
	    # -O2 - create fast code (/Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy)
	    CFLAGS_OPTIMIZE="-nologo -O2 ${runtime}"
	    lflags="${lflags} -nologo"
	    LINKBIN="link"
	fi

	if test "$doWince" != "no" ; then
	    # Set defaults for common evc4/PPC2003 setup
	    # Currently Tcl requires 300+, possibly 420+ for sockets
	    CEVERSION=420; 		# could be 211 300 301 400 420 ...

Changes to win/makefile.vc.

67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
#		    help files (.chm)
#
# 4)  Macros usable on the commandline:
#	INSTALLDIR=<path>
#		Sets where to install Tcl from the built binaries.
#		C:\Progra~1\Tcl is assumed when not specified.
#
#	OPTS=loimpact,msvcrt,nothreads,pdbs,profile,static,staticpkg,symbols,thrdalloc,tclalloc,unchecked,ucrt,none
#		Sets special options for the core.  The default is for none.
#		Any combination of the above may be used (comma separated).
#		'none' will over-ride everything to nothing.
#
#		loimpact = Adds a flag for how NT treats the heap to keep memory
#			   in use, low.  This is said to impact alloc performance.
#		msvcrt   = Affects the static option only to switch it from







|







67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
#		    help files (.chm)
#
# 4)  Macros usable on the commandline:
#	INSTALLDIR=<path>
#		Sets where to install Tcl from the built binaries.
#		C:\Progra~1\Tcl is assumed when not specified.
#
#	OPTS=loimpact,msvcrt,nothreads,pdbs,profile,static,staticpkg,symbols,thrdalloc,tclalloc,unchecked,none
#		Sets special options for the core.  The default is for none.
#		Any combination of the above may be used (comma separated).
#		'none' will over-ride everything to nothing.
#
#		loimpact = Adds a flag for how NT treats the heap to keep memory
#			   in use, low.  This is said to impact alloc performance.
#		msvcrt   = Affects the static option only to switch it from
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
#			   optimizations and creates pdb symbols files.
#		thrdalloc = Use the thread allocator (shared global free pool)
#			   This is the default on threaded builds.
#		tclalloc = Use the old non-thread allocator
#		unchecked= Allows a symbols build to not use the debug
#			   enabled runtime (msvcrt.dll not msvcrtd.dll
#			   or libcmt.lib not libcmtd.lib).
#		ucrt= Uses ucrt.lib and libvcruntime.lib, which
#			   ensures Tcl will run on machines with only the subset
#			   of the C runtime that is part of the operating system.
#			   If omitted, builds with VC 14.0 or later will require
#			   the full C runtime redistributable.
#
#	STATS=compdbg,memdbg,none
#		Sets optional memory and bytecode compiler debugging code added
#		to the core.  The default is for none.  Any combination of the
#		above may be used (comma separated).  'none' will over-ride
#		everything to nothing.
#







<
<
<
<
<







98
99
100
101
102
103
104





105
106
107
108
109
110
111
#			   optimizations and creates pdb symbols files.
#		thrdalloc = Use the thread allocator (shared global free pool)
#			   This is the default on threaded builds.
#		tclalloc = Use the old non-thread allocator
#		unchecked= Allows a symbols build to not use the debug
#			   enabled runtime (msvcrt.dll not msvcrtd.dll
#			   or libcmt.lib not libcmtd.lib).





#
#	STATS=compdbg,memdbg,none
#		Sets optional memory and bytecode compiler debugging code added
#		to the core.  The default is for none.  Any combination of the
#		above may be used (comma separated).  'none' will over-ride
#		everything to nothing.
#
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
cdebug	= -Zi -WX $(DEBUGFLAGS)
!endif

### Declarations common to all compiler options
cwarn = $(WARNINGS) -D _CRT_SECURE_NO_DEPRECATE -D _CRT_NONSTDC_NO_DEPRECATE
cflags = -nologo -c $(COMPILERFLAGS) $(cwarn) -Fp$(TMP_DIR)^\

!if $(UCRT)
!if $(DEBUG) && !$(UNCHECKED)
crt = -MDd
!else
crt = -MT
!endif
!elseif $(MSVCRT)
!if $(DEBUG) && !$(UNCHECKED)
crt = -MDd
!else
crt = -MD
!endif
!else
!if $(DEBUG) && !$(UNCHECKED)







<
<
<
<
<
<
|







484
485
486
487
488
489
490






491
492
493
494
495
496
497
498
cdebug	= -Zi -WX $(DEBUGFLAGS)
!endif

### Declarations common to all compiler options
cwarn = $(WARNINGS) -D _CRT_SECURE_NO_DEPRECATE -D _CRT_NONSTDC_NO_DEPRECATE
cflags = -nologo -c $(COMPILERFLAGS) $(cwarn) -Fp$(TMP_DIR)^\







!if $(MSVCRT)
!if $(DEBUG) && !$(UNCHECKED)
crt = -MDd
!else
crt = -MD
!endif
!else
!if $(DEBUG) && !$(UNCHECKED)
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
### Declarations common to all linker options
lflags	= -nologo -machine:$(MACHINE) $(LINKERFLAGS) $(ldebug)

!if $(PROFILE)
lflags	= $(lflags) -profile
!endif

!if $(UCRT) && !($(DEBUG) && !$(UNCHECKED))
lflags	= $(lflags) -nodefaultlib:libucrt.lib
!endif

!if $(ALIGN98_HACK) && !$(STATIC_BUILD)
### Align sections for PE size savings.
lflags	= $(lflags) -opt:nowin98
!else if !$(ALIGN98_HACK) && $(STATIC_BUILD)







|







526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
### Declarations common to all linker options
lflags	= -nologo -machine:$(MACHINE) $(LINKERFLAGS) $(ldebug)

!if $(PROFILE)
lflags	= $(lflags) -profile
!endif

!if $(MSVCRT) && !($(DEBUG) && !$(UNCHECKED)) && $(VCVERSION) >= 1900
lflags	= $(lflags) -nodefaultlib:libucrt.lib
!endif

!if $(ALIGN98_HACK) && !$(STATIC_BUILD)
### Align sections for PE size savings.
lflags	= $(lflags) -opt:nowin98
!else if !$(ALIGN98_HACK) && $(STATIC_BUILD)
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
# Avoid 'unresolved external symbol __security_cookie' errors.
# c.f. http://support.microsoft.com/?id=894573
!if "$(MACHINE)" == "IA64" || "$(MACHINE)" == "AMD64"
!if $(VCVERSION) > 1399 && $(VCVERSION) < 1500
baselibs   = $(baselibs) bufferoverflowU.lib
!endif
!endif
!if $(UCRT) && !($(DEBUG) && !$(UNCHECKED))
baselibs   = $(baselibs) ucrt.lib
!endif

#---------------------------------------------------------------------
# TclTest flags
#---------------------------------------------------------------------








|







554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
# Avoid 'unresolved external symbol __security_cookie' errors.
# c.f. http://support.microsoft.com/?id=894573
!if "$(MACHINE)" == "IA64" || "$(MACHINE)" == "AMD64"
!if $(VCVERSION) > 1399 && $(VCVERSION) < 1500
baselibs   = $(baselibs) bufferoverflowU.lib
!endif
!endif
!if $(MSVCRT) && !($(DEBUG) && !$(UNCHECKED)) && $(VCVERSION) >= 1900
baselibs   = $(baselibs) ucrt.lib
!endif

#---------------------------------------------------------------------
# TclTest flags
#---------------------------------------------------------------------

Changes to win/rules.vc.

219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
PROFILE		= 0
PGO		= 0
MSVCRT		= 1
LOIMPACT	= 0
TCL_USE_STATIC_PACKAGES	= 0
USE_THREAD_ALLOC = 1
UNCHECKED	= 0
UCRT		= 0
!else
!if [nmakehlp -f $(OPTS) "static"]
!message *** Doing static
STATIC_BUILD	= 1
!else
STATIC_BUILD	= 0
!endif







<







219
220
221
222
223
224
225

226
227
228
229
230
231
232
PROFILE		= 0
PGO		= 0
MSVCRT		= 1
LOIMPACT	= 0
TCL_USE_STATIC_PACKAGES	= 0
USE_THREAD_ALLOC = 1
UNCHECKED	= 0

!else
!if [nmakehlp -f $(OPTS) "static"]
!message *** Doing static
STATIC_BUILD	= 1
!else
STATIC_BUILD	= 0
!endif
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
!if [nmakehlp -f $(OPTS) "unchecked"]
!message *** Doing unchecked
UNCHECKED = 1
!else
UNCHECKED = 0
!endif
!endif
!if [nmakehlp -f $(OPTS) "ucrt"] && $(VCVERSION) >= 1900
!message *** Doing UCRT
UCRT = 1
!else
UCRT = 0
!endif

#----------------------------------------------------------
# Figure-out how to name our intermediate and output directories.
# We wouldn't want different builds to use the same .obj files
# by accident.
#----------------------------------------------------------








<
<
<
<
<
<







298
299
300
301
302
303
304






305
306
307
308
309
310
311
!if [nmakehlp -f $(OPTS) "unchecked"]
!message *** Doing unchecked
UNCHECKED = 1
!else
UNCHECKED = 0
!endif
!endif







#----------------------------------------------------------
# Figure-out how to name our intermediate and output directories.
# We wouldn't want different builds to use the same .obj files
# by accident.
#----------------------------------------------------------

Changes to win/tcl.m4.

787
788
789
790
791
792
793







794
795
796
797
798
799
800
	else
	    # dynamic
            AC_MSG_RESULT([using shared flags])
	    runtime=-MD
	    # Add SHLIB_LD_LIBS to the Make rule, not here.
	    LIBRARIES="\${SHARED_LIBRARIES}"
	    EXESUFFIX="\${DBGX}.exe"







	fi
	MAKE_DLL="\${SHLIB_LD} \$(LDFLAGS) -out:\[$]@"
	# DLLSUFFIX is separate because it is the building block for
	# users of tclConfig.sh that may build shared or static.
	DLLSUFFIX="\${DBGX}.dll"
	LIBSUFFIX="\${DBGX}.lib"
	LIBFLAGSUFFIX="\${DBGX}"







>
>
>
>
>
>
>







787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
	else
	    # dynamic
            AC_MSG_RESULT([using shared flags])
	    runtime=-MD
	    # Add SHLIB_LD_LIBS to the Make rule, not here.
	    LIBRARIES="\${SHARED_LIBRARIES}"
	    EXESUFFIX="\${DBGX}.exe"
	    case "x`echo \${VisualStudioVersion}`" in
		x14*)
		    lflags="${lflags} -nodefaultlib:libucrt.lib"
		    ;;
		*)
		    ;;
	    esac
	fi
	MAKE_DLL="\${SHLIB_LD} \$(LDFLAGS) -out:\[$]@"
	# DLLSUFFIX is separate because it is the building block for
	# users of tclConfig.sh that may build shared or static.
	DLLSUFFIX="\${DBGX}.dll"
	LIBSUFFIX="\${DBGX}.lib"
	LIBFLAGSUFFIX="\${DBGX}"
824
825
826
827
828
829
830









831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
		do64bit="no"
	    else
		AC_MSG_RESULT([   Using 64-bit $MACHINE mode])
	    fi
	fi

	LIBS="netapi32.lib kernel32.lib user32.lib advapi32.lib ws2_32.lib"









	if test "$do64bit" != "no" ; then
	    # The space-based-path will work for the Makefile, but will
	    # not work if AC_TRY_COMPILE is called.  TEA has the
	    # TEA_PATH_NOSPACE to avoid this issue.
	    # Check if _WIN64 is already recognized, and if so we don't
	    # need to modify CC.
	    AC_CHECK_DECL([_WIN64], [],
			  [CC="\"${PATH64}/cl.exe\" -I\"${MSSDK}/Include\" \
			 -I\"${MSSDK}/Include/crt\" \
			 -I\"${MSSDK}/Include/crt/sys\""])
	    RC="\"${MSSDK}/bin/rc.exe\""
	    CFLAGS_DEBUG="-nologo -Zi -Od ${runtime}d"
	    # Do not use -O2 for Win64 - this has proved buggy in code gen.
	    CFLAGS_OPTIMIZE="-nologo -O1 ${runtime}"
	    lflags="-nologo -MACHINE:${MACHINE} -LIBPATH:\"${MSSDK}/Lib/${MACHINE}\""
	    LINKBIN="\"${PATH64}/link.exe\""
	    # Avoid 'unresolved external symbol __security_cookie' errors.
	    # c.f. http://support.microsoft.com/?id=894573
	    LIBS="$LIBS bufferoverflowU.lib"
	else
	    RC="rc"
	    # -Od - no optimization
	    # -WX - warnings as errors
	    CFLAGS_DEBUG="-nologo -Z7 -Od -WX ${runtime}d"
	    # -O2 - create fast code (/Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy)
	    CFLAGS_OPTIMIZE="-nologo -O2 ${runtime}"
	    lflags="-nologo"
	    LINKBIN="link"
	fi

	if test "$doWince" != "no" ; then
	    # Set defaults for common evc4/PPC2003 setup
	    # Currently Tcl requires 300+, possibly 420+ for sockets
	    CEVERSION=420; 		# could be 211 300 301 400 420 ...







>
>
>
>
>
>
>
>
>














|











|







831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
		do64bit="no"
	    else
		AC_MSG_RESULT([   Using 64-bit $MACHINE mode])
	    fi
	fi

	LIBS="netapi32.lib kernel32.lib user32.lib advapi32.lib ws2_32.lib"

	case "x`echo \${VisualStudioVersion}`" in
		x14*)
		    LIBS="$LIBS ucrt.lib"
		    ;;
		*)
		    ;;
	esac

	if test "$do64bit" != "no" ; then
	    # The space-based-path will work for the Makefile, but will
	    # not work if AC_TRY_COMPILE is called.  TEA has the
	    # TEA_PATH_NOSPACE to avoid this issue.
	    # Check if _WIN64 is already recognized, and if so we don't
	    # need to modify CC.
	    AC_CHECK_DECL([_WIN64], [],
			  [CC="\"${PATH64}/cl.exe\" -I\"${MSSDK}/Include\" \
			 -I\"${MSSDK}/Include/crt\" \
			 -I\"${MSSDK}/Include/crt/sys\""])
	    RC="\"${MSSDK}/bin/rc.exe\""
	    CFLAGS_DEBUG="-nologo -Zi -Od ${runtime}d"
	    # Do not use -O2 for Win64 - this has proved buggy in code gen.
	    CFLAGS_OPTIMIZE="-nologo -O1 ${runtime}"
	    lflags="${lflags} -nologo -MACHINE:${MACHINE} -LIBPATH:\"${MSSDK}/Lib/${MACHINE}\""
	    LINKBIN="\"${PATH64}/link.exe\""
	    # Avoid 'unresolved external symbol __security_cookie' errors.
	    # c.f. http://support.microsoft.com/?id=894573
	    LIBS="$LIBS bufferoverflowU.lib"
	else
	    RC="rc"
	    # -Od - no optimization
	    # -WX - warnings as errors
	    CFLAGS_DEBUG="-nologo -Z7 -Od -WX ${runtime}d"
	    # -O2 - create fast code (/Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy)
	    CFLAGS_OPTIMIZE="-nologo -O2 ${runtime}"
	    lflags="${lflags} -nologo"
	    LINKBIN="link"
	fi

	if test "$doWince" != "no" ; then
	    # Set defaults for common evc4/PPC2003 setup
	    # Currently Tcl requires 300+, possibly 420+ for sockets
	    CEVERSION=420; 		# could be 211 300 301 400 420 ...

Changes to win/tclWinFile.c.

2997
2998
2999
3000
3001
3002
3003
3004
3005
3006
3007
3008
3009
3010
3011
	} else if (*wp == '/') {
	    *wp = '\\';
	}
	++wp;
    }

  done:
    
    TclDecrRefCount(validPathPtr);
    return nativePathPtr;
}

/*
 *---------------------------------------------------------------------------
 *







|







2997
2998
2999
3000
3001
3002
3003
3004
3005
3006
3007
3008
3009
3010
3011
	} else if (*wp == '/') {
	    *wp = '\\';
	}
	++wp;
    }

  done:

    TclDecrRefCount(validPathPtr);
    return nativePathPtr;
}

/*
 *---------------------------------------------------------------------------
 *