Tcl Source Code

Check-in [ea03de5a23]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Purge the old and broken Tcl_ObjType.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | dgp-properbytearray
Files: files | file ages | folders
SHA1: ea03de5a23af915fc67d34ab970971235bc465f6
User & Date: dgp 2016-12-06 20:09:00
Context
2016-12-07
18:57
Create a narrowing procedure to make the operation explicit when needed. check-in: ba8e57d76c user: dgp tags: dgp-properbytearray
2016-12-06
20:09
Purge the old and broken Tcl_ObjType. check-in: ea03de5a23 user: dgp tags: dgp-properbytearray
18:55
Several commands should be picky about expecting byte-valued arguments. Make them so. check-in: 1b7d737ce6 user: dgp tags: dgp-properbytearray
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to generic/tclBinary.c.

156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269

/*
 * The following object types represent an array of bytes. The intent is
 * to allow arbitrary binary data to pass through Tcl as a Tcl value
 * without loss or damage. Such values are useful for things like
 * encoded strings or Tk images to name just two.
 *
 * It's strange to have two Tcl_ObjTypes in place for this task when
 * one would do, so a bit of detail and history how we got to this point
 * and where we might go from here.
 *
 * A bytearray is an ordered sequence of bytes. Each byte is an integer
 * value in the range [0-255].  To be a Tcl value type, we need a way to
 * encode each value in the value set as a Tcl string.  The simplest
 * encoding is to represent each byte value as the same codepoint value.
 * A bytearray of N bytes is encoded into a Tcl string of N characters
 * where the codepoint of each character is the value of corresponding byte.
 * This approach creates a one-to-one map between all bytearray values
 * and a subset of Tcl string values.
 * 
 * When converting a Tcl string value to the bytearray internal rep, the
 * question arises what to do with strings outside that subset?  That is,
 * those Tcl strings containing at least one codepoint greater than 255?
 * The obviously correct answer is to raise an error!  That string value
 * does not represent any valid bytearray value. Full Stop.  The
 * setFromAnyProc signature has a completion code return value for just
 * this reason, to reject invalid inputs.
 * 
 * Unfortunately this was not the path taken by the authors of the
 * original tclByteArrayType.  They chose to accept all Tcl string values
 * as acceptable string encodings of the bytearray values that result
 * from masking away the high bits of any codepoint value at all. This
 * meant that every bytearray value had multiple accepted string
 * representations.
 *
 * The implications of this choice are truly ugly.  When a Tcl value has
 * a string representation, we are required to accept that as the true
 * value.  Bytearray values that possess a string representation cannot
 * be processed as bytearrays because we cannot know which true value
 * that bytearray represents.  The consequence is that we drag around
 * an internal rep that we cannot make any use of.  This painful price
 * is extracted at any point after a string rep happens to be generated
 * for the value.  This happens even when the troublesome codepoints
 * outside the byte range never show up.  This happens rather routinely
 * in normal Tcl operations unless we burden the script writer with the
 * cognitive burden of avoiding it.  The price is also paid by callers
 * of the C interface.  The routine
 *
 *	unsigned char *Tcl_GetByteArrayFromObj(objPtr, lenPtr)
 *
 * has a guarantee to always return a non-NULL value, but that value
 * points to a byte sequence that cannot be used by the caller to  
 * process the Tcl value absent some sideband testing that objPtr
 * is "pure".  Tcl offers no public interface to perform this test,
 * so callers either break encapsulation or are unavoidably buggy.  Tcl
 * has defined a public interface that cannot be used correctly. The
 * Tcl source code itself suffers the same problem, and has been buggy,
 * but progressively less so as more and more portions of the code have
 * been retrofitted with the required "purity testing".  The set of values
 * able to pass the purity test can be increased via the introduction of
 * a "canonical" flag marker, but the only way the broken interface itself
 * can be discarded is to start over and define the Tcl_ObjType properly.
 * Bytearrays should simply be usable as bytearrays without a kabuki
 * dance of testing.
 *
 * The Tcl_ObjType "properByteArrayType" is (nearly) a correct 
 * implementation of bytearrays.  Any Tcl value with the type
 * properByteArrayType can have its bytearray value fetched and
 * used with confidence that acting on that value is equivalent to
 * acting on the true Tcl string value.  This still implies a side
 * testing burden -- past mistakes will not let us avoid that
 * immediately, but it is at least a conventional test of type, and
 * can be implemented entirely by examining the objPtr fields, with
 * no need to query the intrep, as a canonical flag would require.
 *
 * Until Tcl_GetByteArrayFromObj() and Tcl_SetByteArrayLength() can
 * be revised to admit the possibility of returning NULL when the true
 * value is not a valid bytearray, we need a mechanism to retain
 * compatibility with the deployed callers of the broken interface.
 * That's what the retained "tclByteArrayType" provides.  In those
 * unusual circumstances where we convert an invalid bytearray value
 * to a bytearray type, it is to this legacy type.  Essentially any
 * time this legacy type gets used, it's a signal of a bug being ignored.
 * A TIP should be drafted to remove this connection to the broken past
 * so that Tcl 9 will no longer have any trace of it.  Prescribing a
 * migration path will be the key element of that work.  The internal
 * changes now in place are the limit of what can be done short of
 * interface repair.  They provide a great expansion of the histories
 * over which bytearray values can be useful in the meanwhile.
 */

static const Tcl_ObjType properByteArrayType = {
    "bytearray",
    FreeByteArrayInternalRep,
    DupByteArrayInternalRep,
    UpdateStringOfByteArray,
    NULL
};

const Tcl_ObjType tclByteArrayType = {
    "bytearray",
    FreeByteArrayInternalRep,
    DupByteArrayInternalRep,
    NULL,
    SetByteArrayFromAny
};

/*
 * The following structure is the internal rep for a ByteArray object. Keeps
 * track of how much memory has been used and how much has been allocated for
 * the byte array to enable growing and shrinking of the ByteArray object with
 * fewer mallocs.
 */








<
<
<
<









|
|
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<










<
<
<
<
<
<
<
<







156
157
158
159
160
161
162




163
164
165
166
167
168
169
170
171
172
173



































































174
175
176
177
178
179
180
181
182
183








184
185
186
187
188
189
190

/*
 * The following object types represent an array of bytes. The intent is
 * to allow arbitrary binary data to pass through Tcl as a Tcl value
 * without loss or damage. Such values are useful for things like
 * encoded strings or Tk images to name just two.
 *




 * A bytearray is an ordered sequence of bytes. Each byte is an integer
 * value in the range [0-255].  To be a Tcl value type, we need a way to
 * encode each value in the value set as a Tcl string.  The simplest
 * encoding is to represent each byte value as the same codepoint value.
 * A bytearray of N bytes is encoded into a Tcl string of N characters
 * where the codepoint of each character is the value of corresponding byte.
 * This approach creates a one-to-one map between all bytearray values
 * and a subset of Tcl string values.
 * 
 * When converting a Tcl string value to the bytearray internal rep, and
 * the string value is outside that subset, an error is raised.



































































 */

static const Tcl_ObjType properByteArrayType = {
    "bytearray",
    FreeByteArrayInternalRep,
    DupByteArrayInternalRep,
    UpdateStringOfByteArray,
    NULL
};









/*
 * The following structure is the internal rep for a ByteArray object. Keeps
 * track of how much memory has been used and how much has been allocated for
 * the byte array to enable growing and shrinking of the ByteArray object with
 * fewer mallocs.
 */

446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
Tcl_GetByteArrayFromObj(
    Tcl_Obj *objPtr,		/* The ByteArray object. */
    int *lengthPtr)		/* If non-NULL, filled with length of the
				 * array of bytes in the ByteArray object. */
{
    ByteArray *baPtr;

    if ((objPtr->typePtr != &properByteArrayType)
	    && (objPtr->typePtr != &tclByteArrayType)) {
	if (TCL_ERROR == SetByteArrayFromAny(NULL, objPtr)) {
	    if (lengthPtr != NULL) {
		*lengthPtr = 0;
	    }
	    return NULL;
	}
    }







|
<







367
368
369
370
371
372
373
374

375
376
377
378
379
380
381
Tcl_GetByteArrayFromObj(
    Tcl_Obj *objPtr,		/* The ByteArray object. */
    int *lengthPtr)		/* If non-NULL, filled with length of the
				 * array of bytes in the ByteArray object. */
{
    ByteArray *baPtr;

    if (objPtr->typePtr != &properByteArrayType) {

	if (TCL_ERROR == SetByteArrayFromAny(NULL, objPtr)) {
	    if (lengthPtr != NULL) {
		*lengthPtr = 0;
	    }
	    return NULL;
	}
    }
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
    int length)			/* New length for internal byte array. */
{
    ByteArray *byteArrayPtr;

    if (Tcl_IsShared(objPtr)) {
	Tcl_Panic("%s called with shared object", "Tcl_SetByteArrayLength");
    }
    if ((objPtr->typePtr != &properByteArrayType)
	    && (objPtr->typePtr != &tclByteArrayType)) {
	if (TCL_ERROR == SetByteArrayFromAny(NULL, objPtr)) {
	    return NULL;
	}
    }

    byteArrayPtr = GET_BYTEARRAY(objPtr);
    if (length > byteArrayPtr->allocated) {







|
<







415
416
417
418
419
420
421
422

423
424
425
426
427
428
429
    int length)			/* New length for internal byte array. */
{
    ByteArray *byteArrayPtr;

    if (Tcl_IsShared(objPtr)) {
	Tcl_Panic("%s called with shared object", "Tcl_SetByteArrayLength");
    }
    if (objPtr->typePtr != &properByteArrayType) {

	if (TCL_ERROR == SetByteArrayFromAny(NULL, objPtr)) {
	    return NULL;
	}
    }

    byteArrayPtr = GET_BYTEARRAY(objPtr);
    if (length > byteArrayPtr->allocated) {
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
    unsigned char *dst;
    ByteArray *byteArrayPtr;
    Tcl_UniChar ch;

    if (objPtr->typePtr == &properByteArrayType) {
	return TCL_OK;
    }
    if (objPtr->typePtr == &tclByteArrayType) {
	return TCL_OK;
    }

    src = TclGetStringFromObj(objPtr, &length);
    srcEnd = src + length;

    byteArrayPtr = ckalloc(BYTEARRAY_SIZE(length));
    for (dst = byteArrayPtr->bytes; src < srcEnd; ) {
	src += Tcl_UtfToUniChar(src, &ch);







<
<
<







462
463
464
465
466
467
468



469
470
471
472
473
474
475
    unsigned char *dst;
    ByteArray *byteArrayPtr;
    Tcl_UniChar ch;

    if (objPtr->typePtr == &properByteArrayType) {
	return TCL_OK;
    }




    src = TclGetStringFromObj(objPtr, &length);
    srcEnd = src + length;

    byteArrayPtr = ckalloc(BYTEARRAY_SIZE(length));
    for (dst = byteArrayPtr->bytes; src < srcEnd; ) {
	src += Tcl_UtfToUniChar(src, &ch);
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
	Tcl_Panic("%s must be called with definite number of bytes to append",
		"TclAppendBytesToByteArray");
    }
    if (len == 0) {
	/* Append zero bytes is a no-op. */
	return;
    }
    if ((objPtr->typePtr != &properByteArrayType)
	    && (objPtr->typePtr != &tclByteArrayType)) {
	if (TCL_ERROR == SetByteArrayFromAny(NULL, objPtr)) {
	    Tcl_Panic("attempt to append bytes to non-bytearray");
	}
    }
    byteArrayPtr = GET_BYTEARRAY(objPtr);

    if (len > INT_MAX - byteArrayPtr->used) {







|
<







651
652
653
654
655
656
657
658

659
660
661
662
663
664
665
	Tcl_Panic("%s must be called with definite number of bytes to append",
		"TclAppendBytesToByteArray");
    }
    if (len == 0) {
	/* Append zero bytes is a no-op. */
	return;
    }
    if (objPtr->typePtr != &properByteArrayType) {

	if (TCL_ERROR == SetByteArrayFromAny(NULL, objPtr)) {
	    Tcl_Panic("attempt to append bytes to non-bytearray");
	}
    }
    byteArrayPtr = GET_BYTEARRAY(objPtr);

    if (len > INT_MAX - byteArrayPtr->used) {

Changes to generic/tclInt.h.

2640
2641
2642
2643
2644
2645
2646
2647
2648
2649
2650
2651
2652
2653
2654

/*
 * Variables denoting the Tcl object types defined in the core.
 */

MODULE_SCOPE const Tcl_ObjType tclBignumType;
MODULE_SCOPE const Tcl_ObjType tclBooleanType;
MODULE_SCOPE const Tcl_ObjType tclByteArrayType;
MODULE_SCOPE const Tcl_ObjType tclByteCodeType;
MODULE_SCOPE const Tcl_ObjType tclDoubleType;
MODULE_SCOPE const Tcl_ObjType tclEndOffsetType;
MODULE_SCOPE const Tcl_ObjType tclIntType;
MODULE_SCOPE const Tcl_ObjType tclListType;
MODULE_SCOPE const Tcl_ObjType tclDictType;
MODULE_SCOPE const Tcl_ObjType tclProcBodyType;







<







2640
2641
2642
2643
2644
2645
2646

2647
2648
2649
2650
2651
2652
2653

/*
 * Variables denoting the Tcl object types defined in the core.
 */

MODULE_SCOPE const Tcl_ObjType tclBignumType;
MODULE_SCOPE const Tcl_ObjType tclBooleanType;

MODULE_SCOPE const Tcl_ObjType tclByteCodeType;
MODULE_SCOPE const Tcl_ObjType tclDoubleType;
MODULE_SCOPE const Tcl_ObjType tclEndOffsetType;
MODULE_SCOPE const Tcl_ObjType tclIntType;
MODULE_SCOPE const Tcl_ObjType tclListType;
MODULE_SCOPE const Tcl_ObjType tclDictType;
MODULE_SCOPE const Tcl_ObjType tclProcBodyType;

Changes to generic/tclObj.c.

390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
TclInitObjSubsystem(void)
{
    Tcl_MutexLock(&tableMutex);
    typeTableInitialized = 1;
    Tcl_InitHashTable(&typeTable, TCL_STRING_KEYS);
    Tcl_MutexUnlock(&tableMutex);

    Tcl_RegisterObjType(&tclByteArrayType);
    Tcl_RegisterObjType(&tclDoubleType);
    Tcl_RegisterObjType(&tclEndOffsetType);
    Tcl_RegisterObjType(&tclIntType);
    Tcl_RegisterObjType(&tclStringType);
    Tcl_RegisterObjType(&tclListType);
    Tcl_RegisterObjType(&tclDictType);
    Tcl_RegisterObjType(&tclByteCodeType);







<







390
391
392
393
394
395
396

397
398
399
400
401
402
403
TclInitObjSubsystem(void)
{
    Tcl_MutexLock(&tableMutex);
    typeTableInitialized = 1;
    Tcl_InitHashTable(&typeTable, TCL_STRING_KEYS);
    Tcl_MutexUnlock(&tableMutex);


    Tcl_RegisterObjType(&tclDoubleType);
    Tcl_RegisterObjType(&tclEndOffsetType);
    Tcl_RegisterObjType(&tclIntType);
    Tcl_RegisterObjType(&tclStringType);
    Tcl_RegisterObjType(&tclListType);
    Tcl_RegisterObjType(&tclDictType);
    Tcl_RegisterObjType(&tclByteCodeType);

Changes to tests/execute.test.

989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
    } else {
	set result SUCCESS
    }
    set result
} SUCCESS

test execute-10.1 {TclExecuteByteCode, INST_CONCAT1, bytearrays} {
    apply {s {binary scan $s c x; list $x [scan $s$s %c%c]}} \u0130
} {48 {304 304}}
test execute-10.2 {Bug 2802881} -setup {
    interp create slave
} -body {
    # If [Bug 2802881] is not fixed, this will segfault
    slave eval {
	trace add variable ::errorInfo write {expr {$foo} ;#}







|







989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
    } else {
	set result SUCCESS
    }
    set result
} SUCCESS

test execute-10.1 {TclExecuteByteCode, INST_CONCAT1, bytearrays} {
    apply {s {binary scan [binary format a $s] c x; list $x [scan $s$s %c%c]}} \u0130
} {48 {304 304}}
test execute-10.2 {Bug 2802881} -setup {
    interp create slave
} -body {
    # If [Bug 2802881] is not fixed, this will segfault
    slave eval {
	trace add variable ::errorInfo write {expr {$foo} ;#}

Changes to tests/obj.test.

23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
testConstraint longIs32bit	[expr {int(0x80000000) < 0}]
testConstraint wideBiggerThanInt [expr {wide(0x80000000) != int(0x80000000)}]

test obj-1.1 {Tcl_AppendAllObjTypes, and InitTypeTable, Tcl_RegisterObjType} testobj {
    set r 1
    foreach {t} {
	{array search}
	bytearray
	bytecode
	cmdName
	dict
	end-offset
	regexp
	string
    } {
        set first [string first $t [testobj types]]
        set r [expr {$r && ($first != -1)}]
    }
    set result $r
} {1}

test obj-2.1 {Tcl_GetObjType error} testobj {
    list [testintobj set 1 0] [catch {testobj convert 1 foo} msg] $msg
} {0 1 {no type foo found}}
test obj-2.2 {Tcl_GetObjType and Tcl_ConvertToType} testobj {
    set result ""
    lappend result [testobj freeallvars]
    lappend result [testintobj set 1 12]
    lappend result [testobj convert 1 bytearray]
    lappend result [testobj type 1]
    lappend result [testobj refcount 1]
} {{} 12 12 bytearray 3}

test obj-3.1 {Tcl_ConvertToType error} testobj {
    list [testdoubleobj set 1 12.34] \
	[catch {testobj convert 1 end-offset} msg] \
	 $msg
} {12.34 1 {bad index "12.34": must be end?[+-]integer?}}
test obj-3.2 {Tcl_ConvertToType error, "empty string" object} testobj {







<




















|


|







23
24
25
26
27
28
29

30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
testConstraint longIs32bit	[expr {int(0x80000000) < 0}]
testConstraint wideBiggerThanInt [expr {wide(0x80000000) != int(0x80000000)}]

test obj-1.1 {Tcl_AppendAllObjTypes, and InitTypeTable, Tcl_RegisterObjType} testobj {
    set r 1
    foreach {t} {
	{array search}

	bytecode
	cmdName
	dict
	end-offset
	regexp
	string
    } {
        set first [string first $t [testobj types]]
        set r [expr {$r && ($first != -1)}]
    }
    set result $r
} {1}

test obj-2.1 {Tcl_GetObjType error} testobj {
    list [testintobj set 1 0] [catch {testobj convert 1 foo} msg] $msg
} {0 1 {no type foo found}}
test obj-2.2 {Tcl_GetObjType and Tcl_ConvertToType} testobj {
    set result ""
    lappend result [testobj freeallvars]
    lappend result [testintobj set 1 12]
    lappend result [testobj convert 1 string]
    lappend result [testobj type 1]
    lappend result [testobj refcount 1]
} {{} 12 12 string 3}

test obj-3.1 {Tcl_ConvertToType error} testobj {
    list [testdoubleobj set 1 12.34] \
	[catch {testobj convert 1 end-offset} msg] \
	 $msg
} {12.34 1 {bad index "12.34": must be end?[+-]integer?}}
test obj-3.2 {Tcl_ConvertToType error, "empty string" object} testobj {