Tcl Source Code

Check-in [237dd7902f]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:merge trunk
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | novem
Files: files | file ages | folders
SHA1: 237dd7902f7fe29794ea4360d189d15997d57a73
User & Date: dgp 2016-12-02 19:02:41
Context
2016-12-06
12:46
merge trunk check-in: 54d0f67188 user: dgp tags: novem
2016-12-05
19:17
WIP trial of proper bytearrays in Tcl 9. check-in: d7c68a46af user: dgp tags: dgp-properbytearray
2016-12-02
19:34
merge novem check-in: da8349c19a user: dgp tags: dgp-refactor
19:02
merge trunk check-in: 237dd7902f user: dgp tags: novem
18:18
Added long comment explaining history and work in progress making bytearray interfaces usable. check-in: d42a114238 user: dgp tags: trunk
14:53
merge-mark check-in: c54751a9f2 user: jan.nijtmans tags: novem
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to generic/tclBasic.c.

2964
2965
2966
2967
2968
2969
2970
2971
2972
2973
2974
2975
2976
2977
2978
2979
2980
2981
2982
2983
2984
    Tcl_Command cmd)		/* Token for command to delete. */
{
    Interp *iPtr = (Interp *) interp;
    Command *cmdPtr = (Command *) cmd;
    ImportRef *refPtr, *nextRefPtr;
    Tcl_Command importCmd;

    /*
     * Bump the command epoch counter. This will invalidate all cached
     * references that point to this command.
     */

    cmdPtr->cmdEpoch++;

    /*
     * The code here is tricky. We can't delete the hash table entry before
     * invoking the deletion callback because there are cases where the
     * deletion callback needs to invoke the command (e.g. object systems such
     * as OTcl). However, this means that the callback could try to delete or
     * rename the command. The deleted flag allows us to detect these cases
     * and skip nested deletes.







<
<
<
<
<
<
<







2964
2965
2966
2967
2968
2969
2970







2971
2972
2973
2974
2975
2976
2977
    Tcl_Command cmd)		/* Token for command to delete. */
{
    Interp *iPtr = (Interp *) interp;
    Command *cmdPtr = (Command *) cmd;
    ImportRef *refPtr, *nextRefPtr;
    Tcl_Command importCmd;








    /*
     * The code here is tricky. We can't delete the hash table entry before
     * invoking the deletion callback because there are cases where the
     * deletion callback needs to invoke the command (e.g. object systems such
     * as OTcl). However, this means that the callback could try to delete or
     * rename the command. The deleted flag allows us to detect these cases
     * and skip nested deletes.
2993
2994
2995
2996
2997
2998
2999








3000
3001
3002
3003
3004
3005
3006
	 * three times, everything goes up in smoke. [Bug 1220058]
	 */

	if (cmdPtr->hPtr != NULL) {
	    Tcl_DeleteHashEntry(cmdPtr->hPtr);
	    cmdPtr->hPtr = NULL;
	}








	return 0;
    }

    /*
     * We must delete this command, even though both traces and delete procs
     * may try to avoid this (renaming the command etc). Also traces and
     * delete procs may try to delete the command themsevles. This flag







>
>
>
>
>
>
>
>







2986
2987
2988
2989
2990
2991
2992
2993
2994
2995
2996
2997
2998
2999
3000
3001
3002
3003
3004
3005
3006
3007
	 * three times, everything goes up in smoke. [Bug 1220058]
	 */

	if (cmdPtr->hPtr != NULL) {
	    Tcl_DeleteHashEntry(cmdPtr->hPtr);
	    cmdPtr->hPtr = NULL;
	}

	/*
	 * Bump the command epoch counter. This will invalidate all cached
	 * references that point to this command.
	 */

	cmdPtr->cmdEpoch++;

	return 0;
    }

    /*
     * We must delete this command, even though both traces and delete procs
     * may try to avoid this (renaming the command etc). Also traces and
     * delete procs may try to delete the command themsevles. This flag
3095
3096
3097
3098
3099
3100
3101







3102
3103
3104
3105
3106
3107
3108
     * cmdPtr->hptr, and make sure that no-one else has already deleted the
     * hash entry.
     */

    if (cmdPtr->hPtr != NULL) {
	Tcl_DeleteHashEntry(cmdPtr->hPtr);
	cmdPtr->hPtr = NULL;







    }

    /*
     * A number of tests for particular kinds of commands are done by checking
     * whether the objProc field holds a known value. Set the field to NULL so
     * that such tests won't have false positives when applied to deleted
     * commands.







>
>
>
>
>
>
>







3096
3097
3098
3099
3100
3101
3102
3103
3104
3105
3106
3107
3108
3109
3110
3111
3112
3113
3114
3115
3116
     * cmdPtr->hptr, and make sure that no-one else has already deleted the
     * hash entry.
     */

    if (cmdPtr->hPtr != NULL) {
	Tcl_DeleteHashEntry(cmdPtr->hPtr);
	cmdPtr->hPtr = NULL;

	/*
	 * Bump the command epoch counter. This will invalidate all cached
	 * references that point to this command.
	 */

	cmdPtr->cmdEpoch++;
    }

    /*
     * A number of tests for particular kinds of commands are done by checking
     * whether the objProc field holds a known value. Set the field to NULL so
     * that such tests won't have false positives when applied to deleted
     * commands.

Changes to generic/tclBinary.c.

151
152
153
154
155
156
157
158


159





160



161
162
163
164
165

166
167
168

























169
170
171

172







173







174
175
176
177







178



179









180
181
182
183
184
185
186
    { "hex",      BinaryDecodeHex, TclCompileBasic1Or2ArgCmd, NULL, NULL, 0 },
    { "uuencode", BinaryDecodeUu,  TclCompileBasic1Or2ArgCmd, NULL, NULL, 0 },
    { "base64",   BinaryDecode64,  TclCompileBasic1Or2ArgCmd, NULL, NULL, 0 },
    { NULL, NULL, NULL, NULL, NULL, 0 }
};

/*
 * The following object type represents an array of bytes. An array of bytes


 * is not equivalent to an internationalized string. Conceptually, a string is





 * an array of 16-bit quantities organized as a sequence of properly formed



 * UTF-8 characters, while a ByteArray is an array of 8-bit quantities.
 * Accessor functions are provided to convert a ByteArray to a String or a
 * String to a ByteArray. Two or more consecutive bytes in an array of bytes
 * may look like a single UTF-8 character if the array is casually treated as
 * a string. But obtaining the String from a ByteArray is guaranteed to

 * produced properly formed UTF-8 sequences so that there is a one-to-one map
 * between bytes and characters.
 *

























 * Converting a ByteArray to a String proceeds by casting each byte in the
 * array to a 16-bit quantity, treating that number as a Unicode character,
 * and storing the UTF-8 version of that Unicode character in the String. For

 * ByteArrays consisting entirely of values 1..127, the corresponding String







 * representation is the same as the ByteArray representation.







 *
 * Converting a String to a ByteArray proceeds by getting the Unicode
 * representation of each character in the String, casting it to a byte by
 * truncating the upper 8 bits, and then storing the byte in the ByteArray.







 * Converting from ByteArray to String and back to ByteArray is not lossy, but



 * converting an arbitrary String to a ByteArray may be.









 */

static const Tcl_ObjType properByteArrayType = {
    "bytearray",
    FreeByteArrayInternalRep,
    DupByteArrayInternalRep,
    UpdateStringOfByteArray,







|
>
>
|
>
>
>
>
>
|
>
>
>
|
<
<
<
<
>
|
|
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
<
|
>
|
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>

|
|
|
>
>
>
>
>
>
>
|
>
>
>
|
>
>
>
>
>
>
>
>
>







151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171




172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201

202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
    { "hex",      BinaryDecodeHex, TclCompileBasic1Or2ArgCmd, NULL, NULL, 0 },
    { "uuencode", BinaryDecodeUu,  TclCompileBasic1Or2ArgCmd, NULL, NULL, 0 },
    { "base64",   BinaryDecode64,  TclCompileBasic1Or2ArgCmd, NULL, NULL, 0 },
    { NULL, NULL, NULL, NULL, NULL, 0 }
};

/*
 * The following object types represent an array of bytes. The intent is
 * to allow arbitrary binary data to pass through Tcl as a Tcl value
 * without loss or damage. Such values are useful for things like
 * encoded strings or Tk images to name just two.
 *
 * It's strange to have two Tcl_ObjTypes in place for this task when
 * one would do, so a bit of detail and history how we got to this point
 * and where we might go from here.
 *
 * A bytearray is an ordered sequence of bytes. Each byte is an integer
 * value in the range [0-255].  To be a Tcl value type, we need a way to
 * encode each value in the value set as a Tcl string.  The simplest
 * encoding is to represent each byte value as the same codepoint value.
 * A bytearray of N bytes is encoded into a Tcl string of N characters




 * where the codepoint of each character is the value of corresponding byte.
 * This approach creates a one-to-one map between all bytearray values
 * and a subset of Tcl string values.
 * 
 * When converting a Tcl string value to the bytearray internal rep, the
 * question arises what to do with strings outside that subset?  That is,
 * those Tcl strings containing at least one codepoint greater than 255?
 * The obviously correct answer is to raise an error!  That string value
 * does not represent any valid bytearray value. Full Stop.  The
 * setFromAnyProc signature has a completion code return value for just
 * this reason, to reject invalid inputs.
 * 
 * Unfortunately this was not the path taken by the authors of the
 * original tclByteArrayType.  They chose to accept all Tcl string values
 * as acceptable string encodings of the bytearray values that result
 * from masking away the high bits of any codepoint value at all. This
 * meant that every bytearray value had multiple accepted string
 * representations.
 *
 * The implications of this choice are truly ugly.  When a Tcl value has
 * a string representation, we are required to accept that as the true
 * value.  Bytearray values that possess a string representation cannot
 * be processed as bytearrays because we cannot know which true value
 * that bytearray represents.  The consequence is that we drag around
 * an internal rep that we cannot make any use of.  This painful price
 * is extracted at any point after a string rep happens to be generated
 * for the value.  This happens even when the troublesome codepoints
 * outside the byte range never show up.  This happens rather routinely
 * in normal Tcl operations unless we burden the script writer with the
 * cognitive burden of avoiding it.  The price is also paid by callers

 * of the C interface.  The routine
 *
 *	unsigned char *Tcl_GetByteArrayFromObj(objPtr, lenPtr)
 *
 * has a guarantee to always return a non-NULL value, but that value
 * points to a byte sequence that cannot be used by the caller to  
 * process the Tcl value absent some sideband testing that objPtr
 * is "pure".  Tcl offers no public interface to perform this test,
 * so callers either break encapsulation or are unavoidably buggy.  Tcl
 * has defined a public interface that cannot be used correctly. The
 * Tcl source code itself suffers the same problem, and has been buggy,
 * but progressively less so as more and more portions of the code have
 * been retrofitted with the required "purity testing".  The set of values
 * able to pass the purity test can be increased via the introduction of
 * a "canonical" flag marker, but the only way the broken interface itself
 * can be discarded is to start over and define the Tcl_ObjType properly.
 * Bytearrays should simply be usable as bytearrays without a kabuki
 * dance of testing.
 *
 * The Tcl_ObjType "properByteArrayType" is (nearly) a correct 
 * implementation of bytearrays.  Any Tcl value with the type
 * properByteArrayType can have its bytearray value fetched and
 * used with confidence that acting on that value is equivalent to
 * acting on the true Tcl string value.  This still implies a side
 * testing burden -- past mistakes will not let us avoid that
 * immediately, but it is at least a conventional test of type, and
 * can be implemented entirely by examining the objPtr fields, with
 * no need to query the intrep, as a canonical flag would require.
 *
 * Until Tcl_GetByteArrayFromObj() and Tcl_SetByteArrayLength() can
 * be revised to admit the possibility of returning NULL when the true
 * value is not a valid bytearray, we need a mechanism to retain
 * compatibility with the deployed callers of the broken interface.
 * That's what the retained "tclByteArrayType" provides.  In those
 * unusual circumstances where we convert an invalid bytearray value
 * to a bytearray type, it is to this legacy type.  Essentially any
 * time this legacy type gets used, it's a signal of a bug being ignored.
 * A TIP should be drafted to remove this connection to the broken past
 * so that Tcl 9 will no longer have any trace of it.  Prescribing a
 * migration path will be the key element of that work.  The internal
 * changes now in place are the limit of what can be done short of
 * interface repair.  They provide a great expansion of the histories
 * over which bytearray values can be useful in the meanwhile.
 */

static const Tcl_ObjType properByteArrayType = {
    "bytearray",
    FreeByteArrayInternalRep,
    DupByteArrayInternalRep,
    UpdateStringOfByteArray,