Ticket UUID: | 436332
Title: | Need way to append bytearray objs
Type: | RFE | Version: | None
Submitter: | davidw | Created on: | 2001-06-26 10:41:04
Subsystem: | 12. ByteArray Object | Assigned To: | jan.nijtmans
Priority: | 7 High | Severity: | Minor
Status: | Closed | Last Modified: | 2014-03-18 11:52:05
Resolution: | Duplicate | Closed By: | jan.nijtmans
Closed on: | 2014-03-18 11:52:05
Description: |
There needs to be a way to concat/append ByteArray objects with *no* twiddling of the data (binary clean). Tcl_AppendToObj and company don't work, as they are meant for strings, and seem to do weird UTF stuff. I can implement this if it will be included.
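
As an illustration of why raw bytes and a UTF-8 string representation are not interchangeable, here is a minimal, self-contained C sketch. It does not use the Tcl API: `bytes_to_utf8` is a hypothetical helper that encodes each byte as the UTF-8 form of the code point with the same value, which is the general idea behind turning a byte array into a string. It assumes standard UTF-8 and ignores the special two-byte form that Tcl's internal encoding uses for NUL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the Tcl API): encode each input byte as
 * the UTF-8 form of the code point with the same value (U+0000..U+00FF).
 * `out` must have room for 2 * len bytes. Returns the number of bytes written.
 */
size_t bytes_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* ASCII range: one byte, unchanged. */
            out[n++] = in[i];
        } else {
            /* 0x80..0xFF: two bytes, a lead byte and a continuation byte. */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

Feeding it the first two bytes of a JPEG file (0xFF 0xD8) yields four bytes (0xC3 0xBF 0xC3 0x98), so code that mixes the two representations without converting can change both the length and the content of binary data.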
User Comments: |
jan.nijtmans added on 2014-03-18 11:52:05:
(text/x-fossil-wiki)
Dup of [2992970], fixed already.

dkf added on 2003-09-22 19:23:23:
Logged In: YES user_id=79902
Sure, but the problem is working out when that is what is actually meant; people do some very odd things with Tcl code... :^/ The model (transforming everything into UTF-8) works, even if it is not efficient, and we don't want to convert to bytes too eagerly since that drops information. If we stick to optimizing what we know is safe, at least it won't jump up and bite us later.

jgullingsrud added on 2003-09-22 16:44:07:
Logged In: YES user_id=19365
That sounds safe, and would meet the needs of my code. From a "principle of least surprise" standpoint, though, wouldn't you rather have strings stay strings and bytes stay bytes? Byte arrays in Tcl never happen by accident; you either have to fconfigure a channel to be binary or use the binary command. If the programmer has gone to the trouble to get a byte array, it seems to me that Tcl should try to respect that; one way to do that would be for append to never convert its first argument from binary to string.

dkf added on 2003-09-22 16:22:56:
Logged In: YES user_id=79902
The case that can be reasonably done directly with bytes instead of (UTF8/UNICODE) characters is when all objects are byte-arrays (or if one of them is an empty object.)

jgullingsrud added on 2003-09-22 12:06:43:
Logged In: YES user_id=19365
Here's a patch to tclVar.c (the TclPtrSetVar routine) that makes the append command respect ByteArray types. It checks whether the object being appended to is of ByteArray type and, if so, retains this type and converts objects to be appended to ByteArray before appending their bytes. Diff is from Tcl 8.4.4.

--- tclVar.c.orig 2003-09-21 18:57:43.000000000 -0700
+++ tclVar.c 2003-09-21 18:58:42.000000000 -0700
@@ -1659,8 +1659,25 @@
         oldValuePtr = varPtr->value.objPtr;
         Tcl_IncrRefCount(oldValuePtr); /* since var is ref */
     }
-    Tcl_AppendObjToObj(oldValuePtr, newValuePtr);
- }
+    /*
+     * If oldValuePtr is a ByteArray, append as bytes
+     */
+    if (oldValuePtr->typePtr == &tclByteArrayType) {
+        unsigned char *oldbytes, *newbytes, *oldandnewbytes;
+        int oldlength=-1, newlength=-1;
+        oldbytes = Tcl_GetByteArrayFromObj(oldValuePtr, &oldlength);
+        newbytes = Tcl_GetByteArrayFromObj(newValuePtr, &newlength);
+        /*
+         * Ok to to call SetByteArrayLength because we've already checked
+         * if oldValuePtr is shared.
+         */
+        oldandnewbytes = Tcl_SetByteArrayLength(oldValuePtr,
+            oldlength+newlength);
+        memcpy(oldandnewbytes+(size_t)oldlength, newbytes, (size_t)newlength);
+    } else {
+        Tcl_AppendObjToObj(oldValuePtr, newValuePtr);
+    }
+ }
  }
  } else if (newValuePtr != oldValuePtr) {
  /*

davidw added on 2001-07-06 18:30:09:
Logged In: YES user_id=240
Well, Tcl_AppendObjToObj actually seems to work ok. I am able to manipulate a jpeg with it with no corruption. It translates its bytes into UTF, whereas Tcl_AppendToObj does not. My quick fix is the following:

--- tclStringObj.c 2001/05/15 21:30:46 1.21
+++ tclStringObj.c 2001/07/06 11:22:56
@@ -979,7 +979,8 @@
     stringPtr = GET_STRING(objPtr);
  } else {
-    AppendUtfToUtfRep(objPtr, bytes, length);
+    Tcl_Obj *appendObj = Tcl_NewByteArrayObj(bytes, length);
+    Tcl_AppendObjToObj(objPtr, appendObj);
  }
  }

Maybe there is a more efficient way of transforming the bytes to UTF? Is it worthwhile making a special case in order to speed up this operation? (i.e., check to see if objptr is a bytearray here, and also in objtoobj).

davidw added on 2001-07-05 20:57:04:
Logged In: YES user_id=240
I can implement this, but I think someone from the core might do well to look at the file as a whole. It only talks about strings in the comments at the top...
Maybe a bit of reorganization is necessary.

nijtmans added on 2001-07-05 19:27:40:
Logged In: YES user_id=61031
This should be easy to implement: Just check in the function Tcl_AppendToObj if both objects are ByteArrays, and if so just create a new ByteArray for the result. I'll approve that kind of implementation, and I would appreciate a patch submission.

dkf added on 2001-06-28 15:25:33:
Logged In: YES user_id=79902
It doesn't make much sense to [concat] two bytearray objects as that is a list operation (well, sort of; [concat] is horrid in its actual definition.) But [append] definitely should operate efficiently with bytearrays.
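
The "grow the buffer, then memcpy" idea discussed in this thread can be sketched in plain C without the Tcl library. This is only an illustration under stated assumptions: `ByteBuf` and `bytebuf_append` are hypothetical names, not Tcl types or functions, and `realloc` stands in for the resizing a real byte-array object would do internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a byte-array value; not a Tcl type. */
typedef struct {
    unsigned char *bytes;
    size_t length;
} ByteBuf;

/*
 * Append srclen bytes from src to dst with no transformation of the data.
 * src must not point into dst->bytes, because growing dst may move it.
 * Returns 0 on success, or -1 if memory could not be allocated, in which
 * case dst is left unchanged.
 */
int bytebuf_append(ByteBuf *dst, const unsigned char *src, size_t srclen)
{
    unsigned char *grown;

    assert(dst != NULL);
    if (srclen == 0) {
        return 0;               /* nothing to append */
    }
    assert(src != NULL);

    /* Grow the destination to hold both the old and the new bytes. */
    grown = realloc(dst->bytes, dst->length + srclen);
    if (grown == NULL) {
        return -1;
    }

    /* Copy the new bytes verbatim after the existing ones. */
    memcpy(grown + dst->length, src, srclen);
    dst->bytes = grown;
    dst->length += srclen;
    return 0;
}
```

Starting from `ByteBuf b = {NULL, 0};`, repeated calls to `bytebuf_append` build up the data, including NUL bytes and bytes of 0x80 or above, exactly as supplied; the caller releases it with `free(b.bytes)`. The sketch tracks an explicit length rather than relying on a terminator, which is what makes it binary clean.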