Tcl Source Code

Ticket UUID: 436332
Title: Need way to append bytearray objs
Type: RFE
Version: None
Submitter: davidw
Created on: 2001-06-26 10:41:04
Subsystem: 12. ByteArray Object
Assigned To: jan.nijtmans
Priority: 7 High
Severity: Minor
Status: Closed
Last Modified: 2014-03-18 11:52:05
Resolution: Duplicate
Closed By: jan.nijtmans
Closed on: 2014-03-18 11:52:05
Description:
There needs to be a way to concat/append ByteArray
objects with *no* twiddling of the data (binary clean).
 Tcl_AppendToObj and company don't work, as they are
meant for strings, and seem to do weird UTF stuff.  I
can implement this if it will be included.
User Comments:

jan.nijtmans added on 2014-03-18 11:52:05:

Dup of [2992970], fixed already.


dkf added on 2003-09-22 19:23:23:
Logged In: YES 
user_id=79902

Sure, but the problem is working out when that is what is 
actually meant; people do some very odd things with Tcl 
code...  :^/

The model (transforming everything into UTF-8) works, even 
if it is not efficient, and we don't want to convert to bytes 
too eagerly since that drops information.  If we stick to 
optimizing what we know is safe, at least it won't jump up 
and bite us later.
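
To illustrate the "drops information" point, here is a minimal sketch. It assumes that converting a string to bytes keeps only the low 8 bits of each character, which is my understanding of how the 8.x byte-array conversion behaves; narrow_to_bytes is a hypothetical helper, not a Tcl API.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Tcl): narrow an array of code points
 * to bytes by keeping only the low 8 bits of each one.
 * Assumption: this mirrors what an eager string-to-bytes conversion does.
 * Returns how many code points could not be represented exactly.
 */
size_t narrow_to_bytes(const unsigned int *cps, size_t n, unsigned char *dst)
{
    size_t lossy = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFF) {
            lossy++;                 /* high bits are about to be discarded */
        }
        dst[i] = (unsigned char)(cps[i] & 0xFF);
    }
    return lossy;
}
```

Under that assumption, a character such as U+20AC ends up as the single byte 0xAC, and nothing in the result records that it was ever anything else.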

jgullingsrud added on 2003-09-22 16:44:07:
Logged In: YES 
user_id=19365

That sounds safe, and would meet the needs of my code.  From
a "principle of least surprise" standpoint, though, wouldn't
you rather have strings stay strings and bytes stay bytes?  

Byte arrays in Tcl never happen by accident; you either have
to fconfigure a channel to be binary or use the binary
command.  If the programmer has gone to the trouble to get a
byte array, it seems to me that Tcl should try to respect
that; one way to do that would be for append to never
convert its first argument from binary to string.

dkf added on 2003-09-22 16:22:56:
Logged In: YES 
user_id=79902

The case that can be reasonably done directly with bytes 
instead of (UTF8/UNICODE) characters is when all objects are 
byte-arrays (or if one of them is an empty object.)
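
One possible reading of that rule, as a sketch: each operand must be either a byte array or empty, and at least one must be a byte array. Val, ValKind and can_append_as_bytes are hypothetical stand-ins for illustration, not Tcl types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a typed value; not Tcl's Tcl_Obj. */
typedef enum { VAL_STRING, VAL_BYTES } ValKind;

typedef struct {
    ValKind kind;
    size_t  length;   /* number of bytes or characters held */
} Val;

/* An operand is safe for a byte-level append if it is bytes or empty. */
static bool bytes_or_empty(const Val *v)
{
    return v->kind == VAL_BYTES || v->length == 0;
}

/*
 * Assumed rule: append bytewise only when both operands are bytes-or-empty
 * and at least one of them really is a byte array.
 */
bool can_append_as_bytes(const Val *dst, const Val *src)
{
    bool any_bytes = dst->kind == VAL_BYTES || src->kind == VAL_BYTES;
    return any_bytes && bytes_or_empty(dst) && bytes_or_empty(src);
}
```

Anything that fails the check would fall back to the existing string path, so the optimization never changes what a non-binary append produces.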

jgullingsrud added on 2003-09-22 12:06:43:
Logged In: YES 
user_id=19365

Here's a patch to tclVar.c (the TclPtrSetVar routine) that
makes the append command respect ByteArray types.  It checks
whether the object being appended to is of ByteArray type
and, if so, retains this type and converts objects to be
appended to ByteArray before appending their bytes.  Diff is
from Tcl 8.4.4.

--- tclVar.c.orig 2003-09-21 18:57:43.000000000 -0700
+++ tclVar.c  2003-09-21 18:58:42.000000000 -0700
@@ -1659,8 +1659,25 @@
        oldValuePtr = varPtr->value.objPtr;
        Tcl_IncrRefCount(oldValuePtr); /* since var is ref */
    }
-   Tcl_AppendObjToObj(oldValuePtr, newValuePtr);
-     }
+    /*
+     * If oldValuePtr is a ByteArray, append as bytes
+     */
+    if (oldValuePtr->typePtr == &tclByteArrayType) {
+      unsigned char *oldbytes, *newbytes, *oldandnewbytes;
+      int oldlength=-1, newlength=-1;
+      oldbytes = Tcl_GetByteArrayFromObj(oldValuePtr, &oldlength);
+      newbytes = Tcl_GetByteArrayFromObj(newValuePtr, &newlength);
+      /*
+       * Ok to call SetByteArrayLength because we've already checked
+       * if oldValuePtr is shared.
+       */
+      oldandnewbytes = Tcl_SetByteArrayLength(oldValuePtr,
+          oldlength+newlength);
+      memcpy(oldandnewbytes+(size_t)oldlength, newbytes, (size_t)newlength);
+    } else {
+     Tcl_AppendObjToObj(oldValuePtr, newValuePtr);
+    }
+      }
  }
     } else if (newValuePtr != oldValuePtr) {
  /*
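
For readers without a Tcl source tree at hand, the grow-then-copy step in the patch can be sketched in plain C. ByteBuf and bytebuf_append are hypothetical names, not Tcl APIs. The sketch has no equivalent of the patch's shared-object check, so it guards against the source and destination being the same buffer itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; stands in for a ByteArray value. */
typedef struct {
    unsigned char *data;   /* heap storage, or NULL when empty */
    size_t         length; /* number of valid bytes in data */
} ByteBuf;

/*
 * Append src's bytes to dst without interpreting them in any way.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (dst is left unchanged in that case).
 */
int bytebuf_append(ByteBuf *dst, const ByteBuf *src)
{
    size_t oldlen = dst->length;
    size_t addlen = src->length;   /* read first: src may alias dst */

    if (addlen == 0) {
        return 0;
    }
    if (addlen > SIZE_MAX - oldlen) {
        return -1;
    }
    unsigned char *grown = realloc(dst->data, oldlen + addlen);
    if (grown == NULL) {
        return -1;
    }
    /* If src is dst, its old data pointer is stale after realloc. */
    const unsigned char *from = (src == dst) ? grown : src->data;

    dst->data = grown;
    dst->length = oldlen + addlen;
    memcpy(grown + oldlen, from, addlen);   /* binary clean: no UTF step */
    return 0;
}
```

Appending the three bytes 00 FF 80 to an empty buffer leaves exactly those three bytes in it, which is the property the ticket asks for.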

davidw added on 2001-07-06 18:30:09:
Logged In: YES 
user_id=240

Well, Tcl_AppendObjToObj actually seems to work ok.  I am
able to manipulate a jpeg with it with no corruption.  It
translates its bytes into UTF, whereas Tcl_AppendToObj does
not.  My quick fix is the following:

--- tclStringObj.c  2001/05/15 21:30:46  1.21
+++ tclStringObj.c  2001/07/06 11:22:56
@@ -979,7 +979,8 @@
 
     stringPtr = GET_STRING(objPtr);
     } else {
-    AppendUtfToUtfRep(objPtr, bytes, length);
+    Tcl_Obj *appendObj = Tcl_NewByteArrayObj(bytes, length);
+    Tcl_AppendObjToObj(objPtr, appendObj);
     }
 }

Maybe there is a more efficient way of transforming the
bytes to UTF?  Is it worthwhile making a special case in
order to speed up this operation? (i.e., check to see if
objptr is a bytearray here, and also in objtoobj).
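
As a rough sketch of what transforming the bytes to UTF involves: each byte is treated as the code point with the same value (U+0000 to U+00FF) and written in UTF-8, with NUL written as C0 80 as in Tcl's internal encoding. bytes_to_utf8 is a hypothetical helper for illustration, not a Tcl API, and is not claimed to match the core's implementation.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Tcl): expand raw bytes into UTF-8,
 * treating each byte as the code point U+0000..U+00FF.
 * dst must have room for 2 * n bytes. Returns the number of bytes written.
 */
size_t bytes_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = src[i];
        if (b != 0 && b < 0x80) {
            dst[out++] = b;   /* plain ASCII passes through unchanged */
        } else {
            /* Two-byte form; NUL becomes C0 80 rather than a bare 00. */
            dst[out++] = (unsigned char)(0xC0 | (b >> 6));
            dst[out++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

A single pass like this is linear in the input, but every NUL byte and every byte of 0x80 or above doubles in size, so a special case that keeps byte arrays as bytes would avoid both the copy and the growth.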

davidw added on 2001-07-05 20:57:04:
Logged In: YES 
user_id=240

I can implement this, but I think someone from the core
might do well to look at the file as a whole.  It only talks
about strings in the comments at the top... Maybe a bit of
reorganization is necessary.

nijtmans added on 2001-07-05 19:27:40:
Logged In: YES 
user_id=61031

This should be easy to implement: just check in the function
Tcl_AppendToObj if both objects are ByteArrays, and if so
just create a new ByteArray for the result. I'll approve
that kind of implementation, and I would appreciate a patch
submission.

dkf added on 2001-06-28 15:25:33:
Logged In: YES 
user_id=79902

It doesn't make much sense to [concat] two bytearray objects
as that is a list operation (well, sort of; [concat] is horrid
in its actual definition.)  But [append] definitely should
operate efficiently with bytearrays.