Tcl Source Code: View Ticket

Ticket UUID:	949905
Title:	Unicode null characters on UTF-8 channels
Type:	Bug	Version:	obsolete: 8.5a2
Submitter:	rmax	Created on:	2004-05-07 14:21:27
Subsystem:	44. UTF-8 Strings	Assigned To:	hobbs
Priority:	8	Severity:
Status:	Closed	Last Modified:	2004-07-15 02:16:42
Resolution:	Fixed	Closed By:	hobbs
		Closed on:	2004-07-14 19:16:42
Description:	A Unicode null character (\u0000) comes out as \xc0\x80 instead of \x00 when it is being printed to a utf-8 encoded channel. The attached patch for HEAD fixes this and doesn't break any tests here. If the patch gets accepted I'll create one for the 8.4 line as well.
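For context: Tcl keeps strings internally in a modified UTF-8 in which U+0000 is encoded as the overlong two-byte sequence \xc0\x80, so internal strings never contain a real NUL byte. The bug described above is that this internal form leaked out unchanged on output. Below is a minimal C sketch of the output-direction translation, assuming plain byte buffers in and out. The function name InternalToExternalUtf8 is hypothetical. This is not the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Tcl's actual API): copy Tcl-style modified
 * UTF-8 to standard UTF-8 for an output channel. The internal pair
 * 0xC0 0x80 (U+0000) becomes a single 0x00 byte. Every other byte is
 * copied unchanged. Returns the number of bytes written to dst.
 */
static size_t InternalToExternalUtf8(const unsigned char *src, size_t srcLen,
                                     unsigned char *dst, size_t dstCap)
{
    size_t i = 0, o = 0;

    /* Output never grows: 2 bytes shrink to 1, everything else is 1:1. */
    assert(dstCap >= srcLen);
    while (i < srcLen) {
        if (src[i] == 0xC0 && i + 1 < srcLen && src[i + 1] == 0x80) {
            dst[o++] = 0x00;      /* modified-UTF-8 null -> real null */
            i += 2;
        } else {
            dst[o++] = src[i++];  /* all other bytes pass through */
        }
    }
    return o;
}
```

For the internal bytes `a \xc0\x80 b` this writes the three bytes `a \x00 b`. Multi-byte characters such as \xc3\xa9 are copied as-is.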
User Comments:

hobbs added on 2004-07-03 00:18:56: This is fixed, and left open as a watch item for the next releases (8.4.7 and 8.5a2).

rmax added on 2004-05-27 21:36:05: Committed a fix to HEAD and core-8-4-branch.

rmax added on 2004-05-08 02:52:29: Forget that comment about the buffer overflow. I had overlooked that the size of the dest buffer gets adjusted so that there is always room for at least one more UTF-8 character.

rmax added on 2004-05-08 02:49:17: File Added - 86449: utf-8-nullbyte.patch. The logic was not quite right before, so here is another (hopefully the last) attempt. The only thing I am not sure about is whether this patch adds the risk of a one-byte buffer overflow when, in input mode, there is only one spare byte left in the destination buffer and the next character in the source buffer is a null byte.

rmax added on 2004-05-08 01:19:13: File Added - 86438: utf-8-nullbyte.patch. Another version of the patch, this time with tests and comments. I'll commit this one to HEAD to get wider testing.

rmax added on 2004-05-07 22:58:37: File Deleted - 86410.

rmax added on 2004-05-07 22:57:48: File Added - 86421: utf-8-nullbyte.patch. While doing further tests, I found that null bytes read from a utf-8 encoded channel also don't get translated to \xc0\x80. The updated patch fixes that as well.

rmax added on 2004-05-07 21:22:04: File Added - 86410: utf-8-nullbyte.patch
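The comments above also cover the input direction and the worry about a one-byte overflow. A rough C sketch of that direction follows. It assumes a byte-at-a-time converter, whereas Tcl's real converter works a whole character at a time. Every 0x00 read from the channel becomes \xc0\x80. The loop converts another byte only while there is still room for the worst case of two output bytes. This shows the kind of destination-size adjustment the 02:52:29 comment refers to. The names ExternalToInternalUtf8 and MAX_EXPANSION are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Worst case per source byte: 0x00 grows to the two bytes 0xC0 0x80. */
#define MAX_EXPANSION 2

/*
 * Hypothetical helper: convert standard UTF-8 read from a channel into
 * Tcl-style modified UTF-8, where U+0000 is stored as 0xC0 0x80.
 * Instead of overflowing, it stops as soon as fewer than MAX_EXPANSION
 * bytes of room remain, so a null byte never meets a one-byte gap.
 * *srcReadPtr receives the number of source bytes consumed. The
 * return value is the number of bytes written to dst.
 */
static size_t ExternalToInternalUtf8(const unsigned char *src, size_t srcLen,
                                     unsigned char *dst, size_t dstCap,
                                     size_t *srcReadPtr)
{
    size_t i = 0, o = 0;

    /* Only continue while a worst-case expansion still fits. */
    while (i < srcLen && o + MAX_EXPANSION <= dstCap) {
        if (src[i] == 0x00) {
            dst[o++] = 0xC0;      /* real null -> modified-UTF-8 null */
            dst[o++] = 0x80;
        } else {
            dst[o++] = src[i];    /* all other bytes pass through */
        }
        i++;
    }
    assert(o <= dstCap);
    *srcReadPtr = i;
    return o;
}
```

Consider the input `a \x00 b` with a 3-byte destination. The call writes `a \xc0\x80`, reports 2 source bytes consumed, and leaves `b` for the next call. The trade-off is that a call may stop with one byte of the destination unused, and the caller simply converts the remainder in a further call.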

Attachments:

utf-8-nullbyte.patch added by rmax on 2004-05-08 02:49:17.