Tcl Source Code

Ticket UUID: 949905
Title: Unicode null characters on UTF-8 channels
Type: Bug
Version: obsolete: 8.5a2
Submitter: rmax
Created on: 2004-05-07 14:21:27
Subsystem: 44. UTF-8 Strings
Assigned To: hobbs
Priority: 8
Severity:
Status: Closed
Last Modified: 2004-07-15 02:16:42
Resolution: Fixed
Closed By: hobbs
Closed on: 2004-07-14 19:16:42
Description:
A Unicode null character (\u0000) comes out as \xc0\x80
instead of \x00 when it is being printed to a utf-8
encoded channel.
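
To make the symptom concrete, here is a minimal sketch of the output-side
translation, assuming the internal string form stores U+0000 as the two
bytes 0xC0 0x80 and that a UTF-8 channel should receive a single 0x00
byte. The function name internal_to_external is hypothetical and is not
taken from the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not Tcl's actual code: copy srcLen bytes of
 * internal data to dst as standard UTF-8, turning the two-byte form of
 * U+0000 (0xC0 0x80) into a single 0x00 byte. All other bytes are
 * copied unchanged. dst must hold at least srcLen bytes.
 * Returns the number of bytes written.
 */
static size_t internal_to_external(const unsigned char *src, size_t srcLen,
                                   unsigned char *dst)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < srcLen) {
        if (src[in] == 0xC0 && in + 1 < srcLen && src[in + 1] == 0x80) {
            dst[out++] = 0x00;      /* collapse the internal null */
            in += 2;
        } else {
            dst[out++] = src[in++]; /* pass everything else through */
        }
    }
    return out;
}
```

Given the four input bytes 'a', 0xC0, 0x80, 'b', this sketch writes the
three bytes 'a', 0x00, 'b'.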

The attached patch for HEAD fixes this and doesn't
break any tests here. If the patch gets accepted I'll
create one for the 8.4 line as well.
User Comments:
hobbs added on 2004-07-03 00:18:56:
Logged In: YES 
user_id=72656

This is fixed, and left open as a watch item for the next
releases (8.4.7 and 8.5a2).

rmax added on 2004-05-27 21:36:05:
Logged In: YES 
user_id=124643

Committed a fix to HEAD and core-8-4-branch.

rmax added on 2004-05-08 02:52:29:
Logged In: YES 
user_id=124643

Forget that comment about the buffer overflow. I had
overlooked that the size of the dest buffer gets adjusted
so that there is always room for at least one more UTF-8 character.
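
The buffer adjustment described here can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code from the
patch: the name bounded_to_internal and the constant SKETCH_UTF_MAX (with
its value of 3) are hypothetical. The usable end of the destination is
pulled back by the size of the longest character, so expanding a null
byte to two bytes can never run past the real end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_UTF_MAX 3  /* assumed longest encoded character, in bytes */

/*
 * Hypothetical bounded input conversion, not Tcl's actual code.
 * The loop stops SKETCH_UTF_MAX bytes before the real end of dst, and
 * each pass writes at most two bytes, so no write can overflow dst.
 * Stores the number of source bytes consumed in *srcRead and returns
 * the number of bytes written to dst.
 */
static size_t bounded_to_internal(const unsigned char *src, size_t srcLen,
                                  unsigned char *dst, size_t dstLen,
                                  size_t *srcRead)
{
    size_t in = 0, out = 0;
    size_t limit;

    assert(src != NULL && dst != NULL && srcRead != NULL);
    limit = (dstLen > SKETCH_UTF_MAX) ? dstLen - SKETCH_UTF_MAX : 0;
    while (in < srcLen && out < limit) {
        if (src[in] == 0x00) {
            dst[out++] = 0xC0;  /* a null byte grows to two bytes */
            dst[out++] = 0x80;
        } else {
            dst[out++] = src[in];
        }
        in++;
    }
    *srcRead = in;
    return out;
}
```

The caller is expected to retry with the unconsumed source bytes once
more destination space is available.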

rmax added on 2004-05-08 02:49:17:

File Added - 86449: utf-8-nullbyte.patch

Logged In: YES 
user_id=124643

The logic was not quite right before, so here is another
(hopefully the last) attempt.

The only thing I am not sure about is whether this patch
adds the risk of a one-byte buffer overflow when, in input
mode, there is only one spare byte left in the destination
buffer and the next character in the source buffer is a null
byte.

rmax added on 2004-05-08 01:19:13:

File Added - 86438: utf-8-nullbyte.patch

Logged In: YES 
user_id=124643

Another version of the patch - this time with tests and
comments.
I'll commit this one to HEAD to get wider testing.

rmax added on 2004-05-07 22:58:37:

File Deleted - 86410:

rmax added on 2004-05-07 22:57:48:

File Added - 86421: utf-8-nullbyte.patch

Logged In: YES 
user_id=124643

While doing further tests, I found that null bytes read
from a utf-8 encoded channel also don't get translated
to \xc0\x80. The updated patch fixes that as well.
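
The input direction mentioned here can be sketched as below, assuming
that a raw 0x00 byte read from the channel should be stored internally
as 0xC0 0x80. The names internal_size and external_to_internal are
hypothetical and only illustrate the idea. Because each null byte grows
by one byte, the sketch computes the required size first.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed once every 0x00 in src is stored as 0xC0 0x80. */
static size_t internal_size(const unsigned char *src, size_t srcLen)
{
    size_t i, need = srcLen;

    assert(src != NULL);
    for (i = 0; i < srcLen; i++) {
        if (src[i] == 0x00) {
            need++;             /* one extra byte per null */
        }
    }
    return need;
}

/*
 * Hypothetical helper, not Tcl's actual code: expand raw null bytes
 * read from a channel into the two-byte internal form. dst must hold
 * internal_size(src, srcLen) bytes. Returns the bytes written.
 */
static size_t external_to_internal(const unsigned char *src, size_t srcLen,
                                   unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < srcLen; i++) {
        if (src[i] == 0x00) {
            dst[out++] = 0xC0;
            dst[out++] = 0x80;
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

For the three input bytes 'a', 0x00, 'b', internal_size reports 4 and
the converted buffer holds 'a', 0xC0, 0x80, 'b'.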

rmax added on 2004-05-07 21:22:04:

File Added - 86410: utf-8-nullbyte.patch
