Ticket UUID: 949905
Title: Unicode null characters on UTF-8 channels
Type: Bug | Version: obsolete: 8.5a2
Submitter: rmax | Created on: 2004-05-07 14:21:27
Subsystem: 44. UTF-8 Strings | Assigned To: hobbs
Priority: 8 | Severity:
Status: Closed | Last Modified: 2004-07-15 02:16:42
Resolution: Fixed | Closed By: hobbs
Closed on: 2004-07-14 19:16:42
Description:
A Unicode null character (\u0000) comes out as \xc0\x80 instead of \x00 when it is written to a UTF-8 encoded channel. The attached patch for HEAD fixes this and doesn't break any tests here. If the patch gets accepted, I'll create one for the 8.4 line as well.
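For readers unfamiliar with the symptom: Tcl stores U+0000 internally as the two-byte sequence \xc0\x80 so that strings can remain NUL-terminated in C, while a standard UTF-8 channel is expected to carry a plain \x00 byte. The sketch below illustrates the translation the report asks for in both directions. It is a minimal Python illustration, not the attached patch (which changes Tcl's C sources), and the function names are hypothetical.

```python
# Illustrative sketch only; the real fix lives in Tcl's C sources.
# The function and constant names below are hypothetical.

INTERNAL_NUL = b"\xc0\x80"  # two-byte form used for U+0000 in internal strings
EXTERNAL_NUL = b"\x00"      # what a standard UTF-8 channel should carry


def internal_to_external(data: bytes) -> bytes:
    """Output direction: turn internal \\xc0\\x80 into a plain \\x00 byte."""
    # In well-formed internal data, the lead byte 0xC0 occurs only in this
    # sequence, so a plain byte-level replace is sufficient for the sketch.
    return data.replace(INTERNAL_NUL, EXTERNAL_NUL)


def external_to_internal(data: bytes) -> bytes:
    """Input direction: turn a plain \\x00 byte into internal \\xc0\\x80."""
    return data.replace(EXTERNAL_NUL, INTERNAL_NUL)


if __name__ == "__main__":
    internal = b"a" + INTERNAL_NUL + b"b"
    external = internal_to_external(internal)
    print(external)
    print(external_to_internal(external) == internal)
```

The two functions are inverses on well-formed data, which mirrors the ticket's observation that both the write path and the read path needed the translation.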
User Comments:
hobbs added on 2004-07-03 00:18:56:
Logged In: YES user_id=72656
This is fixed, and left open as a watch item for the next releases (8.4.7 and 8.5a2).

rmax added on 2004-05-27 21:36:05:
Logged In: YES user_id=124643
Committed a fix to HEAD and core-8-4-branch.

rmax added on 2004-05-08 02:52:29:
Logged In: YES user_id=124643
Forget that comment about the buffer overflow. I had overlooked that the size of the dest buffer gets adjusted so that there is always room for at least one more UTF-8 character.

rmax added on 2004-05-08 02:49:17:
File Added - 86449: utf-8-nullbyte.patch
Logged In: YES user_id=124643
The logic was not quite right before, so here is another (hopefully the last) attempt. The only thing I am not sure about is whether this patch adds the risk of a one-byte buffer overflow when, in input mode, there is only one spare byte left in the destination buffer and the next character in the source buffer is a null byte.

rmax added on 2004-05-08 01:19:13:
File Added - 86438: utf-8-nullbyte.patch
Logged In: YES user_id=124643
Another version of the patch, this time with tests and comments. I'll commit this one to HEAD to get wider testing.

rmax added on 2004-05-07 22:58:37:
File Deleted - 86410:

rmax added on 2004-05-07 22:57:48:
File Added - 86421: utf-8-nullbyte.patch
Logged In: YES user_id=124643
While doing further tests, I found that null bytes read from a UTF-8 encoded channel also don't get translated to \xc0\x80. The updated patch fixes that as well.

rmax added on 2004-05-07 21:22:04:
File Added - 86410: utf-8-nullbyte.patch
Attachments:
- utf-8-nullbyte.patch, added by rmax on 2004-05-08 02:49:17.