Ticket UUID: | 1122671 | |||
Title: | SEGV in Tcl_UtfToUniChar with unaligned string | |||
Type: | Bug | Version: | obsolete: 8.4.1 | |
Submitter: | nobody | Created on: | 2005-02-14 20:41:18 | |
Subsystem: | 44. UTF-8 Strings | Assigned To: | hobbs | |
Priority: | 9 Immediate | Severity: | ||
Status: | Closed | Last Modified: | 2007-02-08 15:57:02 | |
Resolution: | Fixed | Closed By: | hobbs | |
Closed on: | 2006-10-05 21:28:41 | |||
Description: |
I'm seeing a SEGV on 64bit Solaris (sparc) machines due to casting dst as a Tcl_UniChar* and it not being 2-byte aligned. The pointer was allocated in FontMapLoadPage (char buf[16]). =>[1] Tcl_UtfToUniChar(str = 0xffffffff7fff3ac1 "\xc0\x80^A", chPtr = 0xffffffff7fff3ab1), line 321 in "tclUtf.c" [2] UtfToUcs2beProc(clientData = (nil), src = 0xffffffff7fff3ac1 "\xc0\x80^A", srcLen = 2, flags = 7, statePtr = 0xffffffff7fff38f8, dst = 0xffffffff7fff3ab1 "", dstLen = 14, srcReadPtr = 0xffffffff7fff3908, dstWrotePtr = 0xffffffff7fff3904, dstCharsPtr = 0xffffffff7fff3900), line 616 in "tkUnixFont.c" [3] Tcl_UtfToExternal(interp = (nil), encoding = 0x105f85490, src = 0xffffffff7fff3ac1 "\xc0\x80^A", srcLen = 2, flags = 7, statePtr = 0xffffffff7fff38f8, dst = 0xffffffff7fff3ab1 "", dstLen = 14, srcReadPtr = 0xffffffff7fff3908, dstWrotePtr = 0xffffffff7fff3904, dstCharsPtr = 0xffffffff7fff3900), line 1081 in "tclEncoding.c" [4] FontMapLoadPage(subFontPtr = 0x106ec2860, row = 0), line 2097 in "tkUnixFont.c" The SEGV lines (tclUtf.c:308): if ((str[1] & 0xC0) == 0x80) { /* * Two-byte-character lead-byte followed by a trail-byte. */ *chPtr = (Tcl_UniChar) (((byte & 0x1F) << 6) | (str[1] & 0x3F)); return 2; } (chPtr is 1-byte aligned) The allocation (tkUnixFont.c:2056): char src[TCL_UTF_MAX], buf[16]; (buf is passed to Tcl_UtfToExternal as the dst pointer and is only 1-byte aligned due to being stack allocated as a char array). Jeff | |||
User Comments: |
nobody added on 2007-02-08 15:57:02:
Logged In: NO If TCL_UTF_MAX values greater than 3 will now be totally broken by this bug fix, the build should fail if TCL_UTF_MAX is set to 4 or larger. But it was not necessary to add such an assumption to fix this bug. Something as simple as: memcpy(dst, &ch, sizeof(Tcl_UniChar)); dst += sizeof(Tcl_UniChar); would have worked just fine. The compiler knows the alignment of the respective types, and since this memcpy is so small, the compiler can trivially expand this inline. So there are no performance regression considerations with fixing the bug this way either. hobbs added on 2006-10-06 12:00:38: Logged In: YES user_id=72656 The following additional patch was necessary in the Tcl part to satisfy some quick of big endian systems ... @@ -2176,7 +2176,7 @@ */ ch = *(Tcl_UniChar *)src; if (ch && ch < 0x80) { - *dst++ = *src; + *dst++ = (ch & 0xFF); } else { dst += Tcl_UniCharToUtf(ch, dst); } hobbs added on 2006-10-06 04:28:41: Logged In: YES user_id=72656 Applied for 8.4.14 and 8.5a5. hobbs added on 2006-10-06 04:21:14: File Added - 196776: 1122671-tcl85.patch Logged In: YES user_id=72656 and one more for Tcl 8.5a5. hobbs added on 2006-10-06 03:10:27: File Added - 196757: 1122671-tcl84.patch Logged In: YES user_id=72656 Attached an updated patch for 8.4.14 for Tcl that handles UnicodeToUtfProc as well. hobbs added on 2006-10-06 00:23:24: File Added - 196742: 1122671-tk85.patch Logged In: YES user_id=72656 Updated 8.5a5 Tk tkUnixFont.c patch to deal with extra case of possible misalignment. hobbs added on 2006-10-06 00:22:36: File Added - 196741: 1122671-tk84.patch Logged In: YES user_id=72656 Attached is an updated patch for 8.4 tkUnixFont.c to deal with another case. dgp added on 2006-03-10 12:01:10: Logged In: YES user_id=80530 hmmm... verified fix more than a year ago, but still open? What's the status here? Let's resolve for 8.4.13 jtbrubaker added on 2005-02-23 07:02:11: Logged In: YES user_id=1219284 I tried both approaches - the patches (look good, btw) and only changing the declaration order without any other changes. Both fix the bus error, although I prefer the patch approach. :) I tried two separate machines without any changes and then with each. Only one would trigger the bus error but it was reliable enough to convince me that the issue is resolved. Thanks! Jeff hobbs added on 2005-02-23 01:01:58: Logged In: YES user_id=72656 I would also be interested if merely swapping the decl ordering to: char buf[16], src[TCL_UTF_MAX]; solves the issue (it should, being aligned by decl). I would do this as well as a primary safe-guard, as well as looking into the larger fixes, but I wanted to confirm just that one fix would be "enough" in this case. jtbrubaker added on 2005-02-22 23:04:40: Logged In: YES user_id=1219284 Sorry, I've been out of town (business, not pleasure, unfortunately). I'll try to take a look at it today. Thanks. Jeff hobbs added on 2005-02-17 08:20:50: Logged In: YES user_id=72656 jtbrubaker - any chance you can verify the fix? hobbs added on 2005-02-16 08:37:04: Logged In: YES user_id=72656 One fix to the 2 patches that turned up on Windows (but not unix, oddly enough), is that dstEnd must be: dstEnd = dst + dstLen - sizeof(Tcl_UniChar); for both patches. hobbs added on 2005-02-16 07:38:53: File Added - 120190: 1122671-tk.patch Logged In: YES user_id=72656 And attached here is the proposed patch for Tk. This actually defers the ucs-2be to be 'unicode' on BE machines. This is just the tkUnixFont.c patch - configure.in also adds AC_C_BIGENDIAN (like in tcl's configure.in) to get WORDS_BIGENDIAN defined on BE machines. You can apply this patch whether you add that to configure.in or not. Please test with both patches. I'm actually amazed that you never hit the aligment issue with tclEncoding.c:UtfToUnicodeProc, which should be called more often. hobbs added on 2005-02-16 07:34:16: File Added - 120188: 1122671-tcl.patch hobbs added on 2005-02-16 07:34:15: Logged In: YES user_id=72656 The problem needs to more specifically address callers of Tcl_UtfToUniChar, to make sure they are properly passing in data. Attached is a patch to Tcl that addresses the same problem as found in UtfToUnicodeProc. There is a problem in that it hard-codes the sizeof(Tcl_UniChar) == 2 assumption, which is a Bad Idea. hobbs added on 2005-02-15 08:34:11: Logged In: YES user_id=72656 The X server has something to do with it in this case because you travelled through UtfToUcs2beProc, which is for big endian encoding conversion in X, managed by the font encoding. In any case, it would be safest to handle it in Tcl_UtfToUniChar, but that is a time-sensitive function. There is also TclUtfToUniChar that may need similar treatment in generic/tclInt.h. jtbrubaker added on 2005-02-15 06:33:28: Logged In: YES user_id=1219284 I'm not sure the X server has anything to do with this. I would expect the compiler to be responsible for stack order, but it might be system specific. As for the alignment question - don't you think it's better handled in the low level function rather than ensuring all callers know to avoid char arrays? Jeff hobbs added on 2005-02-15 06:08:26: Logged In: YES user_id=72656 I didn't imply shelving it, just needed more info to repro. I suspect a BE X server is necessary as well. In any case, your thoughts mirror mine as to the recommended fix. Since you can repro, can you try just changing 'buf' to Tcl_UniChar buf[8], and then appropriately cast it to char in the one spot, with *sizeof(Tcl_UniChar) for the sizeof part? If that works, I'll apply that (please confirm changes). jtbrubaker added on 2005-02-15 05:19:07: Logged In: YES user_id=1219284 While difficult to reproduce, this does not mean that the bug should be shelved. In order to reproduce reliably, your stack must allocate FontMapLoadPage's buf array starting on an odd byte, on a platform that cares about byte alignment. I cannot provide a testcase that does this for you since it depends on too many factors. From the debugger, it is clear that in my session, buf ended up on an odd byte (0xffffffff7fff3ab1). Since Tcl_UtfToUniChar() thinks it's looking at an array of 2-byte elements, dereferencing any element within this array will cause a bus error (sorry, I think I said SEGV earlier). I plan on resolving this locally by using a temporary Tcl_UniChar variable and memcpying it back to the 'chPtr' pointer. It would also be sufficient to change 'char buf[16]' in FontMapLoadPage to a Tcl_UniChar array, which will always be aligned properly. Jeff hobbs added on 2005-02-15 04:53:13: Logged In: YES user_id=72656 There is insufficient information to duplicate this report. I built 64-bit sparc (using Forte C) and displayed to a SuSE 9.2 linux box, and was able to display the chars \0 \1 and \2 without any trouble using the font 'unifont' (that has a rep for the NULL char). This needs more info to reproduce. jtbrubaker added on 2005-02-15 04:11:57: Logged In: YES user_id=1219284 I have not tried a more recent version, sorry. (dbx) p &src[0] &src[0] = 0xffffffff7fff3ac1 "\xc0\x80^A" Jeff hobbs added on 2005-02-15 04:01:07: Logged In: YES user_id=72656 What was the full contents of the data being processed (displayed)? Also, I see that you found this in 8.4.1. Has this been tried on a recent version? jtbrubaker added on 2005-02-15 03:56:00: Logged In: YES user_id=1219284 I was not logged in when submitting this bug. My userid is 'jtbrubaker' if further input is needed. |
Attachments:
- 1122671-tcl85.patch [download] added by hobbs on 2006-10-06 04:21:14. [details]
- 1122671-tcl84.patch [download] added by hobbs on 2006-10-06 03:10:27. [details]
- 1122671-tk85.patch [download] added by hobbs on 2006-10-06 00:23:24. [details]
- 1122671-tk84.patch [download] added by hobbs on 2006-10-06 00:22:36. [details]
- 1122671-tk.patch [download] added by hobbs on 2005-02-16 07:38:53. [details]
- 1122671-tcl.patch [download] added by hobbs on 2005-02-16 07:34:16. [details]