Tcl Source Code

View Ticket
Login
Ticket UUID: 1122671
Title: SEGV in Tcl_UtfToUniChar with unaligned string
Type: Bug Version: obsolete: 8.4.1
Submitter: nobody Created on: 2005-02-14 20:41:18
Subsystem: 44. UTF-8 Strings Assigned To: hobbs
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2007-02-08 15:57:02
Resolution: Fixed Closed By: hobbs
    Closed on: 2006-10-05 21:28:41
Description:
I'm seeing a SEGV on 64bit Solaris (sparc) machines due
to casting dst as a Tcl_UniChar* and it not being
2-byte aligned.

The pointer was allocated in FontMapLoadPage (char
buf[16]).

=>[1] Tcl_UtfToUniChar(str = 0xffffffff7fff3ac1
"\xc0\x80^A", chPtr = 0xffffffff7fff3ab1), line 321 in
"tclUtf.c"
  [2] UtfToUcs2beProc(clientData = (nil), src =
0xffffffff7fff3ac1 "\xc0\x80^A", srcLen = 2, flags = 7,
statePtr = 0xffffffff7fff38f8, dst = 0xffffffff7fff3ab1
"", dstLen = 14, srcReadPtr = 0xffffffff7fff3908,
dstWrotePtr = 0xffffffff7fff3904, dstCharsPtr =
0xffffffff7fff3900), line 616 in "tkUnixFont.c"
  [3] Tcl_UtfToExternal(interp = (nil), encoding =
0x105f85490, src = 0xffffffff7fff3ac1 "\xc0\x80^A",
srcLen = 2, flags = 7, statePtr = 0xffffffff7fff38f8,
dst = 0xffffffff7fff3ab1 "", dstLen = 14, srcReadPtr =
0xffffffff7fff3908, dstWrotePtr = 0xffffffff7fff3904,
dstCharsPtr = 0xffffffff7fff3900), line 1081 in
"tclEncoding.c"
  [4] FontMapLoadPage(subFontPtr = 0x106ec2860, row =
0), line 2097 in "tkUnixFont.c"

The SEGV lines (tclUtf.c:308):

if ((str[1] & 0xC0) == 0x80) {
    /*
     * Two-byte-character lead-byte followed by a
trail-byte.
     */

    *chPtr = (Tcl_UniChar) (((byte & 0x1F) << 6) |
(str[1] & 0x3F));
    return 2;
}

(chPtr is 1-byte aligned)

The allocation (tkUnixFont.c:2056):

    char src[TCL_UTF_MAX], buf[16];

(buf is passed to Tcl_UtfToExternal as the dst pointer
and is only 1-byte aligned due to being stack allocated
as a char array).

Jeff
User Comments: nobody added on 2007-02-08 15:57:02:
Logged In: NO 

If TCL_UTF_MAX values greater than 3 will now be totally broken by this
bug fix, the build should fail if TCL_UTF_MAX is set to 4 or larger.

But it was not necessary to add such an assumption to fix this bug.
Something as simple as:

         memcpy(dst, &ch, sizeof(Tcl_UniChar));
         dst += sizeof(Tcl_UniChar);

would have worked just fine.

The compiler knows the alignment of the respective types, and since this memcpy
is so small, the compiler can trivially expand this inline.  So there are
no performance regression considerations with fixing the bug this way either.

hobbs added on 2006-10-06 12:00:38:
Logged In: YES 
user_id=72656

The following additional patch was necessary in the Tcl part
to satisfy some quick of big endian systems ...

@@ -2176,7 +2176,7 @@
         */
        ch = *(Tcl_UniChar *)src;
        if (ch && ch < 0x80) {
-           *dst++ = *src;
+           *dst++ = (ch & 0xFF);
        } else {
            dst += Tcl_UniCharToUtf(ch, dst);
        }

hobbs added on 2006-10-06 04:28:41:
Logged In: YES 
user_id=72656

Applied for 8.4.14 and 8.5a5.

hobbs added on 2006-10-06 04:21:14:

File Added - 196776: 1122671-tcl85.patch

Logged In: YES 
user_id=72656

and one more for Tcl 8.5a5.

hobbs added on 2006-10-06 03:10:27:

File Added - 196757: 1122671-tcl84.patch

Logged In: YES 
user_id=72656

Attached an updated patch for 8.4.14 for Tcl that handles
UnicodeToUtfProc as well.

hobbs added on 2006-10-06 00:23:24:

File Added - 196742: 1122671-tk85.patch

Logged In: YES 
user_id=72656

Updated 8.5a5 Tk tkUnixFont.c patch to deal with extra case
of possible misalignment.

hobbs added on 2006-10-06 00:22:36:

File Added - 196741: 1122671-tk84.patch

Logged In: YES 
user_id=72656

Attached is an updated patch for 8.4 tkUnixFont.c to deal
with another case.

dgp added on 2006-03-10 12:01:10:
Logged In: YES 
user_id=80530


hmmm... verified fix more
than a year ago, but still
open?  What's the status here?
Let's resolve for 8.4.13

jtbrubaker added on 2005-02-23 07:02:11:
Logged In: YES 
user_id=1219284

I tried both approaches - the patches (look good, btw) and
only changing the declaration order without any other
changes.  Both fix the bus error, although I prefer the
patch approach.  :)

I tried two separate machines without any changes and then
with each.  Only one would trigger the bus error but it was
reliable enough to convince me that the issue is resolved.

Thanks!

Jeff

hobbs added on 2005-02-23 01:01:58:
Logged In: YES 
user_id=72656

I would also be interested if merely swapping the decl
ordering to:
    char buf[16], src[TCL_UTF_MAX];
solves the issue (it should, being aligned by decl).  I
would do this as well as a primary safe-guard, as well as
looking into the larger fixes, but I wanted to confirm just
that one fix would be "enough" in this case.

jtbrubaker added on 2005-02-22 23:04:40:
Logged In: YES 
user_id=1219284

Sorry, I've been out of town (business, not pleasure,
unfortunately).  I'll try to take a look at it today.  Thanks.

Jeff

hobbs added on 2005-02-17 08:20:50:
Logged In: YES 
user_id=72656

jtbrubaker - any chance you can verify the fix?

hobbs added on 2005-02-16 08:37:04:
Logged In: YES 
user_id=72656

One fix to the 2 patches that turned up on Windows (but not
unix, oddly enough), is that dstEnd must be:

    dstEnd   = dst + dstLen - sizeof(Tcl_UniChar);

for both patches.

hobbs added on 2005-02-16 07:38:53:

File Added - 120190: 1122671-tk.patch

Logged In: YES 
user_id=72656

And attached here is the proposed patch for Tk.  This
actually defers the ucs-2be to be 'unicode' on BE machines.
 This is just the tkUnixFont.c patch - configure.in also
adds AC_C_BIGENDIAN (like in tcl's configure.in) to get
WORDS_BIGENDIAN defined on BE machines.  You can apply this
patch whether you add that to configure.in or not.  Please
test with both patches.  I'm actually amazed that you never
hit the aligment issue with tclEncoding.c:UtfToUnicodeProc,
which should be called more often.

hobbs added on 2005-02-16 07:34:16:

File Added - 120188: 1122671-tcl.patch

hobbs added on 2005-02-16 07:34:15:
Logged In: YES 
user_id=72656

The problem needs to more specifically address callers of
Tcl_UtfToUniChar, to make sure they are properly passing in
data.  Attached is a patch to Tcl that addresses the same
problem as found in UtfToUnicodeProc.  There is a problem in
that it hard-codes the sizeof(Tcl_UniChar) == 2 assumption,
which is a Bad Idea.

hobbs added on 2005-02-15 08:34:11:
Logged In: YES 
user_id=72656

The X server has something to do with it in this case
because you travelled through UtfToUcs2beProc, which is for
big endian encoding conversion in X, managed by the font
encoding.

In any case, it would be safest to handle it in
Tcl_UtfToUniChar, but that is a time-sensitive function. 
There is also TclUtfToUniChar that may need similar
treatment in generic/tclInt.h.

jtbrubaker added on 2005-02-15 06:33:28:
Logged In: YES 
user_id=1219284

I'm not sure the X server has anything to do with this.  I
would expect the compiler to be responsible for stack order,
but it might be system specific.

As for the alignment question - don't you think it's better
handled in the low level function rather than ensuring all
callers know to avoid char arrays?

Jeff

hobbs added on 2005-02-15 06:08:26:
Logged In: YES 
user_id=72656

I didn't imply shelving it, just needed more info to repro.
 I suspect a BE X server is necessary as well.  In any case,
your thoughts mirror mine as to the recommended fix.  Since
you can repro, can you try just changing 'buf' to
Tcl_UniChar buf[8], and then appropriately cast it to char
in the one spot, with *sizeof(Tcl_UniChar) for the sizeof
part?  If that works, I'll apply that (please confirm changes).

jtbrubaker added on 2005-02-15 05:19:07:
Logged In: YES 
user_id=1219284

While difficult to reproduce, this does not mean that the
bug should be shelved.

In order to reproduce reliably, your stack must allocate
FontMapLoadPage's buf array starting on an odd byte, on a
platform that cares about byte alignment.  I cannot provide
a testcase that does this for you since it depends on too
many factors.

From the debugger, it is clear that in my session, buf ended
up on an odd byte (0xffffffff7fff3ab1).  Since
Tcl_UtfToUniChar() thinks it's looking at an array of 2-byte
elements, dereferencing any element within this array will
cause a bus error (sorry, I think I said SEGV earlier).

I plan on resolving this locally by using a temporary
Tcl_UniChar variable and memcpying it back to the 'chPtr'
pointer.  It would also be sufficient to change 'char
buf[16]' in FontMapLoadPage to a Tcl_UniChar array, which
will always be aligned properly.

Jeff

hobbs added on 2005-02-15 04:53:13:
Logged In: YES 
user_id=72656

There is insufficient information to duplicate this report.
 I built 64-bit sparc (using Forte C) and displayed to a
SuSE 9.2 linux box, and was able to display the chars \0 \1
and \2 without any trouble using the font 'unifont' (that
has a rep for the NULL char).

This needs more info to reproduce.

jtbrubaker added on 2005-02-15 04:11:57:
Logged In: YES 
user_id=1219284

I have not tried a more recent version, sorry.

(dbx) p &src[0]
&src[0] = 0xffffffff7fff3ac1 "\xc0\x80^A"

Jeff

hobbs added on 2005-02-15 04:01:07:
Logged In: YES 
user_id=72656

What was the full contents of the data being processed
(displayed)?  Also, I see that you found this in 8.4.1.  Has
this been tried on a recent version?

jtbrubaker added on 2005-02-15 03:56:00:
Logged In: YES 
user_id=1219284

I was not logged in when submitting this bug.  My userid is
'jtbrubaker' if further input is needed.

Attachments: