Tcl Source Code

View Ticket
Login
Ticket UUID: 3069278
Title: breakage on head Windows triggered by install-tzdata
Type: Bug Version: obsolete: 8.6b1.1
Submitter: hobbs Created on: 2010-09-17 18:04:10
Subsystem: 37. File System Assigned To: nijtmans
Priority: 8 Severity:
Status: Closed Last Modified: 2010-09-22 04:52:58
Resolution: Fixed Closed By: nijtmans
    Closed on: 2010-09-21 21:52:58
Description:
Previously recorded at 3065455, we have found that there is a general issue newly introduced on head on Windows, I suspect related to the -DUNICODE work.

You can trigger this bug with install-tzdata, but only when you use a prefix path of sufficient length.  This one in particular worked for me:

prefix= C:/Tcl-test/we-want-a-long-path/and-this-makes-it-longer/dont-you-know-longer-still/yes-this-will-be-long-enough
exec_prefix= C:/Tcl-test/we-want-a-long-path/and-this-makes-it-longer/dont-you-know-longer-still/yes-this-will-be-long-enough
bindir= ${exec_prefix}/bin
libdir= C:/Tcl-test/we-want-a-long-path/and-this-makes-it-longer/dont-you-know-longer-still/yes-this-will-be-long-enough/lib

aka --prefix=C:/Tcl-test/we-want-a-long-path/and-this-makes-it-longer/dont-you-know-longer-still/yes-this-will-be-long-enough when building.

So what's happening here?  I believe something in the switch from regular to -DUNICODE mode is now overflowing some aspect of a dstring (they have static space of 200b, and must grow otherwise).  Or perhaps a buffer overrun where 1b was appended where 1*sizeof(WCHAR) is needed now.  In any case, the crash symptoms indicate dstring related smashing, which is of course used all over Windows file handling.  The -DUNICODE will have distinctly affected this behavior as well.
User Comments: nijtmans added on 2010-09-22 04:52:58:

allow_comments - 1

Fixed on HEAD. Thanks for the big help, Andreas!

nijtmans added on 2010-09-21 18:47:32:
I worked through the code, trying to find locations where UNICODE
strings were appended to a buffer, trying to let is end with a double
null-byte. In most places, this is done by using
Tcl_DStringSetLength(.... , len+1);
Tcl_DStringSetLength(.... , len);
The first call allocates enough space to accomodate for the extra
null byte, The second call sets the length correctly, and puts
the principal null-byte there.

However, there were two places where it was done differently.
In tclWinFCmd.c, line 1307:
 nativeSource[oldSourceLen + 1] = '\0';
If oldSourceLen is the string length in bytes, and
nativeSource is a WCHAR array, this will set two
null-bytes in the string, but at a double index
of the intended location. The UNICODE string
will not be null-terminated, which explains
exactly Andreas' stack trace.

Another location was at tclWinPipe.c, line 3136:
 *(WCHAR *)(namePtr + Tcl_DStringLength(&buf) + 1) = '\0';
Not that bad, but casting a char * to a WCHAR *, while
not even being sure that the pointer is aligned to
a two-byte boundry, is clearly not correct.

Here is the patch, which changed both locations to
use the double Tcl_DStringSetLength method. This
should - once and for all - fix this.

Andreas  (or Jeff), could you please confirm that this
fixes the bug? Thanks!

Originally, I intended to modify tclUtil.c such that
on Windows two null-bytes are always guaranteed.
However, there is a lot of code which assumes
insertion of a single null-byte, so this lead to
test failures.

nijtmans added on 2010-09-21 18:32:20:

File Added - 387374: 3069278.patch

nijtmans added on 2010-09-18 12:37:47:
I suspect the problematic line is:
nativeSource[oldSourceLen + 1] = '\0';
because if nativeSource is a TCHAR, then
this does something different than expected.

However, in order to quickly fix the HEAD
for now just compile this file without -DUNCODE
(there are more files like that, which still need
to be investigated, so that doesn' t hurt)

Therefore, keeping open until it is fully fixed.

andreas_kupries added on 2010-09-18 02:30:09:
After rewriting the DString functions to have Debug variants (like we have for Tcl_Bj machinery) I got a good trace of DString actions. The attached log is actually a reduced form which looks only actions on the DString which crashed with high guard error in DStringFree.

Investigating the locations recorded it seems that the crash is in
"TclpObjRemoveDirectory" lines 1007-1014, with "TraverseWinTree" (*)
the main suspect doing the mem smash. The DString is the native
representation of some path.

Call sequence
    TclpObjRemoveDirectory
    -->    DoRemoveDirectory (1)
        -->    TraverseWinTree

    The "DoRemoveJustDirectory" called by (1) seems to only read
    this DString (sees it only as a TCHAR* pointer).

Regardless, this seems to be the area to look into.

andreas_kupries added on 2010-09-18 02:28:00:

File Added - 386977: LOG2

hobbs added on 2010-09-18 01:10:49:
BTW, this is heisenbug like, and does need multiple runs to confirm.

hobbs added on 2010-09-18 01:05:29:
One crash stack, which crashed in a mem debug build with "unable to alloc 306 bytes, z:/cvs/tcl/tcl8.6/win/tclWinFile.c line 3232".  Of course I have enough mem, but some mem smashing must be occurring.

 tcl86g.dll!Tcl_DbCkalloc(unsigned int size=306, const char * file=0x1018425c, int line=3232)  Line 393 + 0x16 bytesC
>tcl86g.dll!TclNativeCreateNativeRep(Tcl_Obj * pathPtr=0x023d6310)  Line 3232 + 0x13 bytesC
 tcl86g.dll!Tcl_FSGetInternalRep(Tcl_Obj * pathPtr=0x023d6310, const Tcl_Filesystem * fsPtr=0x10132c38)  Line 2160 + 0x7 bytesC
 tcl86g.dll!Tcl_FSGetNativePath(Tcl_Obj * pathPtr=0x023d6310)  Line 4571 + 0xe bytesC
 tcl86g.dll!TclpObjStat(Tcl_Obj * pathPtr=0x023d6310, _stat64 * statPtr=0x0017ebd4)  Line 1985 + 0xf bytesC
 tcl86g.dll!Tcl_FSStat(Tcl_Obj * pathPtr=0x023d6310, _stat64 * buf=0x0017ebd4)  Line 2034 + 0x10 bytesC
 tcl86g.dll!FileCopyRename(Tcl_Interp * interp=0x023004d8, int objc=5, Tcl_Obj * const * objv=0x023015e8, int copyFlag=1)  Line 147 + 0xd bytesC

Attachments: