Tcl Source Code

View Ticket
Login
Ticket UUID: 3598300
Title: unix: tcl.h does not include sys/stat.h
Type: Bug Version: obsolete: 8.5.13
Submitter: roflz Created on: 2012-12-24 08:14:41
Subsystem: 52. Portability Support Assigned To: dkf
Priority: 5 Medium Severity:
Status: Open Last Modified: 2013-01-20 05:36:56
Resolution: None Closed By: dkf
    Closed on: 2013-01-17 01:09:02
Description:
tcl.h typedefs Tcl_StatBuf to struct stat/stat64 but does not include <sys/stat.h>

this leads to compilation error in tclCmdAH.c because the storage size of variables of type Tcl_StatBuf is unknown.
User Comments: dkf added on 2013-01-20 05:36:56:
OSX Leopard definitely has stat64 as well as stat. The stat64 version is interesting because it knows the creation instant for the file as well as the traditional three POSIX times.

I have no idea what later versions of OSX support. We don't support earlier versions of OSX for sure (especially with Tk).

nijtmans added on 2013-01-20 01:56:11:
Donal, could you try branch "novem-bug-3598300" as well? I suspect that on
OSX the only reason for not defining HAVE_STRUCT_STAT64 was because
of binary compatibility with older OSX releases, which didn't have stat64 yet.
If so, then for "novem" we can break binary compatibility and enable
HAVE_STRUCT_STAT64 (or - better- introduce HAVE_NO_STRUCT_STAT64)

For Win32, the same could be done, but that would mean
that VC6 must be phased out: it doesn't have a stat64-like
structure which has 64-bit file sizes, so that cannot be done
without a TIP (which I would approve, BTW)...

Donal, does this approach work on your OSX machines?
Your opinion please, before merging this to "novem"

dkf added on 2013-01-18 22:25:19:
[For reference: the LoadDyld warnings relate to how we do load-from-VFS on OSX; the API that we use to do it has been deprecated by Apple in favor of an API which can't support the functionality. So we put up with the warnings. I've got a "fix" on one of the novem branches that silences the warnings, but it's truly not worth us trying to backport.]

dkf added on 2013-01-18 22:22:00:
That branch builds clean on OSX Leopard (excepting known-unrelated issues in tclLoadDyld.c which should *not* be addressed) and passes --enable-symbols=mem test run with no failures.

Looks good.

nijtmans added on 2013-01-18 22:11:20:
Donal, could you try the current "bug-3598300" branch,
(based on core-8-5-branch) ?

The logic is actually simple. In tcl.h:
#if defined(__APPLE__)
...
#   undef HAVE_STRUCT_STAT64
#endif

This #undef comes too late to be usable in tclUnixPort.h, that's the problem!

This means that on __APPLE__ we simply should not
use stat64, even though HAVE_STRUCT_STAT64 is
defined. Better would be to change the configure
script such that if __APPLE__ is defined that then
HAVE_STRUCT_STAT64 is NEVER defined. But
then TEA should be adapted as well.

dkf added on 2013-01-17 20:56:30:

IP - Comment Removed: 130.88.99.220

dkf added on 2013-01-17 20:56:16:
Hmm, and right now there's logic in tcl.h that makes sure that OSX picks up the correct stat structure. It's stuff that would belong in configure, except that OSX builds (of Leopard vintage at least) are actually multi-architecture builds. Messy.

Perhaps we should somehow duplicate the logic for deciding which stat API to use so that the include order can be in one direction on all platforms.

And for 9.0? NO MORE USE OF 'struct stat' IN TCL's PUBLIC API!

dkf added on 2013-01-17 20:56:09:
Hmm, and right now there's logic in tcl.h that makes sure that OSX picks up the correct stat structure. It's stuff that would belong in configure, except that OSX builds (of Leopard vintage at least) are actually multi-architecture builds. Messy.

Perhaps we should somehow duplicate the logic for deciding which stat API to use so that the include order can be in one direction on all platforms.

And for 8.6? NO MORE USE OF 'struct stat' IN TCL's PUBLIC API!

twylite added on 2013-01-17 20:50:05:
dkf: Yes, there is (a reason why tclWinPort.h is before tcl.h in tclPort.h).  It's to make sure that _USE_32BIT_TIME_T is defined before you include tcl.h, so that the tcl.h handling of statbuf gives the correct definition:

#if defined(__WIN32__)
#   ifdef __BORLANDC__
typedef struct stati64 Tcl_StatBuf;
#   elif defined(_WIN64)
typedef struct __stat64 Tcl_StatBuf;
#   elif (defined(_MSC_VER) && (_MSC_VER < 1400)) || defined(_USE_32BIT_TIME_T)
typedef struct _stati64Tcl_StatBuf;
#   else
typedef struct _stat32i64 Tcl_StatBuf;
#   endif /* _MSC_VER < 1400 */

nijtmans added on 2013-01-17 20:47:19:
Both tclWinPort.h and tclUnixPort.h should be before tcl.h, so if e.g.<sys/stat.h>
contains the following fragment:
    ==== <sys/stat.h> ====
    #define stat stat64
    ==== end fragment ====
then tcl.h will pick it up on the Tcl_StatBuf definition.

Is HAVE_STRUCT_STAT64 defined on OSX? If not, that could be the bug.

ferrieux added on 2013-01-17 20:34:23:
Hum, the way I understand it, it is the other way around.

Reason: see comment by roflz in this thread at Date: 2012-12-24 17:30:21 PST

dkf added on 2013-01-17 20:31:14:
BTW, is there a reason why tclWinPort.h is before tcl.h in tclPort.h?

dkf added on 2013-01-17 18:37:06:
I'm happy to close my branch; I just wanted to get recorded exactly what I meant without any other changes. :-)

nijtmans added on 2013-01-17 17:23:27:
>The branch minimal-fix-for-3598300-problems
That's effectively the same as what I did on core-8-5-branch and trunk ;-(

dkf added on 2013-01-17 16:14:32:
The branch minimal-fix-for-3598300-problems is as small a fix for the problems as I could make; with it, I can run scripts and work interactively once more.

dkf added on 2013-01-17 15:58:19:
The problem was that in *some* circumstances (but not all), tcl.h is computing #defs that tclUnixPort.h requires to do the right thing w.r.t. picking up system structure definitions. I just happened to be on one of those platforms, which helped a lot for tracking the problem down. It looked a lot like the code was just going completely crazy though. :-) I spent quite a suspecting that the Tcl_Obj memory allocation system was at fault (it was innocent).

The crazy thing is that tclPort.h is usually a file I completely ignore, as I "know" that it just includes the correct platform-specific portability file. Which just goes to show that what I "know" is sometimes wrong…

nijtmans added on 2013-01-17 15:40:28:
This is serious enough to revert this on core-8-5-branch and trunk. Did that now, and placed a
include <sys/stat.h> before <tcl.h> in every source file that uses Tcl_StatBuf. That should
work a sufficient workaround for this bug, until this is investigated/worked out further.

Thanks Donal, assigning to you now, hoping you can shine some more light on it.

dkf added on 2013-01-17 08:09:02:
Problem commit identified as 8abba84224, appears to be due to swapping of order of include of tcl.h and tclUnixPort.h in tclPort.h (!!!!) which causes OSX Leopard (but not later versions) to get the wrong type of 'struct stat' used. The logic for that was gnarly (and hacked around quite a bit) but the key point is that there is a decision in tcl.h as to which stat API to use.

According to comments in the code, it's not done in configure so as to allow for support for fat binaries (which have multiple architectures in the same object file). I vaguely remember Dan Steffen — or maybe Jim Ingham, as it was a long time ago — having a hand in the hacking of this, because at the time I used a totally different architecture.

andreas_kupries added on 2013-01-17 07:49:38:
Reopening the bug now ...
AS OSX build box is snow leopard, with options set for leopard as min compat version. The older 8.4 build is done on tiger.

dkf added on 2013-01-17 07:38:57:
Just checked the placement of variables on the stack; getting the wrong type for Tcl_StatBuf would indeed cause the crash, as the OS would write as if it was a 'struct stat64' whereas Tcl was compiled as if it was a 'struct stat'. (The other variable just happens to be placed a little bit after on the stack, within stomping range.)

While this is annoying, it's at least clear what the problem *is* now!

dkf added on 2013-01-17 07:30:34:
As far as I can tell, the problem is that there's a mismatch in between the use of 'struct stat' somehow in the code and the fact that we're using stat64() and lstat64() on OSX Leopard. This is in turn overwriting part of the stack it shouldn't (again: hypothesis) which is causing a pointer to a legit Tcl_Obj (in the encoding system initialization) to get turned into a NULL; for obvious reasons, applying a Tcl_GetString() to NULL (this is inside the path concatenation code as it happens) causes a crash.

I emphasize that this is just supposition based on examining the backtrace at the point where the crash happens, together with the observation that I've only just recently started getting compile time warnings about the wrong type being used with stat64 and lstat64. The rest is a guess, but could so easily happen! (If this is true, it would explain why other developers aren't seeing the problem: they're not getting the type mismatch in the first place.)

dkf added on 2013-01-16 23:02:16:
Appears to break OSX Leopard. Still investigating…

nijtmans added on 2013-01-16 00:42:31:
Fixed in core-8-5-branch and trunk now.

Thanks! Andreas!

andreas_kupries added on 2013-01-16 00:16:51:
Ok. tclsh was built on all my platforms now. The only thing which tripped me is that the branch is apparently off Tcl 8.5, and I tried to build it for AT 8.6, so the overall build failed, with Tk complaining about wanting 8.6 and having only 8.5.

But the tclsh itself was build fine.
Confirmed its existence on all machines.
Works too, trying 'info patchlevel'.

andreas_kupries added on 2013-01-15 03:24:29:
I will set things up for another trial this night.

nijtmans added on 2013-01-15 02:59:28:
Well, that's embarassing   ;-(   Should be fixed now. In tclUnixPort.h
there were still some usages of EXTERN and CONST, which depended on tcl.h

Andreas, can I bother you again to build the "bug-3598300" branch?  Thanks!

andreas_kupries added on 2013-01-15 00:19:07:
I saw the notifications for this thread coming through during the weekend.

As an 8.6 build was scheduled for Sunday night I moved that to the branch in question, i.e.
* bug-3598300

This branch failed to build _everywhere_. I will give examples below for Linux, and AIX.

Linux:

EXEC ERROR: In file included from /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../generic/tclPort.h:23,
EXEC ERROR:                  from /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../generic/tclInt.h:36,
EXEC ERROR:                  from /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../generic/regcustom.h:33,
EXEC ERROR:                  from /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../generic/regguts.h:36,
EXEC ERROR:                  from /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../generic/regcomp.c:33:
EXEC ERROR: /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../unix/tclUnixPort.h:630: error: syntax error before âstructâ
EXEC ERROR: /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../unix/tclUnixPort.h:630: error: syntax error before âtime_tâ
EXEC ERROR: /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../unix/tclUnixPort.h:631: error: syntax error before âstructâ
EXEC ERROR: /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../unix/tclUnixPort.h:631: error: syntax error before âtime_tâ
EXEC ERROR: /home/andreask/dbn/lba/GlobalBuildArena/builds/linux-ix86/tcl/unix/../unix/tclUnixPort.h:649: error: syntax error before âintâ
EXEC ERROR: make: *** [regcomp.o] Error 1

AIX fails at the same location, with slightly different messaging.

EXEC ERROR: "/home/andreask/dbn/lba/GlobalBuildArena/builds/aix-rs6000/tcl/unix/../unix/tclUnixPort.h", line 630.39: 1506-277 (S) Syntax error: possible missing ')' or ','?
EXEC ERROR: "/home/andreask/dbn/lba/GlobalBuildArena/builds/aix-rs6000/tcl/unix/../unix/tclUnixPort.h", line 630.1: 1506-485 (S) Parameter declaration list is incompatible with declarator for EXTERN.
EXEC ERROR: "/usr/include/grp.h", line 153.17: 1506-007 (S) "struct group" is undefined.
EXEC ERROR: "/usr/include/grp.h", line 165.17: 1506-007 (S) "struct group" is undefined.
EXEC ERROR: make: *** [regcomp.o] Error 1

So, at this point the branch does not seem to be ready for merging.

nijtmans added on 2013-01-14 17:07:31:
> I thought we'd managed to make it so that the definition wasn't needed in
> tcl.h. Might've been wrong on that though. :-)
I don't think you are wrong, but still it would be nice to test it on
at least one ancient UNIX system,  like IRIX, HP-UX or SunOS-4.

I would prefer to put it in 8.5, so it's simple to bubble-up to
higher releases. The reason is just practical: There is no
hurry for Tcl 8.6.1, so I see no hurry to rush this into 8.6
first and then backport. A few weeks for Andreas to
do some additional testing is fine with me.

dkf added on 2013-01-14 03:36:46:
I thought we'd managed to make it so that the definition wasn't needed in tcl.h. Might've been wrong on that though. :-)

No objection in principle, provided it builds on Windows with MSVC (since it should work just fine on all POSIX systems).

nijtmans added on 2013-01-13 19:31:18:
I'm OK with that, but I still would like to hear Andreas' answer first before doing that.

Therefore, assigning to Andreas for now.

ferrieux added on 2013-01-13 17:07:41:
Jan, Don, anything against merging dcbf22ad99 into 8.6 trunk ?

ferrieux added on 2012-12-28 04:03:25:
rofl0r, sorry for not keeping you up to date on this, but what you see on the trunk is only half of the fix (killing the bad dependencies). The second half (the actual permutation, putting sys headers first) did occur in a separate branch that Jan created: http://core.tcl.tk/tcl/info/dcbf22ad99

I assume it will soon be merged to trunk if proven harmless (meaning: test on just about every alien OS too ;)

roflz added on 2012-12-28 03:57:56:
thanks njitmans for picking this up. however your fix is only a workaround this specific issue, but it still leaves everything else broken.
this should be fixed properly by including all system headers before doing anything that depends on the types and other things defined in those headers, as ferrieux correctly noted.

njitmans: i think it would help if you could answer ferrieux' question:
"Can you please give these details (or just the kind of macros from tcl.h
that are involed) ?"

nijtmans added on 2012-12-28 03:25:53:
Thanks, Don! But I'm fully aware of this header reform, and I
don't think that my latest suggestions violatest this:
<http://core.tcl.tk/tcl/info/dcbf22ad99>
I don't see any danger in this, but please put in your view!

dgp added on 2012-12-28 02:56:53:
Only skimming the comments here.  One thing to keep in mind
with any editing of the #include dependency graph is to preserve
the goals of the 8.5 era "header reform."  See 922727.

ferrieux added on 2012-12-26 00:53:50:
Sorry, but digging a platform-specific header out and sticking it on top of the main generic one doesn't sound like a solid basis... Instead, I'd rather understand in detail why some parts of tcl*Port (like the compat/ ones) depend on being after tcl.h, and either stop this dependency, or move them in a separate file, to be included after tcl.h

Can you please give these details (or just the kind of macros from tcl.h that are involed) ?

nijtmans added on 2012-12-25 19:16:44:
> Jan, any comment on the side-effects you feared, so that you didn't
> generalize ?
Well, that change was for fixing a compiler warming with mingw, not
for cygwin, but I agree with your analysis. It would be good to
include sys/stat.h before tcl.h. The problem with this is
the compat/*.h files, which currently need tcl.h to be
included first. On cygwin/linux/solaris those are not used,
so the move would be safe. But on older systems .....

Another remark: The inclusion order didn't change
between Tcl 8.5 and 8.6 (I can see in Makefile.tcl
that the error occurred in 8.6, but I am assuming
the same 'bug' is in 8.5 as well).

The best solution here might be to move the
sys/stat.h include from tcl*Port.h to tclInt.h.
See:
   <http://core.tcl.tk/tcl/info/02a51cbb69>
That has the least probability to break
on any other platform.

rofl0r, Alex, would that work for you? Any
problems you can see with this change?

ferrieux added on 2012-12-25 08:43:54:
Transferring to Jan Nijtmans, who did a similar move in a specific case (cygwin).
Jan, any comment on the side-effects you feared, so that you didn't generalize ?
(I'm referring to ad7cfb6ee3 where you moved tclWinPort above tcl.h, which looks like a good idea of wider applicability ;)

roflz added on 2012-12-25 08:30:21:
here's the bug: tclPort.h


#include "tcl.h"
#if !defined(_WIN32)
#   include "tclUnixPort.h"
#endif


it should include tclUnixPort.h before tcl.h, problem solved

and for maximum portability put the include block in tclUnixport before any typedefs and macro definitions

roflz added on 2012-12-25 08:25:17:
again i have not tweaked anything.
this is the Makefile that gets created when running CFLAGS=-D_GNU_SOURCE ./configure.
i never used anything else.
it is perfectly legal to define _GNU_SOURCE, the libc is free to do it automatically for example by defining it in features, and so am i.
since stat64 is completely non-standard, the implementation is free to define it as macro so you better make sure it works, by putting the inclusion of sys/stat.h before your typedefs.

roflz added on 2012-12-25 08:19:39:
i mean this part here

#elif defined(HAVE_STRUCT_STAT64)
#   define TclOSstat            stat64
#   define TclOSlstat           lstat64
#else
#   define TclOSstat            stat
#   define TclOSlstat           lstat
#endif

the inclusion of stat.h comes later
so whatever this tries to do, it doesn't see the stat64 macro and thus may fail.
it is probably not related to the error i'm seeing.

however it is very likely that the problem we're facing here has the same cause:
tcl.h gets included before <sys/stat.h>, so the macro doesnt get expanded
and the compiler fails to find the struct stat64.

ferrieux added on 2012-12-25 07:43:51:
>looking at tclUnixPort.h, it appears to me that sys/stat.h should be
>included before any stat-related ifdeffery
>there is code that defines things to be stat64, but since sys/stat.h is
>included later, the stat64 macro does not get expanded.

Red herring. This code is just a pair of prototypes, passing opaque pointers. Even if I agree that they don't benefit from the macro, there is no harm done beyond a possible warning.

If you want to proceed on this, please give the error message that you go when trying to compile with the vanilla Makefile generated by configure (not the one tweaked with whatever habits you have).

ferrieux added on 2012-12-25 07:40:53:
> it is not legal to include bits/stat.h directly,
> this is an internal header! -if you do this,
> that would explain the breakage

Ahem. Do you realize that this is the deed of (your/musl's) sys/stat.h ? Tcl just includes <sys/stat.h> just as everybody else.

roflz added on 2012-12-25 02:41:37:
looking at tclUnixPort.h, it appears to me that sys/stat.h should be included before any stat-related ifdeffery
there is code that defines things to be stat64, but since sys/stat.h is included later, the stat64 macro does not get expanded.

roflz added on 2012-12-25 02:28:31:
musl's sys/stat.h includes an arch specific bits/stat.h that provides the actual definition of struct stat.
http://git.musl-libc.org/cgit/musl/tree/arch/i386/bits/stat.h

however it is not legal to include bits/stat.h directly, this is an internal header!
if you do this, that would explain the breakage.

roflz added on 2012-12-25 02:25:38:
i never used any other Makefile
adding -D_GNU_SOURCE to CFLAGS is something i automatically do always

so it was not added specifically to circumvent some problems here.

this is the exact Makefile that causes the build failure.

ferrieux added on 2012-12-25 01:26:53:
Still, there might be a side effect to -D_GNU_SOURCE. Please give the vanilla Makefile, the one lacking HAVE_SYS_STAT.

ferrieux added on 2012-12-25 01:24:33:
Ok, just seen the attachment, sorry ;)

ferrieux added on 2012-12-25 01:23:46:
So what ? Now your sys/stat.h is included, and stat64 gets mapped to stat, Tcl ends up needing a "struct stat" somewhere. I assume it is in <bits/stat.h> as in most unices, isn't it.

To make this exchange more efficient, please provide the generated Makefile without any tweaks.
Patching tcl.h or forcing CFLAGS sound  brittle ; better fix the root problem in the configure logic.

roflz added on 2012-12-25 00:36:30:
here's the relevant header part in musl libc:
http://git.musl-libc.org/cgit/musl/tree/include/sys/stat.h#n98

i added -D_GNU_SOURCE to the CFLAGS when launching configure, so this part got visible.
tcl.h takes the typedef struct stat64 Tcl_StatBuf route. however without seeing the macro in sys/stat.h this cannot succeed, because stat64 is only a macro.

roflz added on 2012-12-25 00:21:52:

File Added - 458329: Makefile.tcl

ferrieux added on 2012-12-24 23:45:57:
Please attach the Makefile generated by Tcl's configure in this environment.

roflz added on 2012-12-24 20:02:45:
this was experienced using musl libc 0.9.8 on i386
since musl always uses LARGEFILE64, it #defines stat64 stat

https://github.com/rofl0r/sabotage/commit/70807c9a16ef053b5bb432c69bccb29da27b1b10

ferrieux added on 2012-12-24 16:44:58:
The unix-dependent includes come through tclUnixPort.h, included (through tclPort.h) by tclInt.h.
So it is not as plain as a failure to include an obviously needed system header, but rather some ifdeffery that fails in this specific case (that I cannot reproduce). Any specific info re the OS and/or configure/-D flags ?

Attachments: