Tcl Source Code

Ticket UUID: 1815573
Title: Stack space check fails in Linux-x86 build
Type: Bug
Version: obsolete: 8.5.1
Submitter: nobody
Created on: 2007-10-18 06:54:48
Subsystem: 53. Configuration and Build Tools
Assigned To: msofer
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2010-01-29 20:24:22
Resolution: Fixed
Closed By: msofer
Closed on: 2008-08-05 03:35:17
Description:
i486, 2.6.18, Linux, glibc 2.6.1

Problematic tcl - 8.5b1. Last known working version 8.5a6.

Installing message catalogs fails with:


Installing message catalogs
application-specific initialization failed: too many nested evaluations (infinite loop?)
too many nested evaluations (infinite loop?)
    while executing
"proc copyDir { d1 d2 } {

    puts [format {%*sCreating %s} [expr { 4 * [info level] }] {} \
              [file tail $d2]]

    file delete -force -- $d2
  ..."
    (file "/home/users/builder2/rpm/BUILD/tcl8.5b1/unix/../tools/installData.tcl" line 23)
User Comments: dkf added on 2010-01-29 20:24:22:

allow_comments - 1

dougedey added on 2010-01-29 19:03:06:
Sorry, wrong bug :(

dougedey added on 2010-01-29 18:52:21:
Hi, I have AIX 6.1 available to me and I have hit this same issue with the standard build options. I'm willing to assist with debugging what is going wrong.

$ ./tclsh    
application-specific initialization failed: out of stack space (infinite loop?)

msofer added on 2008-08-05 10:35:17:
Logged In: YES 
user_id=148712
Originator: NO

Specific issue discussed at #2017264

jenglish added on 2008-08-05 10:16:02:
Logged In: YES 
user_id=68433
Originator: NO

@miguel --

| Stack check abandoned in head [...]

This doesn't appear to be completely purged -- there's still a lot of goo in unix/tclUnixThrd.c (r1.59 2008/07/24).  How much of this can go away?  (In particular: can we zorch TclpThreadGetStackSize altogether?)

sf-robot added on 2008-08-05 09:20:03:
Logged In: YES 
user_id=1312539
Originator: NO

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

msofer added on 2008-07-21 11:52:14:
Logged In: YES 
user_id=148712
Originator: NO

Stack check abandoned in head (HEAD, 8.6a2 when released) due to
(a) HEAD is now (almost) stackless thx to NRE: it is much more difficult to hit the stack limit
(b) the previous approach is non-portable, hard to maintain and generally a mess

msofer added on 2008-01-14 21:38:06:
Logged In: YES 
user_id=148712
Originator: NO

Even though the current patch works, it may be throwing too much out:

(a) on *my* linux, pthread_getattr_np/pthread_attr_getstacksize seem to be working fine also on the initial thread. But it is currently disabled.

(b) the guile project seems to have found a better(?) workaround - not calling these at all, but rather using pthread_get_stacksize_np on linux. Note also the use of pthread_get_stackaddr_np; maybe usable to get a better estimate? It may pay off to study/adapt/adopt that. http://www.mail-archive.com/[email protected]/msg01646.html

(c) this is one spot where a better platform-dep #ifdeffery may be warranted. In any case, it is already present: windows and mac have their own stuff (optimal??), currently glibc too (assuming things fail on initial thread).

msofer added on 2008-01-12 20:06:34:
Logged In: YES 
user_id=148712
Originator: NO

Closing, this is fixed in 8.5.0 afaik.

msofer added on 2007-12-20 23:06:00:
Logged In: YES 
user_id=148712
Originator: NO

Re last comment: that is a fluke caused by 'make test' - it seems to set the soft limit to the hard value. Running ./tclsh fixes this.

msofer added on 2007-11-27 02:12:38:
Logged In: YES 
user_id=148712
Originator: NO

Patch committed, lowering prio. There is still something fishy going on: getrlimit is apparently reporting hard limits in both the rlim_cur and rlim_max fields (contrary to documentation).
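
The rlim_cur (soft) and rlim_max (hard) fields mentioned here are the ones that POSIX getrlimit fills in for RLIMIT_STACK. To check what a given process actually sees, independently of Tcl, something like the following sketch may help. The helper names read_stack_limits and print_stack_limits are hypothetical and do not come from the Tcl sources.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not from the Tcl sources): fetch the soft and hard
 * stack limits of the current process. Returns 0 on success, -1 on error. */
int read_stack_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;
    }
    *soft = rl.rlim_cur;   /* soft limit: the one actually enforced */
    *hard = rl.rlim_max;   /* hard limit: ceiling for the soft limit */
    return 0;
}

/* Print one limit; RLIM_INFINITY means "unlimited". */
static void print_one(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY) {
        printf("%s: unlimited\n", label);
    } else {
        printf("%s: %llu bytes\n", label, (unsigned long long) value);
    }
}

/* Print both limits so they can be compared between ./tclsh and 'make test'. */
void print_stack_limits(void)
{
    rlim_t soft, hard;

    if (read_stack_limits(&soft, &hard) != 0) {
        perror("getrlimit");
        return;
    }
    print_one("soft (rlim_cur)", soft);
    print_one("hard (rlim_max)", hard);
}
```

If both lines print the same value in one environment but not in another, the difference most likely lies in how the process was launched rather than in the call itself.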

msofer added on 2007-11-26 20:49:06:

File Added - 256100: stack.patch

Logged In: YES 
user_id=148712
Originator: NO

Attaching a tentative patch. Please review.
File Added: stack.patch

msofer added on 2007-11-26 03:45:37:
Logged In: YES 
user_id=148712
Originator: NO

Digging in with Teo (Sergei Golovan) in the chat, the finger seems to point to the pthread library: a check before the call to TclpPthreadGetAttrs (which is just pthread_attr_getstacksize) shows that the thread default stack size (in his config) was 2097152 - a reasonable value. But after that call the value is reported as -191795200 (after being cast to int), which is not a reasonable value.
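
One way to read the negative figure: if the size lives in a 32-bit unsigned type (as size_t would on a 32-bit build) and is then cast to int, any value above INT_MAX prints as negative. The sketch below undoes that cast for the two numbers quoted in this comment. The helper names and the plausibility rule (nonzero and no larger than INT32_MAX) are assumptions made for illustration, not anything taken from Tcl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: recover the unsigned 32-bit value behind a number
 * that was printed after a cast to a 32-bit int. The conversion is defined
 * as reduction modulo 2^32. */
uint32_t unsigned_from_int32(int32_t printed)
{
    return (uint32_t) printed;
}

/* Assumed rule of thumb: a stack size is plausible only if it is nonzero
 * and still positive when viewed as a 32-bit int. */
int stack_size_is_plausible(uint32_t size)
{
    return size != 0 && size <= (uint32_t) INT32_MAX;
}

/* The two values from the comment: 2097152 before the call,
 * -191795200 (as int) after it. */
void check_reported_values(void)
{
    /* 2^32 - 191795200 = 4103172096, i.e. roughly 3.8 GiB */
    assert(unsigned_from_int32(-191795200) == 4103172096u);
    assert(stack_size_is_plausible(2097152u));
    assert(!stack_size_is_plausible(unsigned_from_int32(-191795200)));
}
```

Read this way, the reported stack size is close to the whole 32-bit address space, which fits the suspicion that the library returned garbage rather than a real size.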

msofer added on 2007-11-25 23:19:43:
Logged In: YES 
user_id=148712
Originator: NO

Not the same problem - although related.

This bug is about "when the stack size cannot be determined, Tcl assumes there is no stack". This has been fixed, we now assume the stack is infinite instead.

The bug reported at debian is: "the stack size can be determined (wrongly?), and it is deemed insufficient". Note that instructing the OS to use larger stacks fixes the issue.

The problem is one of:
  * we are wrongly determining the stack size (change in libraries?). See the comment at unix/tclUnixInit.c line 55
  * our "stack reserve" is too large (8 pages)
  * we REALLY are consuming huge piles of stack in that system
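
The policy described in this comment (an unknown stack size means "no limit", a known size is reduced by a reserve) can be sketched roughly as below. This is not the Tcl implementation: the function name, the convention that a size of 0 means "could not be determined", the 4096-byte page size, and the downward-growing stack are all assumptions. Only the 8-page reserve comes from the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_RESERVE_PAGES 8      /* figure quoted in the comment */
#define ASSUMED_PAGE_SIZE   4096u  /* assumption, for illustration only */

/* Hypothetical helper. 'top' is the highest stack address, 'size' the stack
 * size in bytes (0 = could not be determined), 'current' the address the
 * stack has grown down to. Returns 1 if evaluation may continue. */
int stack_has_room(uintptr_t top, size_t size, uintptr_t current)
{
    const size_t reserve = (size_t) STACK_RESERVE_PAGES * ASSUMED_PAGE_SIZE;
    uintptr_t limit;

    if (size == 0) {
        return 1;   /* size unknown: assume an infinite stack, skip the check */
    }
    assert(size <= top);   /* caller must pass a consistent top/size pair */

    /* Lowest address we allow ourselves to reach: the real bottom of the
     * stack plus the reserve. */
    limit = top - size + reserve;
    return current > limit;
}
```

With these assumptions, a stack that is reported as smaller than the reserve never has room. That matches the second failure mode in the list, where the size is determined (possibly wrongly) and then deemed insufficient.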

jenglish added on 2007-11-25 23:09:57:
Logged In: YES 
user_id=68433
Originator: NO

Problem is apparently still present in 8.5b3.

See: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=452679

msofer added on 2007-11-13 07:48:00:
Logged In: YES 
user_id=148712
Originator: NO

In current HEAD this will fall back to "no stack checking at all", instead of refusing to run.

arekm added on 2007-10-25 15:56:58:
Logged In: YES 
user_id=139606
Originator: NO

Only pthread_getattr_np fails, so the workaround of undefining it works well here. And pthread_getattr_np is not used anywhere in the Tcl code besides stack checking.

The solution IMO would be to fall back to the old way if pthread_getattr_np fails, but at *runtime*.
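
A run-time fallback of the kind suggested here might look like the sketch below. It tries pthread_getattr_np (a glibc extension) first and, only if that call fails while the program is running, falls back to another source. The comment does not spell out what "the old way" was, so the getrlimit(RLIMIT_STACK) fallback is a stand-in chosen for illustration, and probe_stack_size is a hypothetical name.

```c
#define _GNU_SOURCE            /* for pthread_getattr_np (glibc extension) */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: store the calling thread's stack size in *out.
 * Returns 1 if the value came from pthread_getattr_np, 2 if it came from
 * the getrlimit fallback, 0 if neither source gave a usable value. */
int probe_stack_size(size_t *out)
{
    pthread_attr_t attr;
    struct rlimit rl;

    assert(out != NULL);

    /* First choice. This can fail at run time, for example when /proc is
     * not mounted, so the result must be checked rather than assumed. */
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        size_t size = 0;
        int rc = pthread_attr_getstacksize(&attr, &size);

        pthread_attr_destroy(&attr);
        if (rc == 0 && size > 0) {
            *out = size;
            return 1;
        }
    }

    /* Run-time fallback (assumed): the soft RLIMIT_STACK limit, if finite. */
    if (getrlimit(RLIMIT_STACK, &rl) == 0
            && rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur > 0) {
        *out = (size_t) rl.rlim_cur;
        return 2;
    }
    return 0;   /* nothing usable: caller should skip stack checking */
}
```

The point of the design is that the decision is taken per process at run time, so a binary configured on a machine where pthread_getattr_np works still behaves sensibly on one where it does not.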

dkf added on 2007-10-25 15:36:42:
Logged In: YES 
user_id=79902
Originator: NO

Holy Wombat Manicures, Batman! No /proc mounted? That's going to break a lot of code, though neither Tcl nor Tk mention it specifically, so it's really a fault of the C library and pthread library in that situation, and hence technically Not Our Problem.

You could work around this by hacking the Makefile after the configure step so that it doesn't think that it has either pthread_attr_setstacksize() or pthread_getattr_np() which will force it back into being slightly less safe but more reliable. Do this by changing the defines for HAVE_PTHREAD_ATTR_SETSTACKSIZE and HAVE_PTHREAD_GET_STACKSIZE_NP so that they're undefined (look in the AC_FLAGS line, change the relevant -Dwhatever=1 bits to -Uwhatever). But be aware that other things may break without warning too; we can't warrant the correct functioning of Tcl when the basic underlying libraries are that far out of their comfort zone.

arekm added on 2007-10-25 03:37:04:
Logged In: YES 
user_id=139606
Originator: NO

To make things clear - pthread_getattr_np() fails and strace reveals that it's trying to use /proc/self/maps.

arekm added on 2007-10-25 03:35:26:
Logged In: YES 
user_id=139606
Originator: NO

[builder2@kratista unix]$ ./tclsh
skipping stack check with failure
application-specific initialization failed: too many nested evaluations (infinite loop?)
% set a 1
skipping stack check with failure
skipping stack check with failure
too many nested evaluations (infinite loop?)
%

dgp added on 2007-10-25 03:21:45:
Logged In: YES 
user_id=80530
Originator: NO


How about a simple command that should work?

% set a 1

What does that do?

arekm added on 2007-10-25 03:15:41:
Logged In: YES 
user_id=139606
Originator: NO

Got a suspect. 
26826 open("/proc/self/maps", O_RDONLY) = -1 ENOENT (No such file or directory)
and /proc is not mounted here and I guess glibc internally uses this file for pthread* stuff.

arekm added on 2007-10-25 03:09:09:
Logged In: YES 
user_id=139606
Originator: NO

#define TCL_DEBUG_STACK_CHECK 1
and
[builder2@kratista unix]$ ./tclsh
skipping stack check with failure
application-specific initialization failed: too many nested evaluations (infinite loop?)
%  

It enters this code path, which returns -1, and everything collapses.

    if (TclpPthreadGetAttrs(pthread_self(), &threadAttr) != 0) {
        pthread_attr_destroy(&threadAttr);
        return -1;
    }

TclpPthreadGetAttrs is pthread_getattr_np here

arekm added on 2007-10-25 02:59:56:
Logged In: YES 
user_id=139606
Originator: NO

[builder2@kratista unix]$ export LD_LIBRARY_PATH=.
[builder2@kratista unix]$ ./tclsh
application-specific initialization failed: too many nested evaluations (infinite loop?)
% blahblah
too many nested evaluations (infinite loop?)
%
too many nested evaluations (infinite loop?)
%    

The compiler is the same on all 5 machines (x86_64, i686, i486, athlon, ppc) and the problem is visible
only on i486. gcc version 4.2.2 20071010 (release)

dgp added on 2007-10-22 03:21:02:
Logged In: YES 
user_id=80530
Originator: NO


Something about these symptoms smells like the consequences of a broken compiler (or broken optimization within a compiler).

Can you determine precisely what executable is running that throws the error message? Can you run that executable interactively and determine if any command at all can be evaluated in it?

arekm added on 2007-10-21 04:43:44:
Logged In: YES 
user_id=139606
Originator: NO

[builder2@kratista unix]$ make test
LD_LIBRARY_PATH=`pwd`:${LD_LIBRARY_PATH}; export LD_LIBRARY_PATH; \
        TCL_LIBRARY="/home/users/builder2/rpm/BUILD/tcl8.5b1/library"; export TCL_LIBRARY; \
        ./tcltest /home/users/builder2/rpm/BUILD/tcl8.5b1/unix/../tests/all.tcl
application-specific initialization failed: too many nested evaluations (infinite loop?)
too many nested evaluations (infinite loop?)
    while executing
"package require Tcl 8.5"
    (file "/home/users/builder2/rpm/BUILD/tcl8.5b1/unix/../tests/all.tcl" line 15)
make: *** [test] Error 1


I've done testing on a few architectures (all using the same versions of software; built for different architectures of course):
tcl-8.5-0.b1.1.src.rpm (tcl.spec -R HEAD ) [th-x86_64:OK th-athlon:OK th-i486:FAIL th-i686:OK th-ppc:OK]
so only i486 is problematic here for some reason (I suspect that the kernel has some influence on this - 2.6.18-4-xen-vserver-amd64).

tcl was built with
        --enable-langinfo \
        --enable-shared \
        --enable-threads \
        --enable-64bit \
        --enable-gcc \
        --without-tzdata

build log http://buildlogs.pld-linux.org/index.php?dist=th&arch=i486&ok=0&id=96c540c1c34fc4589e776027b81db8d0

kennykb added on 2007-10-21 03:54:17:
Logged In: YES 
user_id=99768
Originator: NO

Does 'make test' reveal anything informative?  This does indeed sound as if the installer is the innocent victim of a faulty stack check.

Is this a threaded build?

dgp added on 2007-10-18 22:57:52:
Logged In: YES 
user_id=80530
Originator: NO


Something wrong with the
tcl/tools/installData.tcl script?

arekm added on 2007-10-18 14:07:36:
Logged In: YES 
user_id=139606
Originator: NO

On the other hand, this could just be stack checking working properly and a bug somewhere else :-)

arekm added on 2007-10-18 14:01:29:
Logged In: YES 
user_id=139606
Originator: NO

#define TCL_NO_STACK_CHECK 1
makes the problem go away.

Something broken again in stack checking? See old issue #1618411

Attachments: