Ticket UUID: 1815573
Title: Stack space check fails in Linux-x86 build
Type: Bug | Version: obsolete: 8.5.1
Submitter: nobody | Created on: 2007-10-18 06:54:48
Subsystem: 53. Configuration and Build Tools | Assigned To: msofer
Priority: 5 Medium | Severity:
Status: Closed | Last Modified: 2010-01-29 20:24:22
Resolution: Fixed | Closed By: msofer
Closed on: 2008-08-05 03:35:17
Description:
i486, 2.6.18, Linux, glibc 2.6.1
Problematic tcl - 8.5b1. Last known working version 8.5a6.

Installing message catalogs fails with:

Installing message catalogs
application-specific initialization failed: too many nested evaluations (infinite loop?)
too many nested evaluations (infinite loop?)
    while executing
"proc copyDir { d1 d2 } {
    puts [format {%*sCreating %s} [expr { 4 * [info level] }] {} \
        [file tail $d2]]
    file delete -force -- $d2 ..."
    (file "/home/users/builder2/rpm/BUILD/tcl8.5b1/unix/../tools/installData.tcl" line 23)
User Comments:

dkf added on 2010-01-29 20:24:22:
allow_comments - 1

dougedey added on 2010-01-29 19:03:06:
Sorry, wrong bug :(

dougedey added on 2010-01-29 18:52:21:
Hi, I have AIX 6.1 available to me and I have hit this same issue with the standard build options. I'm willing to assist with debugging what is going wrong.

$ ./tclsh
application-specific initialization failed: out of stack space (infinite loop?)

msofer added on 2008-08-05 10:35:17:
Logged In: YES user_id=148712 Originator: NO
Specific issue discussed at #2017264

jenglish added on 2008-08-05 10:16:02:
Logged In: YES user_id=68433 Originator: NO
@miguel --
| Stack check abandoned in head [...]

This doesn't appear to be completely purged -- there's still a lot of goo in unix/tclUnixThrd.c (r1.59 2008/07/24). How much of this can go away? (In particular: can we zorch TclpThreadGetStackSize altogether?)

sf-robot added on 2008-08-05 09:20:03:
Logged In: YES user_id=1312539 Originator: NO
This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

msofer added on 2008-07-21 11:52:14:
Logged In: YES user_id=148712 Originator: NO
Stack check abandoned in head (HEAD, 8.6a2 when released) due to
(a) HEAD is now (almost) stackless thx to NRE: it is much more difficult to hit the stack limit
(b) the previous approach is non-portable, hard to maintain and generally a mess

msofer added on 2008-01-14 21:38:06:
Logged In: YES user_id=148712 Originator: NO
Even though the current patch works, it may be throwing too much out:
(a) on *my* linux, pthread_getattr_np/pthread_attr_getstacksize seem to be working fine also on the initial thread. But it is currently disabled.
(b) the guile project seems to have found a better(?) workaround - not calling these at all, but rather use pthread_get_stacksize_np on linux. Note also the use of pthread_get_stackaddr_np; maybe usable to get a better estimate?
It may pay off to study/adapt/adopt that? http://www.mail-archive.com/[email protected]/msg01646.html
(c) this is one spot where a better platform-dep #ifdeffery may be warranted. In any case, it is already present: windows and mac have their own stuff (optimal??), currently glibc too (assuming things fail on initial thread).

msofer added on 2008-01-12 20:06:34:
Logged In: YES user_id=148712 Originator: NO
Closing, this is fixed in 8.5.0 afaik.

msofer added on 2007-12-20 23:06:00:
Logged In: YES user_id=148712 Originator: NO
Re last comment: that is a fluke caused by 'make test' - it seems to set the soft limit to the hard value. Running ./tclsh fixes this.

msofer added on 2007-11-27 02:12:38:
Logged In: YES user_id=148712 Originator: NO
Patch committed, lowering prio. There is still something fishy going on: getrusage is apparently reporting hard limits in both the rlim_cur and rlim_max fields (contrary to documentation)

msofer added on 2007-11-26 20:49:06:
File Added - 256100: stack.patch
Logged In: YES user_id=148712 Originator: NO
Attaching a tentative patch. Please review.
File Added: stack.patch

msofer added on 2007-11-26 03:45:37:
Logged In: YES user_id=148712 Originator: NO
Digging in with Teo (Sergei Golovan) at the chat, the finger seems to point to the pthread library: before the call to TclpPthreadGetAttrs (which is just pthread_attr_getstacksize) shows that the thread default stack size (in his config) was 2097152 - a reasonable value. But after that call the value is reported as -191795200 (after being cast to int), a not-reasonable value.

msofer added on 2007-11-25 23:19:43:
Logged In: YES user_id=148712 Originator: NO
Not the same problem - although related. This bug is about "when the stack size cannot be determined, Tcl assumes there is no stack". This has been fixed, we now assume the stack is infinite instead. The bug reported at debian is: "the stack size can be determined (wrongly?), and it is deemed insufficient".
Note that instructing the OS to use larger stacks fixes the issue. The problem is one of:
* we are wrongly determining the stack size (change in libraries?). See the comment at unix/tclUnixInit.c line 55
* our "stack reserve" is too large (8 pages)
* we REALLY are consuming huge piles of stack in that system

jenglish added on 2007-11-25 23:09:57:
Logged In: YES user_id=68433 Originator: NO
Problem is apparently still present in 8.5b3. See: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=452679

msofer added on 2007-11-13 07:48:00:
Logged In: YES user_id=148712 Originator: NO
In current HEAD this will fall back to "no stack checking at all", instead of refusing to run.

arekm added on 2007-10-25 15:56:58:
Logged In: YES user_id=139606 Originator: NO
Only pthread_attr_get_np fails so workaround by undefining it works well here. And pthread_attr_get_np is not used anywhere in tcl code beside stack checking. The solution IMO would be to fallback to old way if pthread_attr_get_np fails but *runtime*.

dkf added on 2007-10-25 15:36:42:
Logged In: YES user_id=79902 Originator: NO
Holy Wombat Manicures, Batman! No /proc mounted? That's going to break a lot of code, though neither Tcl nor Tk mention it specifically, so it's really a fault of the C library and pthread library in that situation, and hence technically Not Our Problem.

You could work around this by hacking the Makefile after the configure step so that it doesn't think that it has either pthread_attr_setstacksize() or pthread_getattr_np() which will force it back into being slightly less safe but more reliable. Do this by changing the defines for HAVE_PTHREAD_ATTR_SETSTACKSIZE and HAVE_PTHREAD_GET_STACKSIZE_NP so that they're undefined (look in the AC_FLAGS line, change the relevant -Dwhatever=1 bits to -Uwhatever). But be aware that other things may break without warning too; we can't warrant the correct functioning of Tcl when the basic underlying libraries are that far out of their comfort zone.
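Editorial sketch (not the attached stack.patch and not Tcl's actual code): the runtime fallback suggested in the comments above could look roughly like the following on Linux/glibc. It tries pthread_getattr_np first, falls back to getrlimit(RLIMIT_STACK) if that call fails at run time (for example because /proc is not mounted), and otherwise reports the size as unknown so that the caller can skip stack checking instead of assuming there is no stack. The names StackSizeFromPthread, StackSizeFromRlimit, GetStackSizeOrUnknown and STACK_SIZE_UNKNOWN are hypothetical, and treating getrlimit as the "old way" is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes pthread_getattr_np on glibc */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical sentinel (not a Tcl name): 0 means "size unknown". */
#define STACK_SIZE_UNKNOWN ((size_t) 0)

/* Ask the pthread library for this thread's stack size.
 * Returns 0 on success, -1 on any failure. */
static int
StackSizeFromPthread(size_t *sizePtr)
{
    pthread_attr_t attr;
    size_t size = 0;
    int rc;

    assert(sizePtr != NULL);
    /* On success pthread_getattr_np initializes attr itself; on failure
     * (e.g. /proc/self/maps cannot be opened) there is nothing to destroy. */
    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return -1;
    }
    rc = pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0 || size == 0) {
        return -1;
    }
    *sizePtr = size;
    return 0;
}

/* Assumed "old way": use the soft RLIMIT_STACK limit (rlim_cur).
 * Returns 0 on success, -1 if the limit is unavailable or unlimited. */
static int
StackSizeFromRlimit(size_t *sizePtr)
{
    struct rlimit rl;

    assert(sizePtr != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0
            || rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur == 0) {
        return -1;
    }
    *sizePtr = (size_t) rl.rlim_cur;
    return 0;
}

/* Try each source in turn; STACK_SIZE_UNKNOWN tells the caller to skip
 * stack checking rather than treat the stack as exhausted. */
size_t
GetStackSizeOrUnknown(void)
{
    size_t size;

    if (StackSizeFromPthread(&size) == 0) {
        return size;
    }
    if (StackSizeFromRlimit(&size) == 0) {
        return size;
    }
    return STACK_SIZE_UNKNOWN;
}
```

A caller would compare the result against STACK_SIZE_UNKNOWN before computing any stack limit. The point of the design is that a failure in the pthread library degrades to a weaker check or to no check, never to "too many nested evaluations" on every command.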
arekm added on 2007-10-25 03:37:04:
Logged In: YES user_id=139606 Originator: NO
To make things clear - pthread_getattr_np() fails and strace reveals that it's trying to use /proc/self/maps.

arekm added on 2007-10-25 03:35:26:
Logged In: YES user_id=139606 Originator: NO
[builder2@kratista unix]$ ./tclsh
skipping stack check with failure
application-specific initialization failed: too many nested evaluations (infinite loop?)
% set a 1
skipping stack check with failure
skipping stack check with failure
too many nested evaluations (infinite loop?)
%

dgp added on 2007-10-25 03:21:45:
Logged In: YES user_id=80530 Originator: NO
How about a simple command that should work?
% set a 1
What does that do?

arekm added on 2007-10-25 03:15:41:
Logged In: YES user_id=139606 Originator: NO
Got a suspect.
26826 open("/proc/self/maps", O_RDONLY) = -1 ENOENT (No such file or directory)
and /proc is not mounted here and I guess glibc internally uses this file for pthread* stuff.

arekm added on 2007-10-25 03:09:09:
Logged In: YES user_id=139606 Originator: NO
#define TCL_DEBUG_STACK_CHECK 1
and
[builder2@kratista unix]$ ./tclsh
skipping stack check with failure
application-specific initialization failed: too many nested evaluations (infinite loop?)
%
It enters this codepath which returns -1 and all collapses.
    if (TclpPthreadGetAttrs(pthread_self(), &threadAttr) != 0) {
        pthread_attr_destroy(&threadAttr);
        return -1;
    }
TclpPthreadGetAttrs is pthread_getattr_np here

arekm added on 2007-10-25 02:59:56:
Logged In: YES user_id=139606 Originator: NO
[builder2@kratista unix]$ export LD_LIBRARY_PATH=.
[builder2@kratista unix]$ ./tclsh
application-specific initialization failed: too many nested evaluations (infinite loop?)
% blahblah
too many nested evaluations (infinite loop?)
%
too many nested evaluations (infinite loop?)
%
Compiler is the same on all 5 machines (x86_64, i686, i486, athlon, ppc) and the problem is visible only on i486.
gcc version 4.2.2 20071010 (release)

dgp added on 2007-10-22 03:21:02:
Logged In: YES user_id=80530 Originator: NO
something about these symptoms smells like the consequences of a broken compiler (or broken optimization within a compiler). Can you determine precisely what executable is running that throws the error message? Can you run that executable interactively and determine if any command at all can be evaluated in it?

arekm added on 2007-10-21 04:43:44:
Logged In: YES user_id=139606 Originator: NO
[builder2@kratista unix]$ make test
LD_LIBRARY_PATH=`pwd`:${LD_LIBRARY_PATH}; export LD_LIBRARY_PATH; \
TCL_LIBRARY="/home/users/builder2/rpm/BUILD/tcl8.5b1/library"; export TCL_LIBRARY; \
./tcltest /home/users/builder2/rpm/BUILD/tcl8.5b1/unix/../tests/all.tcl
application-specific initialization failed: too many nested evaluations (infinite loop?)
too many nested evaluations (infinite loop?)
    while executing
"package require Tcl 8.5"
    (file "/home/users/builder2/rpm/BUILD/tcl8.5b1/unix/../tests/all.tcl" line 15)
make: *** [test] Error 1

I've done testing on a few architectures (all using the same versions of software; built for different architectures of course):
tcl-8.5-0.b1.1.src.rpm (tcl.spec -R HEAD ) [th-x86_64:OK th-athlon:OK th-i486:FAIL th-i686:OK th-ppc:OK]
so only i486 is problematic here for some reason (I suspect that the kernel has some influence on this - 2.6.18-4-xen-vserver-amd64).
tcl was built with
--enable-langinfo \
--enable-shared \
--enable-threads \
--enable-64bit \
--enable-gcc \
--without-tzdata

build log http://buildlogs.pld-linux.org/index.php?dist=th&arch=i486&ok=0&id=96c540c1c34fc4589e776027b81db8d0

kennykb added on 2007-10-21 03:54:17:
Logged In: YES user_id=99768 Originator: NO
Does 'make test' reveal anything informative? This does indeed sound as if the installer is the innocent victim of a faulty stack check. Is this a threaded build?

dgp added on 2007-10-18 22:57:52:
Logged In: YES user_id=80530 Originator: NO
Something wrong with the tcl/tools/installData.tcl script?

arekm added on 2007-10-18 14:07:36:
Logged In: YES user_id=139606 Originator: NO
On the other hand could be just properly working stack checking and bug somewhere else :-)

arekm added on 2007-10-18 14:01:29:
Logged In: YES user_id=139606 Originator: NO
#define TCL_NO_STACK_CHECK 1
makes the problem go away. Something broken again in stack checking? See old issue #1618411
Attachments:
- stack.patch added by msofer on 2007-11-26 20:49:03.