Tcl Source Code

View Ticket
Login
Ticket UUID: 2891556
Title: segfault in test encoding-24.2 (8.5)
Type: Bug Version: obsolete: 8.5.7
Submitter: mistachkin Created on: 2009-11-03 23:15:59
Subsystem: 10. Objects Assigned To: dgp
Priority: 8 Severity:
Status: Closed Last Modified: 2010-08-02 23:57:28
Resolution: Fixed Closed By: dgp
    Closed on: 2010-08-02 16:57:28
Description:
This crash is with Tcl 8.5 (core-8-5-branch, threaded debug build),

-DTCL_MEM_DEBUG -DTCL_COMPILE_DEBUG
-DTCL_COMPILE_STATS -DTCL_THREADS=1 -DUSE_THREAD_ALLOC=1

==== encoding-24.2 EscapeFreeProc on open channels FAILED
==== Contents of test case:

    viewable [exec [interpreter] $file]

---- Test generated error; Return code was: 1
---- Return code should have been one of: 0 2
---- errorInfo: ab$B8C$(DD%(Bg
child killed: segmentation violation
    while executing
"exec [interpreter] $file"
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $script"
---- errorCode: CHILDKILLED 1360 SIGSEGV {segmentation violation}
==== encoding-24.2 FAILED

stack trace:

>tcl85tg.dll!FreeEncoding(Tcl_Encoding_ * encoding=0x00aae0d0)  Line 848 + 0x3 bytesC
 tcl85tg.dll!EscapeFreeProc(void * clientData=0x00ab0360)  Line 3398 + 0xc bytesC
 tcl85tg.dll!FreeEncoding(Tcl_Encoding_ * encoding=0x00ab0790)  Line 851 + 0x11 bytesC
 tcl85tg.dll!TclFinalizeEncodingSubsystem()  Line 670 + 0xc bytesC
 tcl85tg.dll!Tcl_Finalize()  Line 1113C
 tcl85tg.dll!Tcl_Exit(int status=0x00000000)  Line 908C
 tcl85tg.dll!Tcl_ExitObjCmd(void * dummy=0x00000000, Tcl_Interp * interp=0x00a2c4b0, int objc=0x00000001, Tcl_Obj * const * objv=0x00a30250)  Line 727 + 0x9 bytesC
 tcl85tg.dll!TclEvalObjvInternal(Tcl_Interp * interp=0x00a2c4b0, int objc=0x00000001, Tcl_Obj * const * objv=0x00a30250, const char * command=0x00a980f8, int length=0x00000005, int flags=0x00000000)  Line 3689 + 0x1d bytesC
 tcl85tg.dll!TclEvalEx(Tcl_Interp * interp=0x00a2c4b0, const char * script=0x00a980b8, int numBytes=0x0000004a, int flags=0x00000000, int line=0x00000004, int * clNextOuter=0x00000000, const char * outerScript=0x00a980b8)  Line 4387 + 0x21 bytesC
 tcl85tg.dll!Tcl_EvalEx(Tcl_Interp * interp=0x00a2c4b0, const char * script=0x00a980b8, int numBytes=0x0000004a, int flags=0x00000000)  Line 4043 + 0x1d bytesC
 tcl85tg.dll!Tcl_FSEvalFileEx(Tcl_Interp * interp=0x00a2c4b0, Tcl_Obj * pathPtr=0x00a8b4d8, const char * encodingName=0x00000000)  Line 1814 + 0x13 bytesC
 tcl85tg.dll!Tcl_Main(int argc=0xffffffff, char * * argv=0x00a27f98, int (Tcl_Interp *)* appInitProc=0x004165d0)  Line 441 + 0x11 bytesC
 tcltest.exe!main(int argc=0x00000002, char * * argv=0x00a27f90)  Line 102 + 0x15 bytesC
 tcltest.exe!__tmainCRTStartup()  Line 586 + 0x19 bytesC
 tcltest.exe!mainCRTStartup()  Line 403C
 kernel32.dll!7c817077() 
 [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
User Comments: dgp added on 2010-08-02 23:57:28:

allow_comments - 1


Finally got back to this.  It's impossible to *love* this
solution, since it represents one more hack on the stack,
but it solves the immediate bug reported, and should do
fine until the real solution of complete re-architecting can
happen in the mythical future.

Note that the limited nature of the fix is mitigated a bit
since any other FreeProc that would have the same
problem is going to run into
Tcl Bug 3038316 anyway.

ferrieux added on 2010-03-01 20:44:26:
Well that's not exactly my interpretation of Don's formulation below :}
Let him comment...

dkf added on 2010-03-01 20:14:44:
OK, then I'll drop the prio as it is not now a reason to hold back from releasing.

ferrieux added on 2010-03-01 20:01:30:
Fixed in all three HEAD branches (after the last releases) but left open as a reminder for Don to review before 8.6b2/8.5.9.

dkf added on 2010-02-27 15:12:37:
So what is the status of this bug? (I'm getting clean test suite runs, but I use OSX...)

ferrieux added on 2009-11-17 06:25:20:
Yes, and same for 8.[46] since 8.6 has extra steps to touch Finalize.

andreas_kupries added on 2009-11-17 06:22:42:
Yes. That does crash. So. To repro, take CVS 8.5 branch head, and disable the 'if (encodingsInitialized)' test, but not the loop itself.

andreas_kupries added on 2009-11-17 06:17:10:
Oh, I should have disabled the if condition, but not the loop it was put around. Ok.
Lets try that.

ferrieux added on 2009-11-17 06:15:33:
yes, but you should not have disabled the block, just the test. The loop over subtables is needed ! (see before patch)

andreas_kupries added on 2009-11-17 06:13:58:
"disabled the block" in the code.
Left the encoding-24.2 test from the branch head unchanged.
I.e. it contained the 'encoding system cp1252' command.

ferrieux added on 2009-11-17 06:10:29:
Sorry to ask.. do you mean "disabled the block" or "disabled the test" ?

andreas_kupries added on 2009-11-17 05:57:17:
FWIW, I cannot reproduce it either (linux, 32bit). Things I did:

(1) Updated to CVS 8.5 branch head (Has the patch for encoding in).
(2) Build, then run testsuite, -verbose bpsten
(3) Disabled the "if (encodingsInitialized) ..." block in EscapeFreeProc. (#if 0/#endif)
(4) Rebuild, retested.
(5) No crash.

ferrieux added on 2009-11-17 04:44:47:
To clarify: to get the crash revelator, you must include:
    - the Tcl_Panic snippet inside FreeEncoding
    - the [encoding system cp1252] in encoding.test
and of course exlude the remainder of the patch (EscapeFreeProc).

The reason you need the Tcl_Panic is because double-frees are only occasionally followed by crashed under normal conditions.

dgp added on 2009-11-17 04:25:23:
FWIW, not on my box it doesn't.

ferrieux added on 2009-11-17 03:49:06:
Sorry, I'd really love to bridge that 10% left, but without a hint of what's missing it is hard. Any hint appreciated.
Ah, and in case it was unclear, the patch passes the test suite in all three branches, and the "crash revelator" panics all three unpatched trees.

dgp added on 2009-11-17 01:10:53:

allow_comments - 0

dgp added on 2009-11-17 01:10:52:
Thanks for the detailed analysis and strong effort
on this one.  Looks to me like this is at least 90%
of the way to the right fix for a real problem.

Unfortunately I need a 100% fix, and I need it
today.  You may have that already, but it's not
in a form where a slow dumb guy like me can
see it without question without taking the whole
day to see it.

It may be the wrong call, but I've been burned
too many times before with rushed fixes in
sensitive areas like finalization.  This will slip 8.5.8,
but will stay committed and at hi-prio to get any
reservations cleared up in time for 8.6b2 (and 8.5.9).

ferrieux added on 2009-11-17 00:51:17:
Patch forward-proted to 8.6, including the needed extra tcltest command.
Committed to HEAD, even though with too much enthusiasm ;-)

ferrieux added on 2009-11-17 00:48:55:
No. It is true that 506653 changes the order of opening of sub-encodings, but that should marginally affect the order of finalization which is rather linked to the hash order.

dgp added on 2009-11-17 00:16:19:
Any relation between this and 506653 ?

ferrieux added on 2009-11-17 00:04:41:
Applied cleanly to 8.4, verified the pre-fix crash, committed to core-8-4-branch.

ferrieux added on 2009-11-16 23:39:55:
Hmm, I will look, but note that in 8.6 Tcl_Finalize is no longer called by [exit], so the test case will need explicit tcltest support (maybe it exists, I have very little familiraty with tcltest).

dkf added on 2009-11-16 21:27:41:
BTW, make sure that the same problem isn't also in 8.4 and the HEAD please. The code should be *very* similar...

dkf added on 2009-11-16 21:24:35:

allow_comments - 1

dkf added on 2009-11-16 21:24:34:
Shift encodings? I see you have indeed ventured into the Heart Of Evil. I hate those things. (I suspect they're done wrongly in some subtle way, and are why the sub-area is so hard to maintain.)

Thanks for hunting this down.

ferrieux added on 2009-11-16 19:17:18:
Fix committed to 8.5 branch.

ferrieux added on 2009-11-16 18:02:24:
Attaching the (very simple) fix, with detailed comment in the source, and crash revelator in the test case.
This means that applying only the second part to the source tree gives a reliable Tcl_Panic in encoding-24.2, regardless of the platform nor -D flags.

ferrieux added on 2009-11-16 18:00:27:

File Added - 351080: encoding.patch

ferrieux added on 2009-11-16 17:34:38:
Got it now. Mechanism:
   - when an escape encoding is loaded, as a side-effect it that its sub-encodings get in the encoding table with refcount 1.
   - at finalization table, the whole contents of the table are purged.
   - when an escape encoding is purged, the sub-encodings get purged recursively (EscapeFreeProc).
   - *if* the final (refcount==1) purge of a sub-encoding occurs *first* from within this sub scan, it gets properly removed from the toplevel table
  - *if* the (random) table hash order makes it so that the purge occurs *first* at toplevel, then the encoding gets nuked while being still referenced as a sub.; its pointer soon gets a stab in the sub scan --> CRASH.

Note that incr'ing refcount on sub creation is not an option, because we'd then leak a ref at toplevel, hence in a non-finalization case (freeing an isolated escape encoding), we'd leak the subs.

This all is a case where something like weak references would help, but we have only strong ones here.

A simpler fix should be to detect the finalization case and refrain from the sub-scan FreeEnc() loop in EscapeFreeProc.

ferrieux added on 2009-11-16 16:14:11:
OK, several important things:
 - it is linked with the "jis0201" encoding appearing twice in the encoding hashtable + once as an escape encoding (subtable), but with total refcount 2 instead of 3.
 - adding an "if (refcount<=0) Tcl_Panic()" in FreeEncoding makes it reliably detectable.
 - it is reproducible even on unix, provided the system encoding is set to cp1252 on test entry
Bottom line: please wait a bit before relasing 8.5.8, I think this bug is both terribly dangerous and rather easy to fix.

 - though the bug is mostly asymptomatic, it is there all the time
 - under harsher memory conditions it could explode in full po

ferrieux added on 2009-11-14 23:24:10:
Hum, red herring. Tcl_GetEncoding does the refcount++.
Investigation goes on.

ferrieux added on 2009-11-14 23:15:16:
Closing in.
Breaking on entry to FreeEncoding with refcount==0, one sees the culprit is freeing the encoding from a subTablePtr. Now looking at the beginning of its lifecycle, we see in tclEncoding.c+3459:

subTablePtr->encodingPtr = encodingPtr;

without any ++ on the refcount !

ferrieux added on 2009-11-14 22:46:35:
Even better, removing all five special -D flags, I still see frees-below-1 (three 0s and one -1). So the bug is in there in default builds !

Test file error: ** NewEnc: 003E46C8 / 3084 / 1148
** NewEnc: 003E5720 / 3084 / 1148
** NewEnc: 003E5770 / 3084 / 1148
** NewEnc: 003E64F0 / 3084 / 1148
**FreeEnc: 00000000(0) / 3084 / 1148
**FreeEnc: 003E5770(2) / 3084 / 1148
**FreeEnc: 003E46C8(3) / 3084 / 1148
** NewEnc: 003E7750 / 3084 / 1148
**FreeEnc: 003E5720(2) / 3084 / 1148
**FreeEnc: 003E46C8(2) / 3084 / 1148
**FreeEnc: 003E7750(2) / 3084 / 1148
**FreeEnc: 003E7750(2) / 3084 / 1148
**FreeEnc: 003E7750(2) / 3084 / 1148
**FreeEnc: 003E7750(3) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
** NewEnc: 00AA64D8 / 3084 / 1148
**FreeEnc: 003E5720(3) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
** NewEnc: 00AA7280 / 3084 / 1148
**FreeEnc: 003E5720(3) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
** NewEnc: 00AA73D0 / 3084 / 1148
**FreeEnc: 003E5720(3) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
** NewEnc: 00AAEED8 / 3084 / 1148
**FreeEnc: 003E5720(3) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
** NewEnc: 00AA7428 / 3084 / 1148
**FreeEnc: 003E5720(3) / 3084 / 1148
**FreeEnc: 003E7750(4) / 3084 / 1148
** NewEnc: 00AAF5F0 / 3084 / 1148
**FreeEnc: 003E5720(3) / 3084 / 1148
** NewEnc: 00AAF8E0 / 3084 / 1148
**FreeEnc: 003E5720(2) / 3084 / 1148
**FreeEnc: 003E7750(3) / 3084 / 1148
**FreeEnc: 003E5770(3) / 3084 / 1148
**FreeEnc: 003E7750(2) / 3084 / 1148
**FreeEnc: 00AA7280(1) / 3084 / 1148
**FreeEnc: 00AA7428(1) / 3084 / 1148
**FreeEnc: 003E7750(1) / 3084 / 1148
**FreeEnc: 00AA73D0(2) / 3084 / 1148
**FreeEnc: 00AA73D0(1) / 3084 / 1148
**FreeEnc: 00AAF8E0(1) / 3084 / 1148
**FreeEnc: 00AA64D8(1) / 3084 / 1148
**FreeEnc: 00AA7280(0) / 3084 / 1148
**FreeEnc: 00AA73D0(0) / 3084 / 1148
**FreeEnc: 00AA73D0(-1) / 3084 / 1148
**FreeEnc: 00AAEED8(1) / 3084 / 1148
**FreeEnc: 00AA7428(0) / 3084 / 1148
**FreeEnc: 00AAF5F0(1) / 3084 / 1148
**FreeEnc: 003E46C8(1) / 3084 / 1148
**FreeEnc: 003E5720(1) / 3084 / 1148
**FreeEnc: 003E5770(2) / 3084 / 1148
**FreeEnc: 003E5770(1) / 3084 / 1148
**FreeEnc: 003E64F0(1) / 3084 / 1148
**FreeEnc: 003EBF00(4) / 3720 / 3812

ferrieux added on 2009-11-14 22:41:06:
While I'd still appreciate to reproduce the crash, I am seeing strange things.
I added fprintf's on entry to Create and Free encoding, giving

      encodingPtr (refcount)   ProcessId ThreadId

And, as you can see below, some encodings get re-freed after their refcount has reached zero and they've been freed for good (see the large refcount).
One nice thing is that all this is done from the same thread...

** NewEnc: 003E5ED0 / 3236 / 3844
** NewEnc: 003E5FB8 / 3236 / 3844
** NewEnc: 003E6E50 / 3236 / 3844
**FreeEnc: 00000000(0) / 3236 / 3844
**FreeEnc: 003E5FB8(2) / 3236 / 3844
**FreeEnc: 003E4DE0(3) / 3236 / 3844
** NewEnc: 003E7E68 / 3236 / 3844
**FreeEnc: 003E5ED0(2) / 3236 / 3844
**FreeEnc: 003E4DE0(2) / 3236 / 3844
**FreeEnc: 003E7E68(2) / 3236 / 3844
**FreeEnc: 003E7E68(2) / 3236 / 3844
**FreeEnc: 003E7E68(2) / 3236 / 3844
**FreeEnc: 003E7E68(3) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
** NewEnc: 00AC78C8 / 3236 / 3844
**FreeEnc: 003E5ED0(3) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
** NewEnc: 00AC7920 / 3236 / 3844
**FreeEnc: 003E5ED0(3) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
** NewEnc: 00AE19D0 / 3236 / 3844
**FreeEnc: 003E5ED0(3) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
** NewEnc: 00AE3000 / 3236 / 3844
**FreeEnc: 003E5ED0(3) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
** NewEnc: 00AE2FA8 / 3236 / 3844
**FreeEnc: 003E5ED0(3) / 3236 / 3844
**FreeEnc: 003E7E68(4) / 3236 / 3844
** NewEnc: 00AE5C48 / 3236 / 3844
**FreeEnc: 003E5ED0(3) / 3236 / 3844
** NewEnc: 00AE5BF0 / 3236 / 3844
**FreeEnc: 003E5ED0(2) / 3236 / 3844
**FreeEnc: 003E7E68(3) / 3236 / 3844
**FreeEnc: 003E5FB8(3) / 3236 / 3844
**FreeEnc: 003E7E68(2) / 3236 / 3844
**FreeEnc: 00AC7920(1) / 3236 / 3844
**FreeEnc: 00AE2FA8(1) / 3236 / 3844
**FreeEnc: 003E7E68(1) / 3236 / 3844
**FreeEnc: 00AE19D0(2) / 3236 / 3844
**FreeEnc: 00AE19D0(1) / 3236 / 3844
**FreeEnc: 00AE5BF0(1) / 3236 / 3844
**FreeEnc: 00AC78C8(1) / 3236 / 3844
**FreeEnc: 00AC7920(1633771873) / 3236 / 3844
**FreeEnc: 00AE19D0(1633771873) / 3236 / 3844
**FreeEnc: 00AE19D0(1633771872) / 3236 / 3844
**FreeEnc: 00AE3000(1) / 3236 / 3844
**FreeEnc: 00AE2FA8(1633771873) / 3236 / 3844
**FreeEnc: 00AE5C48(1) / 3236 / 3844
**FreeEnc: 003E4DE0(1) / 3236 / 3844
**FreeEnc: 003E5ED0(1) / 3236 / 3844
**FreeEnc: 003E5FB8(2) / 3236 / 3844
**FreeEnc: 003E5FB8(1) / 3236 / 3844
**FreeEnc: 003E6E50(1) / 3236 / 3844
**FreeEnc: 003EC6C8(4) / 568 / 3212

dkf added on 2009-11-14 22:38:32:
Cannot reproduce on OSX with 8.5 tip, compiled with gcc.

ferrieux added on 2009-11-14 06:31:24:
Cannot seem to repro on XPSP3, 8.5 HEAD.
Please confirm procedure:
(1) edit Makefile: COMPILE_DEBUG_FLAGS = -DTCL_MEM_DEBUG -DTCL_COMPILE_DEBUG -DTCL_COMPILE_STATS -DTCL_THREADS=1 -DUSE_THREAD_ALLOC=1
(2) make clean;make;make test TESTFLAGS="-f encoding.test"

Didn't make genstubs, right ?

(As a side note, DTCL_THREADS=1 is strange because it seems to indicate a #if while in the code it is an #ifdef).

dgp added on 2009-11-05 00:48:49:
Is this a new failure?  Since when?

Attachments: