Ticket UUID: | 3006741 | |||
Title: | SIGSEGV in Tcl_LogCommandInfo | |||
Type: | Bug | Version: | None | |
Submitter: | coldstore | Created on: | 2010-05-25 03:28:58 | |
Subsystem: | 45. Parsing and Eval | Assigned To: | ferrieux | |
Priority: | 8 | Severity: | ||
Status: | Closed | Last Modified: | 2010-05-28 15:20:21 | |
Resolution: | Duplicate | Closed By: | coldstore | |
Closed on: | 2010-05-28 03:14:00 | |||
Description: |
Tcl_LogCommandInfo crashes Program received signal SIGSEGV, Segmentation fault. Tcl_LogCommandInfo (interp=0x804da18, script=0x88ecae8 "::apply {r {\n variable pending 0\n variable starting 1\n set r [::yield]\n \n while {$starting || $pending > 0} {\n\tset starting 0\t;# one shot at starting\n\t# this coroutine loops around acce"..., command=0x88ecae8 "::apply {r {\n variable pending 0\n variable starting 1\n set r [::yield]\n \n while {$starting || $pending > 0} {\n\tset starting 0\t;# one shot at starting\n\t# this coroutine loops around acce"..., length=3142) at /home/colin/Desktop/packages/tcl/generic/tclNamesp.c:4964 CVS checkin dates tested: 2010.04.05.14.00.00 - OK 2010.04.06.14.00.00 - breaks Stack backtrace and relevant inspection attached. | |||
User Comments: |
coldstore added on 2010-05-28 15:20:21:
I would recommend conditional narrative debugging emitted on Callframe POP and PUSH and on CONTEXT_SAVE and _RESTORE. The narrative would consist of a dump of the stack's contents in hex. I've got the code, and could throw some #ifdefs around it, and roll it into the core, if that's broadly considered acceptable. ferrieux added on 2010-05-28 15:15:05: allow_comments - 0 For reference, let's spell out the better circumscribed bugreport that you've just created on the issue: 3008307 . As a an aside, how would you "perform this kind of forensic inspection" ? static int IsCorrupted(CallFrame * frame) { /* magical code here */ } ? coldstore added on 2010-05-28 10:14:00: allow_comments - 1 The cause of this SEGV was found to be stack corruption caused by a bug in coroutines. It's good that the error-time collection of frame information flushed this out, but I think it may be worth adding some conditional debuggin narrative to explicitly perform this kind of forensic inspection. ferrieux added on 2010-05-25 21:17:40: Thank you very much for all this work Colin. Now my knowledge of this is currently restricted to that of a mere user of the xxxFramePtrs's. The patches in question (TIP 348) indeed crawl the linked list of frames in order to report the objv[] at each level (CALL elements of errorstack). A secondary task (which wasn't initially intended) is to detect [uplevel] shifts and report them (UP elements). At some point during the development I had noticed that shunning the root frame helped avoiding artifacts at the end of the list; but dgp chimed in and offered a different logic, which is in HEAD now. Donald, maybe you have an insight on the present issue ? Clearly, all this critically depends on the invariants of the xxxFramePtr graph, which I'd love to get my hands on :} I asked Miguel for a bit of help. coldstore added on 2010-05-25 20:15:53: I spent approximately 4 hours today looking for ways to trap this bug under gdb and valgrind, and tracing the fault to these patches. The problem is that in some circumstances the interp->framePtr->callerVarPtr chain is broken. Is there a case in which a frame can be created containing an uninitialized callerVarPtr? I certainly could not observe callerVarPtr being explicitly set to the value which I consistently observed to be the bogus pointer. Given the context within which the segv occurred, I suggest that these cases may be associated with ::apply or with ::coroutine, but defer to Alexandre's specific knowledge of the patches and what they seek to achieve. I avoided the crash by testing for the condition 'iPtr->varFramePtr != iPtr->rootFramePtr' in addition to the existing test 'iPtr->varFramePtr != iPtr->framePtr'. I note that this condition normally terminates the function in which the crash occurs, except in the case that varFramePtr != rootFramePtr, which is (I believe incorrectly) interpreted by the code as meaning that there's an uplevel in play. I do not know under which conditions both of these predicates might be true, nor the accurate and complete interpretation of varFramePtr != framePtr, but something smells wrong about the interpretation being placed upon it, and I suspect the assumption that it strongly implies [uplevel] is false in the case of either ::coroutine or ::apply, or both. coldstore added on 2010-05-25 19:49:06: I can reproduce it perfectly reliably. The problem arises in tclNamesp.c:4958 where you have 'if (iPtr->varFramePtr != iPtr->framePtr', and then traverse a chain of links following frame->callerVarPtr when one of those links (the second, in this case) wasn't in fact a frame, but a string, and interestingly always the same string. I note that when the crash was occurring iPtr->varFramePtr == iPtr->rootFramePtr I also note that the crash is occurring in the top-most level of a coroutine. Is it possible that coroutines' top level frame (that is, #1 within the coro) doesn't have its frame initialized to correctly point to rootFramePtr as its caller var frame? It's also worth noting that the error is occurring in an ::apply which is running as the command of a coro, and both the ::apply namespace and the coro's namespace are not what you would sort of normally expect. It's a pretty complex case. ferrieux added on 2010-05-25 19:03:46: Don't worry Jan, Colin did this because of an SF shortage. He sent me additional info by e-mail. We'll be bringing those data here now that SF seems to be back up. nijtmans added on 2010-05-25 17:52:19: Without an example script which causes this SIGSEGV, it's almost impossible to reproduce this. Sure, it's related to the TIP #348 Implementation. coldstore added on 2010-05-25 10:29:20: File Added - 375048: gdb |
Attachments:
- gdb [download] added by coldstore on 2010-05-25 10:29:16. [details]