Ticket UUID: b47b176adf7c33681946162ba6a171281ff8381e
Title: iortrans.tf-11.0 segfault
Type: Bug
Version: trunk
Submitter: dgp
Created on: 2014-06-18 13:06:53
Subsystem: 26. Channel Transforms
Assigned To: dgp
Priority: 5 Medium
Severity: Severe
Status: Closed
Last Modified: 2014-06-19 16:54:32
Resolution: Fixed
Closed By: aku
Closed on: 2014-06-19 16:54:32
Description:
Not every time, but if the test iortrans.tf-11.0 is forced to run again and again, eventually it will fail with a segfault:

----
iortrans.tf-11.0 start

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaaae910940 (LWP 24638)]
0x0000000000542697 in DeleteThreadReflectedTransformMap (clientData=0x0)
    at /home/dgp/fossil/tcl/generic/tclIORTrans.c:2353
2353        paramPtr = evPtr->param;
(gdb) print evPtr
$1 = (ForwardingEvent *) 0x0
User Comments:

aku added on 2014-06-19 16:54:32:
Thank you for the analysis and fix. I will think about a suitable comment to add to the code to point out this variability (i.e. that DeleteReflectedTransformMap() may be followed by DeleteThreadReflectedTransformMap() instead of the originator thread handling it). Will see if I can take the time to fix the same issue in tclIORChan.c (8.5 first, then trunk).

dgp added on 2014-06-19 16:39:10:
fixed on trunk.

dgp added on 2014-06-19 16:31:27:
It appears that the code is written on the assumption that when ConditionNotify is called and the mutex is unlocked, the next thing that will lock the mutex is the ConditionWait. That doesn't have to be true.

In the failing case, ForwardOpToOwnerThread() in thread 1 placed a ForwardingResult on the forwardList and then waited in a ConditionWait. Then in thread 2 a [thread::exit] happens, which first leads to a call to DeleteReflectedTransformMap() when an interp in thread 2 is being torn down. This calls ForwardSetStaticError() and then ConditionNotify() and unlocks the mutex. If at this point, as often happens, thread 1 were to wake up and take over, it would pull the now-dead ForwardingResult off the forwardList, and all would be well....

....however, it can and sometimes does happen that the next lock on the mutex is acquired by thread 2 as it continues tearing down the thread, now inside the call to DeleteThreadReflectedTransformMap(). The ForwardingResult is still on the forwardList, but now has resultPtr->evPtr == NULL, which leads to the crash.

What must change is that no code processing the items on the forwardList can assume that just because a resultPtr is on the list, it must have a non-NULL resultPtr->evPtr. Fortunately that's an easy fix. Just check for NULL and continue when it is seen.

dgp added on 2014-06-19 15:44:31:
In the segfaulting run, the Tcl_ConditionWait() call doesn't return.
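
Editorial sketch: the "check for NULL and continue" fix described in the 16:31:27 comment can be illustrated with a simplified model of a forwardList scan. This is not the Tcl source. The struct layouts are reduced to the two fields the ticket mentions (evPtr and param), and the function name FailPendingResults() and the `code` field are hypothetical, introduced only for illustration. The real code presumably also holds the mutex and notifies a condition variable, which is omitted here.

```c
/*
 * Simplified model of a forwardList scan. All types and the function
 * FailPendingResults() are hypothetical stand-ins, not the Tcl source.
 */
#include <assert.h>
#include <stddef.h>

typedef struct ForwardParam {
    int code;                       /* hypothetical result code for the waiter */
} ForwardParam;

typedef struct ForwardingEvent {
    ForwardParam *param;
} ForwardingEvent;

typedef struct ForwardingResult {
    ForwardingEvent *evPtr;         /* NULL once a teardown pass has handled it */
    struct ForwardingResult *nextPtr;
} ForwardingResult;

/*
 * Mark every still-pending entry as failed and return how many were handled.
 * An entry whose evPtr is already NULL was processed by an earlier pass and
 * is only waiting for the originator thread to unlink it, so it is skipped.
 */
int
FailPendingResults(ForwardingResult *forwardList, int errorCode)
{
    ForwardingResult *resultPtr;
    int handled = 0;

    for (resultPtr = forwardList; resultPtr != NULL;
            resultPtr = resultPtr->nextPtr) {
        ForwardingEvent *evPtr = resultPtr->evPtr;

        if (evPtr == NULL) {
            continue;               /* the NULL check the ticket calls for */
        }
        assert(evPtr->param != NULL);
        evPtr->param->code = errorCode;
        resultPtr->evPtr = NULL;    /* detach so a later pass skips this entry */
        handled++;
    }
    return handled;
}
```

Calling FailPendingResults() twice on the same list models the failing sequence: the first call stands in for the interp teardown, the second for the thread teardown that runs before the waiting thread has removed its entry. With the NULL check in place the second call finds nothing to do instead of dereferencing a NULL evPtr.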

aku added on 2014-06-18 19:55:33:
A quick note, the forwarding code in IORTrans should be identical to the forwarding code in tclIORChan. Bugs in one are likely in the other as well.

dgp added on 2014-06-18 14:11:39:
(gdb) bt
#0  0x0000000000542697 in DeleteThreadReflectedTransformMap (clientData=0x0)
    at /home/dgp/fossil/tcl/generic/tclIORTrans.c:2353
#1  0x00000000004ffbcf in Tcl_FinalizeThread ()
    at /home/dgp/fossil/tcl/generic/tclEvent.c:1294
#2  0x000000000057fb58 in Tcl_ExitThread (status=0)
    at /home/dgp/fossil/tcl/generic/tclThread.c:470
#3  0x00000000004fff9e in NewThreadProc (clientData=0xa44df0)
    at /home/dgp/fossil/tcl/generic/tclEvent.c:1559
#4  0x0000003ddb00683d in start_thread () from /lib64/libpthread.so.0
#5  0x0000003dda4d526d in clone () from /lib64/libc.so.6

Passing to aku, in hopes he understands the management of the "forwardList" so I don't have to digest it.

Attachments:
- rtrans-forwarding.txt added by aku on 2014-06-18 20:38:43.