Ticket UUID: | 923072 | |||
Title: | atfork patch | |||
Type: | Patch | Version: | None | |
Submitter: | davidw | Created on: | 2004-03-25 12:25:39 | |
Subsystem: | 49. Threading | Assigned To: | vasiljevic | |
Priority: | 5 Medium | Severity: | ||
Status: | Open | Last Modified: | 2006-05-27 12:26:27 | |
Resolution: | None | Closed By: | ||
Closed on: | ||||
Description: |
Patch mentioned here: http://sourceforge.net/tracker/index.php?func=detail&aid=749639&group_id=10894&atid=110894 Another reference: http://sourceforge.net/tracker/index.php?func=detail&aid=748835&group_id=13247&atid=113247 | |||
User Comments: |
das added on 2006-05-27 12:26:27:
File Added - 179459: tclMacOSXNotify-atfork_HEAD.patch Logged In: YES user_id=90580 attached patch to HEAD that implements a pthread_atfork() handler for the Mac OS X specific notifier in tclMacOSXNotify.c (used when configured with --enable-corefoundation), this allows unthreaded CF tcl to [fork] without hanging at notifier finalization: the handler recreates CoreFoundation state and notifier thread in the child. Note that pthread_atfork() is available starting with OS X 10.4 Tiger only, so tcl built on earlier releases will not include this fix. Because vfork() is used by the core on OS X, the atfork handler does not run for [exec] and [open |], so core commands are not affected; the handler only gets called when an extension/embedder calls fork(), such as TclX's [fork]. On Tiger, this makes [fork] safe to use from tcl with --disable-threads irrespective of corefoundation configuration option; but of course it doesn't fix the generic stale locked mutex issue with threaded tcl and fork also discussed here, so threaded CF tcl can still deadlock in the child if any locks are held in the parent at the time of the [fork]. patch committed to HEAD and core-8-4-branch vasiljevic added on 2004-12-18 01:50:20: Logged In: YES user_id=95086 Some more thoughts... This patch actually attempts to solve 2 problems at a time: 1. Assure all Tcl mutexes are in the sane state after the fork, so that the Tcl library in the child process is in the sane state as well. 2. Reinitialize (actually, re-start) the notifier thread in the child, since it is not replicated by the fork call but is/(will be) needed by the Tcl lib in the child. Only the step 2. is the actual fix of the initial problem as described in the references to bug-reports above. This ain't that difficult to solve by pthread_atfork mechanics and should not introduce any unwanted side effects. At least I can't think of any at this point. The step 1. attempts to solve the more difficult problem of the state of internal mutexes used by the Tcl library after the fork. This is far more complex than it looks like on the first glance (deadlock embrace issues)... vasiljevic added on 2004-12-18 01:30:24: Logged In: YES user_id=95086 Just a life-sing.. I'm still looking into this... Aparently, the *main* problem is with the Tcl notifier thread and its handling in the child process, i.e. the fact that child has not have access to the notifier thread because it is simply not there after the fork. Attempting to lock all mutexes in master and then release them again after the fork in both master and child processes, plus re-initializing the notifier subsystem (effectively re-creating the notifier thread) in the child process is a logical way of solving this puzzle *only* if you have control of the lock hierarchy... In the nature of Tcl lib, this is not guaranteed, hence it will lead possibly (and very probably) to various deadlock-embrace situations. Also, the patch as provided, will destroy the reference count of the notifier thread which is not good but this is a minor issue which can be fixed. I see no problem in this. I do see problems in wholesale locking of Tcl mutexes though. This is the main problem and we have to see if this step is really needed and what would be the consequence of not doing it. I'm afraid some hard compromises would be required and those may/will have consequences and lead to some special-cases which would be pretty unelegant, design-wise. Well, this all requires more thinking... vasiljevic added on 2004-12-08 18:09:24: Logged In: YES user_id=95086 Thanks for the hint about the links. Oh,. I just did not read the original posting right. Sorry for this. I will look into this. Yeah, it would be good to bring this topic to the tcl core list to see what other people would say about that. I can do this after having peeked into the reports. davidw added on 2004-12-08 16:04:25: Logged In: YES user_id=240 The two URL's in the original 'patch report', above, are links to the bugs filed against Tcl and Tclx. I've seen people ask about this every so often on the newsgroup. I guess if you say so it could happen, you're the expert. That part of the patch came from Jeff (I just cleaned it up and added comments/tests and so on). What might might be doable as a workaround? Would it be possible to include some of this logic from the patch in the [fork] implementation itself, so that in the manual you could say: "be sure that you don't hold any of the mutexes before forking". I don't know what the answer is, but this bug has been sitting there for a year and a half at this point, and I'd like to see some discussion on it with the goal of figuring out an answer. Maybe it's appropriate to do that on tcl-core so more minds are brought to bear. vasiljevic added on 2004-12-08 12:43:54: Logged In: YES user_id=95086 "What you describe is not very likely, but still an impediment to a clean fix" Oh, please believe me: it is *very* likely. Think of heavy Tcl threading apps like AOLserver. There you have the immediate problem of exec and cgi processing which are/would be significantly affected. I would rather vote for a cleaner fix. BTW, would you please point me to some of those bug-reports you mentioned? I would like to re-check them and see if we can find some solution. davidw added on 2004-12-07 05:28:40: Logged In: YES user_id=240 It is tricky all right. What you describe is not very likely, but still an impediment to a clean fix. The bug is real though and, for instance on Debian, Tcl cannot fork(), which has been noticed by several people to date. We need to figure out either 1) a good fix, or 2) a workaround that at least lets people do what they need. A workaround might include a way for users to specify if they have any tricky packages that need special care. vasiljevic added on 2004-12-05 16:53:52: Logged In: YES user_id=95086 The problem is as I already described: This patch is looping thru all Tcl-created mutexes in an linear fashion but this might lead to deadlocks I'm afraid. The Tcl mutexes are of dynamic nature and users (extensions) can create and *lock* them in any *arbitrary* order. This will lead to deadlock if some extension is holding a lock-hierarchy and somebody (another thread) tries to fork at that time. This is very tricky business... davidw added on 2004-12-05 16:35:56: Logged In: YES user_id=240 Hi Zoran - did anything ever come of this patch? Thanks, Dave davidw added on 2004-06-18 04:08:33: Logged In: YES user_id=240 Don't mind the bit at the top of the test patch.... I probably accidentally included some of the big change I made to the test suite. As for pthread_once and the mutex locking/unlocking, that, and the rest of the core functionality of the patch is all Jeff's. I just prettied it up some, and added some comments and tests. dkf added on 2004-06-17 15:05:59: Logged In: YES user_id=79902 Speaking as someone who recently went through the core test suite fixing up constraint setup and use, that should just refer to: # Set operation... tcltest::testConstraint testthread \ [llength [info commands testthread]] # Read operation... if {[tcltest::testConstraint testthread]} { ... No variable or namespace inside the tcltest namespace should be referred to directly. vasiljevic added on 2004-06-17 13:18:38: Logged In: YES user_id=95086 Hi David, I'm currently reviewing the patch. I have applied it to the core-8-4-branch and get problem with thread.test. The offending part: # Some tests require the testthread command -set ::tcltest::testConstraints(testthread) \ - [expr {[info commands testthread] != {}}] +#set ::tcltest::testConstraints(testthread) \ +# [expr {[info commands testthread] != {}}] -if {$::tcltest::testConstraints(testthread)} { +tcltest::constraints::cset testthread [expr {[info commands testthread] != {}}] + +if {[::tcltest::constraints::cset testthread]} { The "::tcltest::constraints::cset" is unknown. I do a simple: patch -p0 < newatfork.diff cd unix make make test .... Test file error: invalid command name "::tcltest::constraints::cset" while executing "::tcltest::constraints::cset testthread" (file "/usr/local/homes/zv/sf/core-8-4-branch/tests/thread.test" line 1) timer.test .... Another thing is: you should beware of pthread_once. This is not defined in many posix-thread implementations. Safer is to use mutex and a static int to simulate the behaviour. I will yet have to make some tests here. Where I have a bad feeling is the lock order of mutexes. You are basically looping thru all Tcl-created mutexes in an linear fashion but this might lead to deadlocks I'm afraid. The Tcl mutexes are of dynamic nature and users can creaet and lock them in any arbitrary order. I still have to verify that this won't dwarf the effort, i.e. lead to a deadlock just before the fork. This will take some time. Also, the patch provided is ment for 8.4 branch only, right? Zoran davidw added on 2004-06-17 03:59:59: File Added - 90899: newatfork.diff Logged In: YES user_id=240 Updated patch. davidw added on 2004-03-25 19:25:40: File Added - 81327: atfork.diff |