Tcl Source Code

View Ticket
Login
Ticket UUID: 923072
Title: atfork patch
Type: Patch Version: None
Submitter: davidw Created on: 2004-03-25 12:25:39
Subsystem: 49. Threading Assigned To: vasiljevic
Priority: 5 Medium Severity:
Status: Open Last Modified: 2006-05-27 12:26:27
Resolution: None Closed By:
    Closed on:
Description:
Patch mentioned here:
http://sourceforge.net/tracker/index.php?func=detail&aid=749639&group_id=10894&atid=110894

Another reference:
http://sourceforge.net/tracker/index.php?func=detail&aid=748835&group_id=13247&atid=113247
User Comments: das added on 2006-05-27 12:26:27:

File Added - 179459: tclMacOSXNotify-atfork_HEAD.patch

Logged In: YES 
user_id=90580

attached patch to HEAD that implements a pthread_atfork()
handler for the Mac OS X specific notifier in
tclMacOSXNotify.c (used when configured with
--enable-corefoundation), this allows unthreaded CF tcl to
[fork] without hanging at notifier finalization: the handler
recreates CoreFoundation state and notifier thread in the child.
Note that pthread_atfork() is available starting with OS X
10.4 Tiger only, so tcl built on earlier releases will not
include this fix.

Because vfork() is used by the core  on OS X, the atfork
handler does not run for [exec] and [open |], so core
commands are not affected; the handler only gets called when
an extension/embedder calls fork(), such as TclX's [fork].

On Tiger, this makes [fork] safe to use from tcl with
--disable-threads irrespective of corefoundation
configuration option; but of course it doesn't fix the
generic stale locked mutex issue with threaded tcl and fork
also discussed here, so threaded CF tcl can still deadlock
in the child if any locks are held in the parent at the time
of the [fork].

patch committed to HEAD and core-8-4-branch

vasiljevic added on 2004-12-18 01:50:20:
Logged In: YES 
user_id=95086

Some more thoughts... 
 
This patch actually attempts to solve 2 problems 
at a time: 
 
1. Assure all Tcl mutexes are in the sane state after 
the fork, so that the Tcl library in the child process is 
in the sane state as well. 
 
2. Reinitialize (actually, re-start) the notifier thread 
in the child, since it is not replicated by the fork call 
but is/(will be) needed by the Tcl lib in the child.  
 
Only the step 2. is the actual fix of the initial problem 
as described in the references to bug-reports above. 
This ain't that difficult to solve by pthread_atfork  
mechanics and should not introduce any unwanted 
side effects. At least I can't think of any at this point. 
 
The step 1. attempts to solve the more difficult problem 
of the state of internal mutexes used by the Tcl library 
after the fork. This is far more complex than it looks like 
on the first glance (deadlock embrace issues)...

vasiljevic added on 2004-12-18 01:30:24:
Logged In: YES 
user_id=95086

Just a life-sing.. I'm still looking into this...

Aparently, the *main* problem is with the Tcl notifier
thread and its handling in the child process, i.e. the fact
that child has not 
have access to the notifier thread because it is simply not
there
after the fork. 

Attempting to lock all mutexes in master and then release them
again after the fork in both master and child processes, plus 
re-initializing the notifier subsystem (effectively
re-creating the
notifier thread) in the child process is a logical way of
solving 
this puzzle *only* if you have control of the lock hierarchy...

In the nature of Tcl lib, this is not guaranteed, hence it
will lead possibly (and very probably) to various
deadlock-embrace situations. Also, the patch as provided,
will destroy the reference count of the notifier thread
which is not good but this is a minor issue which can be
fixed. I see no problem in this.
I do see problems in wholesale locking of Tcl mutexes though.
This is the main problem and we have to see if this step
is really needed and what would be the consequence of not
doing it.  I'm afraid some hard compromises would be required
and those may/will have consequences and lead to some
special-cases which would be pretty unelegant, design-wise.

Well, this all requires more thinking...

vasiljevic added on 2004-12-08 18:09:24:
Logged In: YES 
user_id=95086

Thanks for the hint about the links. Oh,. I just did not read 
the original posting right. Sorry for this. 
 
I will look into this. Yeah, it would be good to bring this 
topic to the tcl core list to see what other people would 
say about that. I can do this after having peeked into 
the reports.

davidw added on 2004-12-08 16:04:25:
Logged In: YES 
user_id=240

The two URL's in the original 'patch report', above, are
links to the bugs filed against Tcl and Tclx.  I've seen
people ask about this every so often on the newsgroup.

I guess if you say so it could happen, you're the expert. 
That part of the patch came from Jeff (I just cleaned it up
and added comments/tests and so on).

What might might be doable as a workaround?  Would it be
possible to include some of this logic from the patch in the
[fork] implementation itself, so that in the manual you
could say: "be sure that you don't hold any of the mutexes
before forking".

I don't know what the answer is, but this bug has been
sitting there for a year and a half at this point, and I'd
like to see some discussion on it with the goal of figuring
out an answer.  Maybe it's appropriate to do that on
tcl-core so more minds are brought to bear.

vasiljevic added on 2004-12-08 12:43:54:
Logged In: YES 
user_id=95086

"What you describe is not very 
likely, but still an impediment to a clean fix" 
 
Oh, please believe me: it is *very* likely. Think of heavy Tcl  
threading apps like AOLserver. There you have the immediate 
problem of exec and cgi processing which are/would be significantly  
affected.  
I would rather vote for a cleaner fix. 
BTW, would you please point me to some of those bug-reports 
you mentioned? I would like to re-check them and see if we can 
find some solution.

davidw added on 2004-12-07 05:28:40:
Logged In: YES 
user_id=240

It is tricky all right.  What you describe is not very
likely, but still an impediment to a clean fix.  The bug is
real though and, for instance on Debian, Tcl cannot fork(),
which has been noticed by several people to date.  We need
to figure out either 1) a good fix, or 2) a workaround that
at least lets people do what they need.  A workaround might
include a way for users to specify if they have any tricky
packages that need special care.

vasiljevic added on 2004-12-05 16:53:52:
Logged In: YES 
user_id=95086

The problem is as I already described: 
This patch is looping thru all Tcl-created mutexes in an linear 
fashion but this might lead to deadlocks I'm afraid.  
The Tcl mutexes are of dynamic nature and users (extensions)  
can create and *lock* them in any *arbitrary* order.  
This will lead to deadlock if some extension is holding a lock-hierarchy 
and somebody (another thread) tries to fork at that time. 
This is very tricky business...

davidw added on 2004-12-05 16:35:56:
Logged In: YES 
user_id=240

Hi Zoran - did anything ever come of this patch?

Thanks,
Dave

davidw added on 2004-06-18 04:08:33:
Logged In: YES 
user_id=240

Don't mind the bit at the top of the test patch.... I
probably accidentally included some of the big change I made
to the test suite.

As for pthread_once and the mutex locking/unlocking, that,
and the rest of the core functionality of the patch is all
Jeff's.  I just prettied it up some, and added some comments
and tests.

dkf added on 2004-06-17 15:05:59:
Logged In: YES 
user_id=79902

Speaking as someone who recently went through the core test
suite fixing up constraint setup and use, that should just
refer to:

  # Set operation...
  tcltest::testConstraint testthread \
          [llength [info commands testthread]]
  # Read operation...
  if {[tcltest::testConstraint testthread]} { ...

No variable or namespace inside the tcltest namespace should
be referred to directly.

vasiljevic added on 2004-06-17 13:18:38:
Logged In: YES 
user_id=95086

Hi David,

I'm currently reviewing the patch. I have applied it to the
core-8-4-branch and get problem with thread.test. The
offending part:

 # Some tests require the testthread command

-set ::tcltest::testConstraints(testthread) \
-       [expr {[info commands testthread] != {}}]
+#set ::tcltest::testConstraints(testthread) \
+#      [expr {[info commands testthread] != {}}]

-if {$::tcltest::testConstraints(testthread)} {
+tcltest::constraints::cset testthread [expr {[info commands
testthread] != {}}]
+
+if {[::tcltest::constraints::cset testthread]} {

The "::tcltest::constraints::cset" is unknown. 
I do a simple:

patch -p0 < newatfork.diff
cd unix
make
make test
....
Test file error: invalid command name
"::tcltest::constraints::cset"
    while executing
"::tcltest::constraints::cset testthread"
    (file
"/usr/local/homes/zv/sf/core-8-4-branch/tests/thread.test"
line 1)
timer.test
....

Another thing is: you should beware of pthread_once. This is not
defined in many posix-thread implementations. Safer is to use
mutex and a static int to simulate the behaviour.

I will yet have to make some tests here. Where I have a bad
feeling is the lock order of mutexes. You are basically looping 
thru all Tcl-created mutexes in an linear fashion but this
might 
lead to deadlocks I'm afraid. The Tcl mutexes are of dynamic
nature and users can creaet and lock them in any arbitrary 
order. I still have to verify that this won't dwarf the
effort, i.e.
lead to a deadlock just before the fork. This will take some
time.

Also, the patch provided is ment for 8.4 branch only, right?

Zoran

davidw added on 2004-06-17 03:59:59:

File Added - 90899: newatfork.diff

Logged In: YES 
user_id=240

Updated patch.

davidw added on 2004-03-25 19:25:40:

File Added - 81327: atfork.diff

Attachments: