Tcl Source Code

Ticket UUID: c4e230f29b5cd4854af0abc822780b7118cba8b7
Title: unixeventfork-1.1 hangs on OS X
Type: Bug
Version: e733a91cdb
Submitter: dgp
Created on: 2013-08-03 14:55:43
Subsystem: 01. Notifier
Assigned To: dgp
Priority: 5 Medium
Severity: Minor
Status: Open
Last Modified: 2013-08-05 22:19:38
Resolution: None
Closed By: nobody
Closed on:
Description:
$ cd tcl/unix
$ ./configure --disable-shared
...
$ make test-tcl TESTFLAGS="-file unixForkEvent.test -singleproc 1"
...
Tests began at Sat Aug 03 10:52:04 EDT 2013
unixForkEvent.test
Tcl_WaitForEvent: Notifier not initialized

The panic message is from the forked process.

The original process continues to run, but is
trapped in a [while] waiting for the child to
make a file.
User Comments: jan.nijtmans added on 2013-08-05 22:19:38:
I agree with dgp's remarks. Since the scenario this test case exercises has little evidence of use in the wild, and failing the test in no way indicates a wrongly built Tcl, I marked this test case nonPortable. This reduces the severity of this issue.

The bug-c4e230f29b branch is now rebased to trunk, as a Notifier rewrite that makes this work is probably too drastic to be backported to 8.5. Anyone who wants to give it a try is welcome to, but it is beyond my knowledge.

Since fork() apparently never worked in OSX Tcl and no one reported it, this appears to be low-priority. In any case, we now have a test case (thanks to Harald!)
and a branch "bug-c4e230f29b" for anyone who wants to make fork() work in OSX Tcl.

dgp added on 2013-08-05 12:35:20:
It looks to me that on (recent enough) OS X,
the "don't call fork() without exec()" rule
is not something imposed by Tcl, but something
the system itself very strongly wants.  (At least
when the "CoreFoundation" is in use, which it is
in the Tcl source code.)
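For contrast, the pattern the system does accept is fork() followed immediately by exec(), so the child never touches inherited CoreFoundation state. The C sketch below shows that pattern with plain POSIX calls; the helper name run_via_fork_exec is hypothetical and not part of Tcl.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PATH with ARGV in a child process.
 * Between fork() and execv() the child does nothing else, so no
 * framework state inherited from the parent (such as CoreFoundation
 * objects) is used before exec replaces the process image.
 * Returns the child's exit status, or -1 on error. */
int run_via_fork_exec(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* fork() failed */
    }
    if (pid == 0) {
        execv(path, argv);              /* returns only on failure */
        _exit(127);                     /* no atexit handlers, no stdio flush */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;                  /* real error, not a signal */
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Using _exit() rather than exit() on the failure path keeps the child from running the parent's atexit handlers or flushing stdio buffers it inherited.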

I think the best next step is to limit the creation
of the [testfork] testing command so that it is not
created on OS X executables.  Then the existing
constraint on test unixforkevent-1.1 that [testfork]
must exist will be enough to protect against trouble
in the test suite.
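The proposal amounts to a compile-time guard around the registration of [testfork]. The C sketch below is hypothetical: in Tcl the command would be created with Tcl_CreateObjCommand(), but here a plain name table stands in for the interpreter so the fragment compiles on its own, and keying on the compiler's __APPLE__ macro (rather than a configure-time CoreFoundation flag) is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of Unix-only test commands this build registers.
 * [testfork] is compiled out on OS X (__APPLE__), where CoreFoundation
 * does not support using the notifier in a child after fork() without
 * exec(). A test constrained on [testfork] existing is then skipped. */
static const char *const unix_test_commands[] = {
#if !defined(__APPLE__)
    "testfork",
#endif
    NULL
};

/* Returns 1 if NAME would be registered as a test command, else 0. */
int has_test_command(const char *name)
{
    for (size_t i = 0; unix_test_commands[i] != NULL; i++) {
        if (strcmp(unix_test_commands[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}
```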

The other thing to make note of somewhere, somehow, is
that this limited offering of some fork() without exec()
in programs using Tcl is not portable to OS X.  I think
it's already not portable to Windows, so further refining
the accuracy of portability limits shouldn't be a severe
problem.

I do not know whether Apache / Rivet have portability to
OS X on their list of goals.  If so, this will continue
to be a problem for them.

The only other conceivable option I see is for some true
OS X wizard developer to code a significant rewrite that
avoids or works around the constraints of CoreFoundation.
Whether that's impossible or merely difficult I cannot say.

oehhar added on 2013-08-05 08:39:13:

The test now exits with an error message if the forked process stalls. In addition, the case of a partially written result file is caught by a separate trigger file. Checkin: [c008fa3bbd] Harald
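The trigger-file handshake can be illustrated outside Tcl. The actual test is written in Tcl; this C sketch only shows the idea, and the function names and file paths are hypothetical. The child writes and closes the result file before creating the trigger, and the parent polls for the trigger with a deadline instead of looping forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Child side: write RESULT to RESULT_PATH and only after the file is
 * completely written and closed create TRIGGER_PATH as a "done" marker.
 * A reader that waits for the trigger never sees a half-written result. */
int write_result_then_trigger(const char *result_path,
                              const char *trigger_path,
                              const char *result)
{
    FILE *f = fopen(result_path, "w");
    if (f == NULL) {
        return -1;
    }
    if (fputs(result, f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0) {               /* flushes buffered data */
        return -1;
    }
    int fd = open(trigger_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    return close(fd);
}

/* Parent side: poll for TRIGGER_PATH for roughly TIMEOUT_SEC seconds.
 * Returns 0 once it exists, -1 on timeout, so the caller can report an
 * error instead of spinning forever. */
int wait_for_trigger(const char *trigger_path, int timeout_sec)
{
    const struct timespec poll_interval = {0, 10 * 1000 * 1000};  /* 10 ms */
    time_t deadline = time(NULL) + timeout_sec;

    while (access(trigger_path, F_OK) != 0) {
        if (time(NULL) >= deadline) {
            return -1;
        }
        nanosleep(&poll_interval, NULL);
    }
    return 0;
}
```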


jan.nijtmans added on 2013-08-05 07:53:55:
It looks like fixing this needs a significant rewrite of the OSX Notifier. If the mentioned CoreFoundation calls cannot be used in forked processes, then the parent process can use the OSX Notifier just fine, but in the children the Notifier needs to be re-initialized as a pure-UNIX notifier. Ugh.

I don't think this can be fixed easily; we need an OSX expert here.
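The idea of switching notifiers in the child can be sketched with pthread_atfork(): a child handler records that the forked process must not use the CoreFoundation notifier. Everything below (the enum, the function names) is hypothetical and much simpler than Tcl's real notifier; it only shows where such a switch could hook in, not how the UNIX notifier itself would be set up.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical notifier selector; Tcl's real notifier state differs. */
enum notifier_kind { NOTIFIER_NONE, NOTIFIER_COREFOUNDATION, NOTIFIER_UNIX };
static enum notifier_kind current_notifier = NOTIFIER_NONE;

/* Runs in the child right after fork(). The CoreFoundation run loop
 * inherited from the parent must not be used there, so record that
 * this process has to use a plain UNIX notifier instead. */
static void atfork_child(void)
{
    current_notifier = NOTIFIER_UNIX;
}

/* Startup in the original process: select the CoreFoundation notifier
 * and register the child handler. Returns 0 on success. */
int notifier_startup(void)
{
    current_notifier = NOTIFIER_COREFOUNDATION;
    return pthread_atfork(NULL, NULL, atfork_child);
}

enum notifier_kind current_notifier_kind(void)
{
    return current_notifier;
}

/* Demonstration: fork and report whether the child ended up on the
 * UNIX notifier. Returns 1 if it did, 0 if not, -1 on error. */
int child_uses_unix_notifier(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(current_notifier == NOTIFIER_UNIX ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The parent keeps its CoreFoundation notifier; only the child's copy of the state changes, because the handler runs in the child after the fork.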

dgp added on 2013-08-04 18:46:54:
possibly related information here:

http://objectivistc.tumblr.com/post/16187948939/you-must-exec-a-core-foundation-fork-safety-tale

dgp added on 2013-08-04 18:34:17:
Thanks for that.  The result is different, but not success:

$ make test-tcl TESTFLAGS="-file unixForkEvent.test -singleproc 1"
...
Tests began at Sun Aug 04 14:32:31 EDT 2013
unixForkEvent.test
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
...

and then that double message is repeated about a dozen times
before things hang.

jan.nijtmans added on 2013-08-04 16:40:41:

If the pthread_atfork() callbacks leave the notifier in the uninitialized state (believing the panic message), let's initialize it; see [7de91edd9b].

Does this make the test-case pass?
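The suggestion, as described in the comment, is to replace the "Notifier not initialized" panic with on-demand initialization. The C sketch below illustrates only that idea; all names are hypothetical, and it makes no claim about what checkin [7de91edd9b] actually contains.

```c
/* Hypothetical stand-ins for the notifier's state and setup routine. */
static int notifier_initialized = 0;
static int notifier_init_calls = 0;

static void init_notifier(void)
{
    notifier_init_calls++;
    notifier_initialized = 1;
}

/* Models what a pthread_atfork() child handler may leave behind:
 * the notifier marked as not initialized. */
void mark_notifier_uninitialized(void)
{
    notifier_initialized = 0;
}

/* Simplified event wait: instead of panicking with
 * "Notifier not initialized", initialize on first use. */
int wait_for_event(void)
{
    if (!notifier_initialized) {
        init_notifier();
    }
    /* A real implementation would now block until an event arrives. */
    return 0;
}

int notifier_init_count(void)
{
    return notifier_init_calls;
}
```

On OS X, lazy initialization alone does not get past CoreFoundation's own fork check, which is what the later report in this ticket shows.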

Another recommended improvement: add a timeout mechanism, so that if the test fails it no longer hangs but fails with a proper message.
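One way to bound the wait for a forked child is to poll waitpid() with WNOHANG against a deadline and kill the child if it stalls. This is a hypothetical C sketch of that technique, not the mechanism the Tcl test suite uses; the function name and the -2 timeout code are illustrative choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wait up to about TIMEOUT_MS for child PID to exit. On timeout, kill
 * and reap it so the caller can report a failure instead of hanging.
 * Returns the exit status, -2 on timeout, or -1 on error. */
int wait_child_with_timeout(pid_t pid, long timeout_ms)
{
    const struct timespec tick = {0, 10 * 1000 * 1000};   /* 10 ms */
    long waited_ms = 0;                 /* approximate: ignores call overhead */

    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        }
        if (r < 0 && errno != EINTR) {
            return -1;                  /* e.g. ECHILD: not our child */
        }
        if (waited_ms >= timeout_ms) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);   /* reap so no zombie is left */
            return -2;
        }
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
}
```

Killing and reaping the stalled child, rather than just giving up on it, keeps a failing test from leaving stray processes behind in the test run.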