Title: Standard channels (stdin, stdout, stderr) on Windows may get broken by dup2()
Submitter: abv
Created on: 2018-11-09 21:51:41
Subsystem: 25. Channel System
Assigned To: nobody
Status: Pending
Last Modified: 2018-11-12 19:45:37
1. The problem (Tcl level, root cause)
On Windows, the standard channels (stdin, stdout, stderr) in Tcl are initialized with the OS handles returned by the WinAPI function GetStdHandle() (see TclpGetDefaultStdChannel() in win/tclWinChan.c).
These handles may be invalidated (closed, reopened, or reassigned to a different kind of object) by C/C++ code. In particular, the function _dup2() of the standard C library (MSVC), when called with a standard file number (0, 1, or 2) as its second argument, closes the OS handle associated with that standard stream, then creates a new handle and sets it as the standard one by calling SetStdHandle().
In most cases the old and new handles have the same value (apparently because the handle value is reused), so there are no immediate consequences. Sometimes, however (in my experiments, about once per several thousand calls), the new standard handle assigned by the system differs from the old one. The Tcl channel still holds the old handle, so any attempt to use that channel (e.g. writing to stdout with puts) fails with an error.
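A minimal sketch of the mechanism described above, assuming an MSVC console application: it re-assigns descriptor 1 with _dup2() and compares the value returned by GetStdHandle() before and after the call. The function names are hypothetical. The non-Windows branch exists only so the sketch compiles elsewhere; it always reports "no change", because on POSIX systems the descriptor itself is the OS-level handle.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#define DUP _dup
#define DUP2 _dup2
#define CLOSE _close
#else
#include <unistd.h>
#define DUP dup
#define DUP2 dup2
#define CLOSE close
#endif

/* Hypothetical check: re-assigns descriptor 1 to a copy of itself with
 * dup2() and reports whether the OS-level standard output handle changed.
 * Returns 1 if it changed, 0 if not, -1 on error.
 * On non-Windows systems the result is always 0. */
int stdout_handle_changed_by_dup2(void)
{
#ifdef _WIN32
    HANDLE before = GetStdHandle(STD_OUTPUT_HANDLE);
#endif
    int saved;

    fflush(stdout);                 /* don't lose buffered C-level output */
    saved = DUP(1);
    if (saved < 0)
        return -1;
    /* On Windows (MSVC CRT) this closes the OS handle behind fd 1,
     * duplicates the handle of `saved`, and installs the result with
     * SetStdHandle(). */
    if (DUP2(saved, 1) < 0) {
        CLOSE(saved);
        return -1;
    }
    CLOSE(saved);
#ifdef _WIN32
    return GetStdHandle(STD_OUTPUT_HANDLE) != before;
#else
    return 0;
#endif
}

/* Repeats the check; returns how many times the handle value changed. */
int count_handle_changes(int iterations)
{
    int changes = 0;
    for (int i = 0; i < iterations; i++)
        if (stdout_handle_changed_by_dup2() == 1)
            changes++;
    return changes;
}
```

If the observation in this report holds, calling count_handle_changes() with a few thousand iterations on Windows should occasionally return a nonzero count. Each such change is a point at which a Tcl channel created earlier would be left holding a stale handle.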
2. The problem (application level, reproducer)
We are using Tcl as a command-line tool to organize testing of a C++ library. When a test is executed interactively, the test system intercepts output to the stdout and stderr streams (from C code) so that it can be analysed.
In this context, execution of a test script sometimes ends with Tcl reporting the error: error writing "stdout": bad file number
Unfortunately, I have no isolated reproducer for the problem (I lack sufficient expertise with Tcl). However, it can be reproduced within the test system (the software is open source) as follows:
a) Install OCCT either from https://www.opencascade.com/content/latest-release or build it from source (download link: http://git.dev.opencascade.org/gitweb/?p=occt.git;a=snapshot;h=refs/tags/V7_3_0;sf=zip)
b) Run draw.bat
c) Type "test perf bop boxholes"
The problem has been reproducible for years with Tcl 8.5 and 8.6. For debugging I used MSVC 2017 (15.8.4) Community Edition on Windows 10 64-bit. The last Tcl version tried was the master branch of the Tcl GitHub repository as of Nov 06.
3. The proposed solution
The problem can be solved by duplicating the standard handle returned by GetStdHandle() (in TclpGetDefaultStdChannel()) and using the duplicate to initialize the Tcl channel.
Here is a diff:
win/tclWinChan.c | 11 +++++++++++
1 file changed, 11 insertions(+)
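Only the diffstat of the patch appears in this ticket. As a hedged sketch of the duplication step (not the submitted diff), the WinAPI call that matches the description is DuplicateHandle(). The helpers dup_std_handle() and close_std_duplicate() are hypothetical names; the POSIX branch using dup() is only a stand-in so the sketch compiles on other systems.

```c
#ifdef _WIN32
#include <windows.h>
typedef HANDLE os_handle;
#define OS_INVALID_HANDLE INVALID_HANDLE_VALUE
#else
#include <unistd.h>
typedef int os_handle;
#define OS_INVALID_HANDLE (-1)
#endif

/* Hypothetical helper (not a Tcl API): returns a private duplicate of
 * standard stream `index` (0 = stdin, 1 = stdout, 2 = stderr), owned by
 * the caller, or OS_INVALID_HANDLE on failure. */
os_handle dup_std_handle(int index)
{
    if (index < 0 || index > 2)
        return OS_INVALID_HANDLE;
#ifdef _WIN32
    {
        static const DWORD ids[3] = {
            STD_INPUT_HANDLE, STD_OUTPUT_HANDLE, STD_ERROR_HANDLE
        };
        HANDLE self = GetCurrentProcess();
        HANDLE original = GetStdHandle(ids[index]);
        HANDLE copy = INVALID_HANDLE_VALUE;

        /* GetStdHandle() may return NULL when there is no console. */
        if (original == NULL || original == INVALID_HANDLE_VALUE)
            return OS_INVALID_HANDLE;
        /* Same access rights, not inheritable. The copy stays valid even
         * if the standard handle is later closed by _dup2(). */
        if (!DuplicateHandle(self, original, self, &copy, 0, FALSE,
                             DUPLICATE_SAME_ACCESS))
            return OS_INVALID_HANDLE;
        return copy;
    }
#else
    return dup(index);   /* POSIX stand-in so the sketch compiles elsewhere */
#endif
}

/* Releases a duplicate obtained from dup_std_handle(). */
void close_std_duplicate(os_handle h)
{
    if (h == OS_INVALID_HANDLE)
        return;
#ifdef _WIN32
    CloseHandle(h);
#else
    close(h);
#endif
}
```

Inside TclpGetDefaultStdChannel(), the idea would be to create the channel from such a duplicate instead of the raw GetStdHandle() value, so that closing the channel closes only its own copy and never the process-wide standard handle.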
The only drawback I foresee is that when a standard stream is redirected at the C library level, Tcl will keep working with the initial stream. This seems logical, given that Tcl channels are bound to WinAPI handles rather than to the standard C library, which is apparently intentional.
abv added on 2018-11-12 19:45:37:
@sebres, thanks for commenting!
I agree the situation is rare because (a) it is Windows-specific, (b) redirection of stdin at runtime is rarely used, and (c) the problem manifests only about once per several thousand calls.
In my scenario, the call to dup2() occurs after the Tcl channels have been initialized. Note that this can happen in C code that knows nothing about Tcl at all. In that case the handle stored by the Tcl channel may become invalid (as happens in my case), and Tcl has no means of knowing about it.
In my case the place where dup2() is called is aware of Tcl. Calling Tcl_SetStdChannel() was the first thing I considered, but the problem is how to release the existing Tcl channel that uses the invalidated handle. That channel has no flag indicating that its handle is no longer valid, so it will try to close that handle on destruction, which indeed leads to an error.
Another point is that calling Tcl_SetStdChannel() would not be sufficient: by that time the Tcl interpreter has already been created and holds its own reference to the same channel, so the channels stored in the interpreter would have to be updated as well (in the general case, in all existing interpreters).
The proposed solution does solve the problem, since it creates a duplicate of the system handle that is then owned by the channel, so it can be released safely regardless of whether the standard stream was redefined.
Duplicating the handle does not cause the error because the device (a console in my case, but it can be something else) is not closed; only the handle to that device that the system holds as the "standard handle" is closed and opened again (sometimes with a different value).
Besides, you can grep the code for 'GetStdHandle' to find several workarounds that prevent the standard handle from being closed when a Tcl channel is closed. These workarounds could perhaps be removed after the fix. I have not touched them, though, because they may still be needed in other scenarios.
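The ownership argument can be checked with C runtime descriptors alone: a descriptor obtained with dup() keeps the underlying device reachable after dup2() has replaced descriptor 1. This is a sketch under the assumption that MSVC's _dup() gives the new descriptor its own OS handle (which, to my understanding, the CRT obtains via DuplicateHandle()); on POSIX systems dup() gives the same guarantee. The function name write_through_duplicate() is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <io.h>
#define DUP _dup
#define DUP2 _dup2
#define CLOSE _close
#define WRITE _write
#define OPEN _open
#define NULL_DEVICE "NUL"
#else
#include <unistd.h>
#define DUP dup
#define DUP2 dup2
#define CLOSE close
#define WRITE write
#define OPEN open
#define NULL_DEVICE "/dev/null"
#endif

/* Hypothetical demo: keeps a private duplicate of descriptor 1, redirects
 * descriptor 1 to the null device with dup2() (which closes whatever was
 * behind it), writes `msg` through the duplicate, then restores fd 1.
 * Returns the number of bytes written through the duplicate, or -1. */
int write_through_duplicate(const char *msg)
{
    int copy, nul, written;

    fflush(stdout);                  /* flush C-level buffers before swapping fds */
    copy = DUP(1);                   /* private duplicate, owned by this function */
    if (copy < 0)
        return -1;

    nul = OPEN(NULL_DEVICE, O_WRONLY);
    if (nul < 0 || DUP2(nul, 1) < 0) {   /* the "foreign" redirection of fd 1 */
        if (nul >= 0)
            CLOSE(nul);
        CLOSE(copy);
        return -1;
    }
    CLOSE(nul);

    /* The device is still open through `copy`, so this reaches the original stream. */
    written = (int)WRITE(copy, msg, (unsigned)strlen(msg));

    DUP2(copy, 1);                   /* restore standard output */
    CLOSE(copy);
    return written;
}
```

A Tcl channel built on a private duplicate would be in the same position as `copy` here: a later redirection of the standard stream neither invalidates it nor makes it close a handle it does not own.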
sebres added on 2018-11-12 08:51:44:
Hmm... I have never seen this using Tcl on Windows (in the Tcl shell as well as in my own binaries, with Tcl used as a library and even statically linked). Furthermore, although I can imagine a situation like this (theoretically), I don't understand how the suggested solution can fix it (except in the case where the standard handles are modified without notifying Tcl).
If the standard handle is altered (closed, reopened, whatever), doesn't duplicating this (for example, closed) handle cause the same error?
Or yet another question: if the alteration takes place afterwards (e.g. after the initial call of
It looks like a third-party issue to me: OCCT (or its dependencies) should simply call
Therefore the duplication described as a possible solution is rather a workaround, and it may produce unexpected behavior or cause strange effects in other software (especially multi-threaded software) that correctly rewrites or closes the standard channels.