TIP 425: Correct use of UTF-8 in Panic Callback (Windows only)

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Jan Nijtmans <jan.nijtmans@gmail.com>
State:          Draft
Type:           Project
Vote:           Pending
Created:        17-Jul-2013
Post-History:   
Keywords:       Tcl,platform integration,i18n
Tcl-Version:    8.7

Abstract

The default panic proc on Windows console applications writes the message in UTF-8 to stderr. Unfortunately, the Windows console normally does not have UTF-8 as code page but some single-byte code page like CP1252. When using characters outside the ASCII range, that does not give the expected output in the console. This TIP proposes to add a new Console panic proc to the stub library, and modify the Tcl_Main() macro to use it.

Rationale

Many parts of Tcl use Unicode since Tcl 8.6: The command line handling, and the communication with all Win32 API functions. But the Panic proc has - so far - not been modified accordingly for Windows console applications, even though win32 has a suitable API to do so.

On Windows, there actually are two different panic procs, one for GUI applications and one for console applications, but external embedders don't have an API for deciding which one should be used other than provide their own. This TIP can finally do that: The call Tcl_SetPanicProc(Tcl_ConsolePanic) will initialize the Tcl subsystem for Console applications, while Tcl_SetPanicProc(NULL) will continue to use the default.

Making things worse, stderr is implemented by the C runtime, (msvcrt??.dll) but if a application is embedding or dynamically loading tcl.dll then the runtime of the embedder might be different from tcl.dll/tclsh.exe's runtime. The embedder providing the panic proc gives the highest chance that panic messages arrive in the same runtime as the embedder. For tclsh.exe this makes no difference.

Starting with VS2015, Microsoft introduced a new Universal CRT (UCRT), which defaults to the standard channels stdin/stdout/stderr to be linked in statically. This makes it possible to have different dll's (e.g. zlib.dll / tcl87.dll) occupy a different C runtime (such as msvcrt.dll), but still prevent conflicts in the standard channels. That's why it is best to put Tcl_ConsolePanic() in Tcl's stub library, which assures the function to be linked in statically into the tclsh executable, in stead of dynamically in tcl87.dll.

Proposed Change

A new function Tcl_ConsolePanic is added to the stub library on Windows, which can be installed by embedding application as panic proc. The full signature is:

EXTERN void Tcl_ConsolePanic( const char *format ...);

On other platforms than Windows, Tcl_ConsolePanic is a macro equivalent to NULL, on those platforms Tcl_SetPanicProc(Tcl_ConsolePanic) has the effect of resetting the panic proc to the platform's default.

This function is meant to be used for Win32 console applications, and can deliver the message in 3 possible ways

  • If a (Windows) debugger is running, the message is sent there.

  • If stderr is connected to a Windows console, the message is sent there (Windows only).

  • Otherwise, the UTF-8 BOM (3 bytes) is written followed by the unmodified message (assumed to be in UTF-8).

The function Tcl_ConsolePanic does not assume any locale, does not allocate memory, neither does it make any assumptions on the initialized state of Tcl. This makes Tcl_Panic work fine even in the final stage of a Tcl_Finalize() call. If a Win32 Unicode API is available for the desired output, Tcl_ConsolePanic will do at most an UTF-8 to Unicode conversion using the Win32 function MultiByteToWideChar().

The maximum number of (unicode) characters that is written is 26000, as that is the maximum that WriteConsoleW() can handle in a single call. See: https://connect.microsoft.com/VisualStudio/feedback/details/635230 If the message is longer than that, the string is truncated and three dots appended to it. If the message is sent to a character device, the UTF-8 BOM is prepended.

The function is available from the stub library, in order to bring the responsibility for correct linking to the embedding application, in stead of Tcl. In case of tclsh.exe, this makes no difference.

Reference Implementation

A reference implementation is available in the win-console-panic branch. https://core.tcl.tk/tcl/timeline?r=win-console-panic

Copyright

This document has been placed in the public domain.

History