Author:         Andreas Kupries <[email protected]>
Author:         Andreas Kupries <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        09-Sep-2004
Post-History:   
Tcl-Version:    8.5
Tcl-Ticket:     1025294

Abstract

This document describes an API which reflects the Channel Driver API of the core I/O system up into the Tcl level, for the implementation of channel types in Tcl. It is built on top of [208] ('Add a chan command') and also an independent companion to [230] ('Tcl Channel Transformation Reflection API') and [228] ('Tcl Filesystem Reflection API'). As the later TIPs bring the ability of writing channel transformations and filesystems in Tcl itself into the core so this TIP provides the facilities for the implementation of new channel types in Tcl. This document specifies version 1 of the channel reflection API.

Motivation / Rationale

The purpose of this and the other reflection TIPs is to provide all the facilities required for the creation and usage of wrapped files (= virtual filesystems attached to executables and binary libraries) within the core.

While it is possible to implement and place all the proposed reflectivity in separate and external packages, this however means that the core itself cannot make use of wrapping technology and virtual filesystems to encapsulate and attach its own data and library files to itself. This is something which is desirable as it can make the deployment and embedding of the core easier, due to having less files to deal with, and a higher degree of self-containment.

One possible application of a completely self-contained core library would be, for example, the Tcl browser plugin.

While it is also possible to create a special purpose filesystem and channel driver in the core for this type of thing, it is however my belief that the general purpose framework specified here is a better solution as it will also give users of the core the freedom to experiment with their own ideas, instead of constraining them to what we managed to envision.

Another use for reflected channels was found when creating the reference implementation: As helper for testing the generic I/O system of Tcl, by creating channels which forcibly return errors, bogus data, and the like.

Specification

Introduction

This specification has to address two questions to make the reflection work.

How are the driver functions reflected into the Tcl level?
How are file events generated in the Tcl level communicated back to the C level? This includes routing to the correct channel.

C Level API

Four functions are added to the public C API. See section "Error Handling" for their detailed specification.

Tcl Level API

The Tcl Level API consists of two new subcommands added to the ensemble command chan specified by [208]. The new subcommands are:

chan create mode cmdprefix

This subcommand creates a new script level channel using the command prefix cmdprefix as its handler. The cmdprefix has to be a list. The API this handler has to provide is specified below, in the section "Command Handler API". The handle of the new channel is returned as the result of the command, and the channel is open. Use the regular close command to remove the channel.

The argument mode specifies if the channel is opened for reading, writing, or both. It is a list containing any of the strings read or write. The list has to have at least one element, as a channel you can neither write to nor read from makes no sense. The handler command for the new channel has to support the chosen mode. An error is thrown if that is not the case.

We have chosen to use late binding of the handler command. See the section "Early versus Late Binding of the Handler Command" for more detailed explanations.
chan postevent channel eventspec

This subcommand is for use by command handlers, it notifies the channel represented by channel that the event(s) listed in the eventspec have occurred. The argument eventspec is a list containing any of read and write. At least one element is required (It does not make sense to invoke the command if there are no events to post).

Note that this subcommand can be used only on channel handles which were created/opened with the subcommand create. Application to channels like files, sockets, etc. is not possible and will cause the generation of an error.

As only the Tcl level of a channel, i.e. its command handler, should post events to it we also restrict the usage of this command to the interpreter the handler command is in. In other words, posting events to a reflected channel from a different interpreter than its implementation is in is not allowed.

Another restriction is that it is not possible to post events the I/O core has not registered interest in. Trying to do so will cause the method to throw an error. See the method watch in section "Command Handler API" as well.

Command Handler API

The Tcl-level handler command for a reflected channel is an ensemble that has to support the following subcommands, as listed below. Note that the term ensemble is used to generically describe all command (prefixes) which are able to process subcommands. This TIP is not tied to the recently introduced 'namespace ensemble's.

Of the available methods the handler has to support initialize, finalize, and watch, always. The other methods are optional.

handlerCmd initialize channel mode

This is the first call the command handler will receive for the given new channel. It is his responsibility to set up any internal data structures it needs to keep track of the channel and its state.

The return value of the method has to be a list containing the names of all methods which are supported by this handler. This implicitly tells the C level the version of the API used by the command handler making a separate version number redundant. Hence our decision to leave such a number out of the API. Any changes to the API will be either the elimination of methods, or the introduction of new ones. An existing method cannot change its signature (arguments, and result), a new method has to be introduced for this. All of this implies that this method, initialize, is unchangeable after the TIP has been committed, as it is the entry point through which the C level will determine the API version before it knows anything else.

Any error thrown by the method will abort the creation of the channel and no channel will be created. The thrown error will appear as error thrown by chan create.

Any exception beyond error, like break, etc. is treated as and converted to an error.

Important - If the creation of the channel was aborted due to failures in initialize then the method finalize will not be called.

This method has no equivalent at the C level.

It was considered to return only the list of optional methods supported by the handler. The chosen method however should make the code in the C layer more regular. Another advantage of this is that it allows the C level to better check if the API it expects is matching the API provided by the handler.

The argument mode tells the handler if the channel was opened for reading, writing, or both. It is a list containing any of the strings read or write. The C-level doing the call will never generate abbreviations of these strings. The list will always contain at least one element, as a channel you can neither write to nor read from makes no sense.

The method has to throw an error if the chosen mode is not supported by the handler command.
handlerCmd finalize channel

The method is called when the channel was closed, and is the last call a handler can receive for the given channel. This happens just before the destruction of the C level data structures. Still, the command handler must not access the channel anymore in no way. It is now his responsibility to clean up any internal resources it allocated to this channel.

The return value of the method is ignored.

If the method throws an error the command which caused its invocation (usually close) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The equivalent C-level function is Tcl_DriverCloseProc.

This method is not invoked if the creation of the channel was aborted during initialize.
handlerCmd read channel count

This method is optional. It is called when the user requests data from a channel. count specifies how many bytes have been requested. If the method is not supported then it is not possible to read from the channel handled by the command.

The return value of the method is taken as the requested data. If the returned data contains more bytes than requested an error will be signaled and later thrown by the command which performed the read (usually gets or read). Returning less bytes than requested is acceptable however.

If the method throws an error the command which caused its invocation (usually gets, or read) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The equivalent C-level function is Tcl_DriverInputProc.
handlerCmd write channel data

This method is optional. It is called when the user writes data to the channel. Note that the data are bytes, not characters (The underlying Tcl_ObjType is ByteArray). Any type of transformation (EOL, encoding) configured for the channel has already been applied at this point. If the method is not supported then it is not possible to write to the channel handled by the command.

The return value of the method is taken as the number of bytes written by the channel. Anything non-numeric will cause an error to be signaled and later thrown by the command which performed the write. A negative value implies that the write failed. Returning a value greater than the number of bytes given to the handler, or zero, is forbidden and will cause the C level to throw errors.

If the method throws an error the command which caused its invocation (usually puts) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The equivalent C-level function is Tcl_DriverOutputProc.
handlerCmd seek channel offset base

This method is optional. It is responsible for the handling of seek and tell requests on the channel. If it is not supported then seeking will not be possible for the channel.

base is one of

* start - Seeking is relative to the beginning of the channel.

* current - Seeking is relative to the current seek position.

* end - Seeking is relative to the end of the channel.

The base argument of the builtin seek command takes the same names.

The offset is an integer number specifying the amount of bytes to seek forward or backward. A positive number will seek forward, and a negative number will seek backward.

A channel may provide only limited seeking. For example sockets can seek forward, but not backward.

The return value of the method is taken as the (new) location of the channel, counted from the start. This has to be an integer number greater than or equal to zero.

If the method throws an error the command which caused its invocation (usually seek) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The offset/base combination of 0/"current" signals a tell request, i.e. seek nothing relative to the current location, making the new location identical to the current one, which is then returned.

The equivalent C-level functions are Tcl_DriverSeekProc, and Tcl_DriverWideSeekProc (where possible).
handlerCmd configure channel option value

This method is optional. It is for writing the type specific options.

Per call one option has to be written.

The return value of the method is ignored.

If the method throws an error the command which performed the (re)configuration or query (usually fconfigure) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The equivalent C-level function is Tcl_DriverSetOptionProc.
handlerCmd cget channel option

This method is optional. It is used when reading a single type specific option. If this method is supported then the method cgetall has to be supported as well.

The call has to return the value of the specified option.

If the method throws an error the command which performed the (re)configuration or query (usually fconfigure) will appear to have thrown this error.

The equivalent C-level function is Tcl_DriverGetOptionProc.
handlerCmd cgetall channel

This method is optional. It is used for reading all type specific options. If this method is supported then the method cget has to be supported as well.

It has to return a list of all options and their values. This list has to have an even number of elements.

If the method throws an error the command which performed the (re)configuration or query (usually fconfigure) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The equivalent C-level function is Tcl_DriverGetOptionProc.
handlerCmd watch channel eventspec

This methods notifies the Tcl level that the specified channel is interesting in the events listed in the eventspec. This is a list containing any of read and write. The C-level doing the call will never generate abbreviations of these strings. The empty list is allowed as well and signals that the channel does not wish to be notified of any events. In other words, it has to disable event generation at the Tcl level.

Any return value of the method is ignored. This includes errors thrown by the method, break, continue, and custom return codes.

The equivalent C-level function is Tcl_DriverWatchProc.

This method interacts with chan postevent. Trying to post an event not listed in the last call to this method will cause an error.
handlerCmd blocking channel mode

This method is optional. It handles changes to the blocking mode of the channel. The mode is a boolean flag. True means that the channel has to be set to blocking. False means that the channel should be non-blocking.

The return value of the method is ignored.

If the method throws an error the command which caused its invocation (usually fconfigure) will appear to have thrown this error.

Any exception beyond error, like break, etc. is treated as and converted to an error.

The equivalent C-level function is Tcl_DriverBlockModeProc.

Notes:

The function Tcl_DriverGetHandleProc is not supported. There is no equivalent handler method at the Tcl level.
The function Tcl_DriverHandlerProc is not supported. There is no equivalent handler method at the Tcl level. The function has no relevance to base channels, which we work with here, only for channel transformations. See [230] ('Tcl Channel Transformation Reflection API') for more information on the issue.
The function Tcl_DriverFlushProc is not supported. The reason for this: The current generic I/O layer of Tcl does not use this function at all, nowhere. Therefore support at the Tcl level makes no sense either. We can always extend the API defined here (and change its version number) should the function be used at some time in the future.

Error handling

The current I/O core's ability to handle arbitrary Tcl error messages is very limited. Tcl_DriverGetOptionProc and Tcl_DriverSetOptionProc are the only driver functions for which this is possible directly. Everywhere else the API is restricted to returning POSIX error codes.

This limitation makes the debugging of problems in a channel command handler at least very difficult. As such it is considered not acceptable. It is proposed to solve this problem through the addition of four new functions to Tcl's public stub table.

void Tcl_SetChannelError(Tcl_Channel chan, Tcl_Obj* msg)

void Tcl_SetChannelErrorInterp(Tcl_Interp* ip, Tcl_Obj* msg)

These functions store error information in a channel or interpreter. Previously stored information will be discarded. They have to be used by channel drivers wishing to pass regular Tcl error information to the generic layer of the I/O core.

The refCount of msg is unchanged when the functions had to rewrite msg per the safety precautions explained below, as a properly modified copy of msg is stored, and not msg itself. Otherwise the refCount of msg is incremented by one.

void Tcl_GetChannelError(Tcl_Channel chan, Tcl_Obj** msg)

void Tcl_GetChannelErrorInterp(Tcl_Interp* ip, Tcl_Obj** msg)

These function retrieve error information stored in a channel or interpreter O, and also resets O to have no information stored in it. They will return NULL if no information was stored to begin with.

i.e. After an invocation of Tcl_GetChannelError* for a channel/interpreter object O, all following invocations will return NULL for that object, until an intervening invocation of Tcl_SetChannelError* again stored information in O.

The msg argument is not allowed to be NULL. Nor are the chan and ip arguments.

The refCount of the returned information is not touched. The reference previously held by the channel or interpreter is now held by the caller of the function and it is its responsibility to release that reference when it is done with the object.

This solution is not very elegant, but anything else will require an incompatible redefinition of the whole channel driver structure and of the driver functions.

It should also be noted that usage of Tcl_Objects for the information storage binds the information to a single thread. I.e. a transfer across thread boundaries is not possible. This however is not required here and thus no limitation.

The four functions have been made public as I can imagine that even C level drivers might wish to use this facility to generate more explicit and readable error messages than is provided through POSIX error codes and the errno API.

The information talked about in the API specifications above is not a plain string, but has to be a list of uneven length. The last element will be interpreted as the actual error message in question, and the preceding elements are considered as option/value pairs containing additional information about the error, like the errorCode, etc. I.e. they are an extensible dictionary containing the details of the error beyond the basic message.

As a safety precaution any -level specification submitted by the driver and a non-zero value will be rewritten to a value of 0 to prevent the driver from being able to force the user application into the execution of arbitrary multi-level returns, i.e. from arbitrarily changing the control-flow of the application itself. Analogously any -code specification with a non-zero value which is not error is rewritten to value 1 (i.e. error).

Below a list of driver functions, and which of the _Tcl_SetChannelError*** functions they are allowed to use.

Tcl_DriverCloseProc

May use Tcl_SetChannelErrorInterp, and only this function.
Tcl_DriverInputProc

May use Tcl_SetChannelError, and only this function.
Tcl_DriverOutputProc

May use Tcl_SetChannelError, and only this function.
Tcl_DriverSeekProc, and Tcl_DriverWideSeekProc

May use Tcl_SetChannelError, and only this function.
Tcl_DriverSetOptionProc

Has already the ability to pass arbitrary error messages. Must not use any of the new functions.
Tcl_DriverGetOptionProc

Has already the ability to pass arbitrary error messages. Must not use any of the new functions.
Tcl_DriverWatchProc

Must not use any of the new functions. Is internally called and has no ability to return any type of error whatsoever.
Tcl_DriverBlockModeProc

May use Tcl_SetChannelError, and only this function.
Tcl_DriverGetHandleProc

Must not use any of the new functions. It is only a low-level function, and not used by Tcl commands.
Tcl_DriverHandlerProc

Must not use any of the new functions. Is internally called and has no ability to return any type of error whatsoever.

Given the information above the following public functions of the Tcl C API are affected by these changes. I.e. when these functions are called the channel may now contain a stored arbitrary error message requiring processing by the caller.

Tcl_StackChannel
Tcl_Seek
Tcl_Tell
Tcl_ReadRaw
Tcl_Read
Tcl_ReadChars
Tcl_Gets
Tcl_GetsObj
Tcl_Flush
Tcl_WriteRaw
Tcl_WriteObj
Tcl_Write
Tcl_WriteChars

All other API functions are unchanged. Especially the functions below leave all their error information in the interpreter result.

Tcl_Close
Tcl_UnregisterChannel
Tcl_UnstackChannel

A previous revision of this TIP specified only two functions, storing the data only in channels. This however proved to be inadequate. It allows the transfer of messages for most driver functions, but not close. Storing an error message in the channel structure which is destroyed is not helpful. So we need the functions for storing data in interpreters. Conversely, providing only two functions storing the information in an interpreter, is inadequate as well. The circumstances for that to happen are actually very limited, but they can happen. First, most driver functions are not given an interpreter reference when called, and actually do not know which interpreter caused their invocation. The only remedy we have is that the channel structure has to have an interpreter reference to the interpreter of the command handler, for the calls into the Tcl level. This could be used in most circumstances, except when threads are enabled and the channel was transfered out of the thread containing that interpreter. We are not allowed to use this interpreter from the channel thread, and again have no other reference available. So for this the code/message pair has to be stored in a channel as the sole place available.

A previous revision of this TIP not only stored an error message, but also a result code in the channel or interpreter, and used it as the return code of the Tcl command which invoked the driver function returning the exception. This feature has been discarded as a possible security hazard. It would allow a malicious Tcl driver to cause break and continue exceptions at arbitrary locations in the overall application, controlling its behaviour as it sees fit.

I wish to thank Joe English and Vince Darley for their input with regard to the limitations of error propagation in the I/O core and possible ideas for solving it. Joe's discourse on the problems with the use of POSIX error codes in an earlier revision of this TIP made me realize that I should not use them anywhere in the API for reflected channels and rather concentrate on extending the I/O system to properly receive Tcl error messages. And while I rejected the TclSetPosixError function Vince proposed I hopefully kept the spirit of that proposal in my solution as well. The main reason against setting an arbitrary posix error string was that it invented another way of passing error information around, whereas the specification above is based on the existing Tcl_InterpState and attendant functionality.

Interaction with Threads and Other Interpreters.

A channel created with the chan create command knows the interpreter it was created in and executes its handler command only in that interpreter, even if the channel is shared with and/or has been moved into a different interpreter. This is easy to accomplish, by evaluating the handler command only in the context of the original interpreter.

The channel also knows the thread it was created in and executes its handler command only in that thread, even if the channel has been moved into a different thread. This is not so easy to accomplish, but still possible and feasible. It is done by:

Detecting if a driver function is called from a different thread, and
Forwarding the invocation of the handler script to the original thread via specialized events. This means that an event loop has to be active in the original thread, able to process these events.

Note that this also allows the creation of a channel whose two endpoints live in two different threads and provide a stream-oriented bridge between these threads. In other words we can provide a way for regular stream communication between threads instead of having to send commands.

When a thread or interpreter is deleted all channels created with the chan create command using this thread/interpreter as their computing base will be deleted as well, in all interpreters they have been shared with or moved into, and in whatever thread they have been moved to. This pulls the rug out under the other thread(s) and/or interpreter(s), this however cannot be avoided. Trying to use such a channel will cause the generation of the regular error about unknown channel handles.

Interaction with Safe Interpreters

The new subcommands create and postevent of chan are safe and therefore made accessible to safe interpreters.

While create arranges for the execution of code this code is always executed within the safe interpreter, even if the channel was moved (See previous section).

The subcommand postevent can trigger the execution of fileevent handlers, however if they are executed in trusted interpreters then they were registered by these interpreters as well. (Moving channels between threads strips fileevent handlers, and just between interpreters keeps them, and executes them where they were added).

Early versus Late Binding of the Handler Command

We have two principal methods for using the handler command. These are called early and late binding.

Early binding means that the command implementation to use is determined at the time of the creation of the channel, i.e. when chan create is executed, before any methods are called. Afterward it cannot change. The result of the command resolution is stored internally and used until the channel is destroyed. Renaming the handler command has no effect. In other words, the system will automatically call the command under the new name. The destruction of the handler command is intercepted and causes the channel to close as well.

Late binding means that the handler command is stored internally essentially as a string, and this string is mapped to the implementation to use for each and every call to a method of the handler. Renaming the command, or destroying it means that the next call of a handler method will fail, causing the higher level channel command to fail as well. Depending on the method the error message may not be able to explain the reason of that failure.

Another problem with this approach is that the context for the resolution of the command name has to be specified explicitly to avoid problems with relative names. Early binding resolves once, in the context of the chan create call. Late binding performs resolution anywhere where channel commands like puts, gets, etc. are called, i.e. in a random context. To prevent problems with different commands of the same name in several namespaces it becomes necessary to force the usage of a specific fixed context for the resolution. The only context suitable for such is the global context (per uplevel #0, not namespace eval ::).

Note that moving a different command into place after renaming the original handler allows the Tcl level to change the implementation dynamically at runtime. This however is not really an advantage over early binding as the early bound command can be written such that it delegates to the actual implementation, and that can then be changed dynamically as well.

However, despite all this late binding is so far the method of choice for the implementation of callbacks, be they in Tcl, or Tk; and has been chosen for the reflection as well.

Miscellanea

The channel reflection API reserves the driver type "tclrchannel" for itself. Usage of this driver type by other channel types is not allowed.

Examples

Driver Implementations

A simple way of implementing new types of channels is to use any of the various object systems for Tcl. Create a class for the channel type. Create the new channel in the constructor for new objects and store the channel handle. Make the new object the command handler for the channel. This automatically translates the sub commands for the command handler into object methods. Implement the various methods required. when the object is deleted close the channel, and delete the object when the channel announces that it has been closed. This part is a bit tricky, flags have to be used to break the potential cycle.

Another possibility is to implement the command handler as a regular command, together with a creation command wrapping around chan create and a backend which keeps track of all handles created by it and their state, associated data, etc.

 object based example ...

  snit::type new_channel {
      constructor {mode args} {
          # Handle args ...
          set chan [chan create $mode $self]
      }
      destructor {
          # ... delete internal state ...
          if {$dead} return
          set dead 1
          close $chan
      }

      method handle {} {return $chan}
      variable chan
      variable dead 0

      method finalize {dummy} {
          if {$dead} return
          set dead 1
          $self destroy
      }
      method initialize {dummy mode} {}
      method read       {dummy count} {}
      method write      {dummy data} {}
      method seek       {dummy offset base} {}
      method configure  {dummy args} {}
      method watch      {dummy events} {}
      method blocking   {dummy isblocking} {}
  }

  proc newchannel_open {args} {
      return [[new_channel %AUTO% {expand}$args] handle]
  }

Other Possible Drivers

Memory channel based on a string. Block and/or FIFO oriented.
Null device. Writable, not writable. WOM device. Data sink.
Random data (Writing to it may re-seed the PRNG).
Zero channel. Readable, returns a stream of binary 0s. Not writable.
FIFO channel between different threads.
Optimized virtual filesystem implementations.

Current VFS implementations have to use the package memchan to provide the channels when a file in them is opened, which necessitates that for all open files all of their data is in memory, possibly even more than once (when several channels are open on the same file). A reflected driver however allows implementations which keep only part of the data in memory. Or nearly none at all if the VFS provides computed information / is based on some data structure.

A more concrete example would be a driver which provides access to files stored in some archive file. Using a reflect driver the archive file can be memory mapped and the driver will then read whatever data is needed when requested. Currently it will have to copy the data into a memchan channel, i.e duplicate it in memory.

Note that of course the internals of the archive file may limit the amount of memory savings we can achieve. If for example the file we wish to access is stored in a compressed form we will have to decompress it in memory at least to the highest location requested so far. And any write operation (if allowed) will have to keep the data in memory until it has been compressed and committed.

Reference Implementation

A reference implementation is provided at SourceForge http://sourceforge.net/support/tracker.php?aid=1025294 .

Comments

[ Add comments on the document here ]

Copyright

This document has been placed in the public domain.

TIP 219: Tcl Channel Reflection API