TIP 504: New subcommand [string insert]

Login
Author:         Don Porter <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        21-Mar-2018
Obsoletes:	475
Post-History:   
Keywords:	Tcl,string,insert
Tcl-Version:	8.7
Tcl-Branch:     dgp-string-insert
Votes-For:      DKF, JN, DGP, FV, SL, AK
Votes-Against:  none
Votes-Present:  none

Abstract

This TIP proposes a [string insert] subcommand for inserting a substring at a given index. This new [string insert] command is to be the string analogue of [linsert].

History

[TIP 475] proposed the same subcommand, and in addition proposed a public C routine to access the same functionality. It was rejected, but only for reasons relating to the C routine. This TIP is a duplication of the rejected TIP, modified only to remove the C routine from the proposal, and with appropriate revisions to the sections describing the Reference Implementation and Future Work.

Rationale

Substring insertion is a basic string operation not directly available in current Tcl. Substring insertion can be synthesized from existing string commands, but the numerous legal forms of indexing yield significant difficulty. A novice user cannot be expected to know, much less implement, all possible index formats. Thus it is reasonable to provide a standard substring insertion command.

The current design of [string replace] expressly (albeit inexplicably) prevents its use for performing string insertion. TIP 323 originally proposed to extend [string replace] to allow string insertion, but this aspect of TIP 323 was withdrawn for the sake of compatibility.

Clarification: TIP 504 proposes no changes to the semantics of [string replace].

To mirror the behavior of [linsert], [string insert] at end should append to the string. This is in conflict with [string replace], were it to be extended to permit replacing an empty substring.

[lreplace] of an empty range at end inserts elements immediately before the final element, whereas appending requires the start index to be end+1. The hypothetical extended [string replace] would mirror [lreplace] and therefore would not have the end-relative indexing semantics needed to implement [string insert].

In conclusion, it is most straightforward to simply provide a [string insert] command with the same semantics as [linsert].

Current Behavior

Currently available methods for string insertion are awkward and do not handle end-relative and TIP 176-style indexing without extraordinary effort. To demonstrate the surprising degree of complexity, the following is a pure Tcl script reference implementation intended to handle all possible index formats and corner cases:

# Pure Tcl implementation of [string insert] command.
proc ::tcl::string::insert {string index insertString} {
    # Convert end-relative and TIP 176 indexes to simple integers.
    if {[regexp -expanded {
        ^(end(?![\t\n\v\f\r ])      # "end" is never followed by whitespace
        |[\t\n\v\f\r ]*[+-]?\d+)    # m, with optional leading whitespace
        (?:([+-])                   # op, omitted when index is "end"
        ([+-]?\d+))?                # n, omitted when index is "end"
        [\t\n\v\f\r ]*$             # optional whitespace (unless "end")
    } $index _ m op n]} {
        # Convert first index to an integer.
        switch $m {
            end     {set index [string length $string]}
            default {scan $m %d index}
        }

        # Add or subtract second index, if provided.
        switch $op {
            + {set index [expr {$index + $n}]}
            - {set index [expr {$index - $n}]}
        }
    } elseif {![string is integer -strict $index]} {
        # Reject invalid indexes.
        return -code error "bad index \"$index\": must be\
                integer?\[+-\]integer? or end?\[+-\]integer?"
    }

    # Concatenate the pre-insert, insertion, and post-insert strings.
    string cat [string range $string 0 [expr {$index - 1}]] $insertString\
               [string range $string $index end]
}

# Bind [string insert] to [::tcl::string::insert].
namespace ensemble configure string -map [dict replace\
        [namespace ensemble configure string -map]\
        insert ::tcl::string::insert]

More sample implementations can be found on the Additional String Functions page of the Tcler's Wiki, but at time of writing, they do not handle end-relative indexing nor can be used to append to a string. Since they are implemented in terms of [string replace] and do not perform any index arithmetic of their own, they actually do support TIP 176 indexes.

Compatibility Considerations

The existence of a command named [string insert] breaks any existing code that assumes [string in] is an unambiguous abbreviation for [string index]. Two options exist:

  1. Special-case [string in] to mean [string index].
  2. Take no special action, in which case [string in] becomes an error.

This TIP proposes option #2 because abbreviations are not guaranteed to be stable in the long term. This TIP targets Tcl 8.7, the first alpha version of which was recently released. Thus there are three reasons why compatibility is deemphasized in this situation.

Specification

Add a new [string insert] command:

string insert string index insertString

Returns a copy of string with insertString inserted at the index'th character. index may be specified as described in the STRING INDICES section.

If index is start-relative, the first character inserted in the returned string will be at the specified index. If index is end-relative, the last character inserted in the returned string will be at the specified index.

If index is at or before the start of string (e.g., index is 0), insertString is prepended to string. If index is at or after the end of string (e.g., index is end), insertString is appended to string.

Reference Implementation

A pure Tcl reference implementation is given above.

The dgp-string-insert branch in the Tcl Fossil repository provides an implementation of the proposed subcommand, complete with documentation, bytecode compilation, and a set of test cases.

Future Work

The direct evaluation of [string insert] is routed though a new internal routine TclStringReplace. It is a conventional substring replacer routine that serves as the inner core of both the [string insert] and [string replace] commands with suitable screening of corner cases by the callers. There is one routine to perform this function, so that there is one place to get the functionality right (debugging), one place to work on performance and representation efficiency (optimization), and one place where we can experiment with transformation to different data structures. This is one of a family of TclStringFoo routines that are engines of functionality for other [string foo] subcommands.

The existing internal routine TclStringReplace does not include the full collection of optimizations that the prior routine Tcl_ReplaceObj did. Since this routine remains internal, it can continue to gain these revisions without further TIP examination. Likewise, alternative bytecode compiler and execution strategies may also be pursued internally.

These routines may be good candidates to become available to applications and extensions in the public C API. The current TIP does not propose that. It is out of scope. A set of questions will need to be addressed when considering converting these routines into public ones. First it will need to be determined what level of robustness to present in a public interface. Should such a function be permitted to fail or even abort if given improper arguments? Or should all required argument validation be built into the routines to detect such things and raise catch-able errors instead? This general question includes the specific question about whether such routines are permitted to panic when memory allocation fails. Second several of these routines require an argument specifying an index into a string. These arguments have type int. It is expected that string indices limited to int will no longer be desirable in Tcl 9, so it must be decided whether it makes sense to create new routines for Tcl 8.7 that are destined to be discarded by Tcl 9, or whether a migration path needs to be put in place from the beginning. Since these questions are non-trivial, addressing them is saved for a later TIP.

Copyright

This document has been placed in the public domain.