TIP 537: Enable 64-bit indexes in regexp matching

Author:         Jan Nijtmans <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        7-April-2019
Post-History:   
Keywords:       Tcl
Tcl-Version:    9.0
Tcl-Branch:     regexp-api-64bit

Abstract

This TIP proposes modifying struct Tcl_RegExpInfo and struct Tcl_RegExpIndices so that the fields holding indexes change from type int to type Tcl_Size.

Rationale

This TIP should have been part of TIP #502 (Index Value Reform) and/or TIP #494 (More use of Tcl_Size in Tcl 9), but it was overlooked. Unless this public API is changed, regular expression indexes can never exceed 2 GiB in value, because the index fields are of type int.

Specification and Documentation

Here are the new struct definitions:

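/* Declared first, because Tcl_RegExpInfo refers to it. */
typedef struct Tcl_RegExpIndices {
    Tcl_Size start;              /* Offset of first character in match. */
    Tcl_Size end;                /* Offset of first character after the match. */
} Tcl_RegExpIndices;

typedef struct Tcl_RegExpInfo {
    Tcl_Size nsubs;              /* Number of subexpressions. */
    Tcl_RegExpIndices *matches;  /* Array of match offset pairs. */
    Tcl_Size extendStart;        /* Offset at which a subsequent match might begin. */
} Tcl_RegExpInfo;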

In addition, a new macro TCL_INDEX_NONE will be provided. It is the value of the start and end fields when there is no match. The macro will also be available in Tcl 8.7, where it will have the value (-1).
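
The following is a minimal sketch of how calling code might read the widened fields and test for TCL_INDEX_NONE. It is illustrative only, not part of the TIP. To compile without tcl.h it declares local stand-ins for Tcl_Size (as ptrdiff_t, following the Addendum), TCL_INDEX_NONE, and the two structs; real code would include tcl.h instead. The helper SubmatchLength is hypothetical. The sketch also assumes that matches holds nsubs+1 entries, with entry 0 describing the whole match, and that end is the index of the first character after the match.

```c
#include <stddef.h>  /* ptrdiff_t */

/*
 * Stand-ins so this sketch compiles on its own (assumption: real code
 * would get all of these from <tcl.h>).
 */
typedef ptrdiff_t Tcl_Size;
#define TCL_INDEX_NONE ((Tcl_Size)-1)

typedef struct Tcl_RegExpIndices {
    Tcl_Size start;
    Tcl_Size end;
} Tcl_RegExpIndices;

typedef struct Tcl_RegExpInfo {
    Tcl_Size nsubs;
    Tcl_RegExpIndices *matches;
    Tcl_Size extendStart;
} Tcl_RegExpInfo;

/*
 * Hypothetical helper: return the length of subexpression idx, or
 * TCL_INDEX_NONE if idx is out of range or the subexpression did not
 * take part in the match. All arithmetic is done in Tcl_Size, so
 * nothing is truncated to int.
 */
static Tcl_Size
SubmatchLength(const Tcl_RegExpInfo *info, Tcl_Size idx)
{
    const Tcl_RegExpIndices *m;

    /* Assumed layout: entries 0..nsubs, entry 0 is the whole match. */
    if (idx < 0 || idx > info->nsubs) {
        return TCL_INDEX_NONE;
    }
    m = &info->matches[idx];
    if (m->start == TCL_INDEX_NONE || m->end == TCL_INDEX_NONE) {
        return TCL_INDEX_NONE;
    }
    return m->end - m->start;
}
```

Code that previously stored these fields in int variables would need to switch to Tcl_Size for values beyond the int range to survive.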

Addendum

After TIP #660 was accepted, many functions changed from size_t to ptrdiff_t parameters. To prevent confusion, the TIP text above has been updated to reflect that change.

Implementation

An implementation of this TIP is present in the regexp-api-64bit branch.

Copyright

This document has been placed in the public domain.