TIP 537: Enable 64-bit indexes in regexp matching

Author:         Jan Nijtmans <jan.nijtmans@gmail.com>
State:          Draft
Type:           Project
Vote:           
Created:        7-April-2019
Post-History:   
Keywords:       Tcl
Tcl-Version:    9.0
Tcl-Branch:     regexp-api-64bit

Abstract

This TIP proposes to modify "struct Tcl_RegExpInfo" and "struct Tcl_RegExpIndices" such that the fields holding indexes change from type "int" to type "size_t".

Rationale

This change should have been part of TIP #502 (Index Value Reform) and/or TIP #494 (More use of size_t in Tcl 9), but it was overlooked. Without changing this public API, regular expression indexes can never exceed 2G in value (the range of a 32-bit signed "int").
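
To make the 2G limit concrete, here is a minimal sketch (not part of the proposed API) of a check for whether an offset still fits in an "int" field. The helper name OffsetFitsInInt is hypothetical and exists only for illustration.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: returns 1 if a character
 * offset can be stored in a field of type "int", 0 otherwise.
 * With a 32-bit "int", INT_MAX is 2147483647, i.e. just under 2G.
 */
static int
OffsetFitsInInt(size_t offset)
{
    return offset <= (size_t)INT_MAX;
}
```

Where "size_t" is wider than "int", as is usual on 64-bit platforms, any offset above INT_MAX fails this check, so it cannot be reported through an "int" field.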

Specification and Documentation

Here are the new struct definitions:

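typedef struct Tcl_RegExpIndices {
    size_t start;               /* Offset of first character in match. */
    size_t end;                 /* Offset of first character after match. */
} Tcl_RegExpIndices;

typedef struct Tcl_RegExpInfo {
    size_t nsubs;               /* Number of subexpressions. */
    Tcl_RegExpIndices *matches; /* Array of nsubs+1 match offset pairs. */
    size_t extendStart;         /* Offset where a later match might begin. */
} Tcl_RegExpInfo;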

In addition, a new macro TCL_INDEX_NONE will be provided. It is the value of the "start" and "end" fields when there is no match. The macro will also be made available in Tcl 8.7, where it will have the value (-1).
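
The following is a minimal sketch of how calling code might test for unmatched ranges under the proposed definitions. It rests on several assumptions: the structs are copied locally so the example compiles without tcl.h, the helper CountMatchedRanges is hypothetical, and the Tcl 9.0 value of TCL_INDEX_NONE is assumed to be ((size_t)-1), which this TIP does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the proposed structs, so the sketch compiles on its own.
 * Real code would include tcl.h instead. */
typedef struct Tcl_RegExpIndices {
    size_t start;
    size_t end;
} Tcl_RegExpIndices;

typedef struct Tcl_RegExpInfo {
    size_t nsubs;
    Tcl_RegExpIndices *matches;
    size_t extendStart;
} Tcl_RegExpInfo;

/* ASSUMPTION: the TIP gives only the 8.7 value (-1). The unsigned
 * counterpart ((size_t)-1) is assumed here for Tcl 9.0. */
#ifndef TCL_INDEX_NONE
#define TCL_INDEX_NONE ((size_t)-1)
#endif

/*
 * Hypothetical helper: counts how many of the nsubs+1 ranges (entry 0 is
 * the whole match, entries 1..nsubs are the subexpressions) matched.
 */
static size_t
CountMatchedRanges(const Tcl_RegExpInfo *info)
{
    size_t i, count = 0;

    assert(info != NULL);
    for (i = 0; i <= info->nsubs; i++) {
        /* Compare against the macro: a "start < 0" test is always
         * false once the field is unsigned. */
        if (info->matches[i].start != TCL_INDEX_NONE) {
            count++;
        }
    }
    return count;
}
```

Comparing against TCL_INDEX_NONE rather than testing for a negative value lets the same source compile against both 8.7 and 9.0, which appears to be why the macro is provided for both versions.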

Implementation

An implementation of this TIP is present in the regexp-api-64bit branch.

Copyright

This document has been placed in the public domain.