TIP 502: Index Value Reform

Login
Author:         Don Porter <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        26-Feb-2018
Post-History:   
Tcl-Version:	8.7
Tcl-Branch:     tip-502
Vote-Results:   5/0/0 accepted
Votes-For:      DKF, JN, JD, KBK, SL
Votes-Against:  none
Votes-Present:  none

Abstract

Proposes reformed handling of Tcl index values.

Background

Many Tcl programmers may be surprised by this result

% lindex {a b} -4294967295
b

This is the consequence of two features of incumbent Tcl functionality. First, passing the value -4294967295 to Tcl_GetInt(FromObj) results in a successful extraction of the C int value 1. Generally any supported formatting of any value between -UINT_MAX and UINT_MAX is accepted, but transformed into a value between INT_MIN and INT_MAX. Second, the internal routine/macro TclGetIntForIndex(M) implements a definition of Tcl index values where any value acceptable to Tcl_GetInt(FromObj) is also acceptable as an index of the same value. Both of these features are of questionable merit, but the first has a fair bit of compatibility constraint attached to it because it involves the behavior of public routines. The second should be more open to potential revisions.

Another source of possible surprise is demonstrated:

% lindex {a b} 4294967295
% lindex {a b} 4294967295+1
a
% lindex {a b} 4294967296
bad index "4294967296": must be integer?[+-]integer? or end?[+-]integer?

These examples illustrate overflow effects in index arithmetic, and a rejection of apparently valid integer values that are larger than UINT_MAX. Neither of these effects is in agreement with handling of integers by expr which makes the results surprising.

Proposal

Revise the parsing of values used as index values to accept all integer representations acceptable to expr as either complete index values or as operands in the end-offset or index arithmetic formats.

Although this suggests an unlimited range of valid index values, in practice all index values will be truncated internally to the signed wide integer range, LLONG_MIN to LLONG_MAX, and in the Tcl 8 series of releases, further truncated to the signed int range, INT_MIN to INT_MAX, since no container can be larger than that range in Tcl 8.

Compatibility

Examples like the ones in the Background above will incompatibly change from one successful outcome to a different successful outcome. This is a true incompatibility, but it is difficult to believe anyone actually desires the outlier behaviors illustrated above, much less has code that relies on them.

Implementation

Drafted on the tip-502 branch.

Note that the core routine of the implementation, GetWideForIndex, produces a Tcl_WideInt result from a successful parse. I expect this functionality will be useful in the Tcl 9 codebase as an enabling step for containers with sizes larger than INT_MAX. The existing internal routine TclGetIntForIndex continues to produce the more limited range suitable for Tcl 8.

Copyright

This document has been placed in the public domain.