Tcl Source Code

Ticket UUID: 3106532
Title: "switch -regexp -indexvar" gives invalid range
Type: Bug
Version: obsolete: 8.5.9
Submitter: martinlemburg
Created on: 2010-11-10 08:27:35
Subsystem: 18. Commands M-Z
Assigned To: dkf
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2012-05-17 23:45:52
Resolution: Fixed
Closed By: dkf
Closed on: 2012-05-17 16:45:52
Description:
TIP #75 states that the new -indexvar option, which goes with the -regexp option of switch, should behave like the -indices option of the regexp command: it returns the range of the overall match and the ranges of the submatches for any subexpressions.
The man page points to "regexp -indices", too, but says that the index variable contains range(s) running from the first matching character to the character after the last matching character.

So TIP #75 is not implemented as specified, and the behavior of "switch -regexp -indexvar" does not match that of "regexp -indices".

An example:

    % switch -regexp -indexvar i -matchvar m "abcdef" {
        ^abc {
            puts "matchvar = '$m'";
            puts "indexvar = [list $i]";
            puts "string range = '[string range "abcdef" {*}[lindex $i 0]]'";
        }
    }
    matchvar = 'abc'
    indexvar = {{0 3}}
    string range = 'abcd'
    % string range "abcdef" {*}[lindex [regexp -indices -inline {^abc} "abcdef"] 0]
    abc

This behavior should be made consistent by correcting the -indexvar option of switch!
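
To make the two conventions concrete, here is a minimal Tcl sketch. The helper name halfOpenToInclusive is hypothetical (it is not part of Tcl); it converts a {start end} pair whose end is one past the last matched character into the inclusive form that "regexp -indices" and "string range" use. The pair {0 3} is the value reported above for -indexvar.

```tcl
# Hypothetical helper (not part of Tcl): convert {start endExclusive}
# into {start endInclusive}.
proc halfOpenToInclusive {range} {
    lassign $range start end
    return [list $start [expr {$end - 1}]]
}

set s "abcdef"

# The range reported by the affected "switch -regexp -indexvar" was {0 3};
# after conversion it selects exactly the matched text.
puts [string range $s {*}[halfOpenToInclusive {0 3}]]                     ;# abc

# "regexp -indices" already reports the inclusive range {0 2}.
puts [string range $s {*}[lindex [regexp -indices -inline {^abc} $s] 0]]  ;# abc
```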
User Comments: dkf added on 2012-05-17 23:45:52:

Sure, it could cause incompatibilities to fix a bug but it was nonetheless a bug. Now it's a fixed bug. :-) Note from the ChangeLog:

***POTENTIAL INCOMPATIBILITY***
Uses of [switch -regexp -indexvar] that previously compensated for the
wrong offsets (by subtracting 1 from the end indices) now do not need
to do so as the value is correct.
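
For scripts that have to run on both affected and fixed interpreters, one option is to probe the behavior once instead of hard-coding the "subtract 1" compensation. This is only a sketch: the proc names indexvarEndIsInclusive and normalizeIndexvar are hypothetical, and leaving negative end indices (as reported for unmatched subexpressions) untouched is an assumption modelled on the "end >= offset" test that regexp uses.

```tcl
# Hypothetical probe: returns 1 if -indexvar reports an inclusive end index
# (fixed behavior), 0 if the end index is one past the match (old behavior).
proc indexvarEndIsInclusive {} {
    switch -regexp -indexvar idx -- "abc" {
        ^abc { return [expr {[lindex $idx 0 1] == 2}] }
    }
    error "probe pattern did not match"
}

# Hypothetical normalizer: returns the ranges in "regexp -indices" form
# on either kind of interpreter.
proc normalizeIndexvar {ranges} {
    if {[indexvarEndIsInclusive]} {
        return $ranges
    }
    set out {}
    foreach r $ranges {
        lassign $r start end
        # Assumption: a negative end marks "no match" and is left alone.
        if {$end >= 0} { incr end -1 }
        lappend out [list $start $end]
    }
    return $out
}

switch -regexp -indexvar i -- "abcdef" {
    ^abc { puts [string range "abcdef" {*}[lindex [normalizeIndexvar $i] 0]] }
}
```

With this in place the call site no longer needs to know which offset convention the running interpreter uses.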

martinlemburg added on 2010-11-11 22:35:58:
What is more important: the described but wrongly implemented intention (regexp -indices behavior), or a man page that correctly describes the wrong behavior? It should, or must, describe it, because otherwise the man page itself would be "buggy"!

No, a behavior is not right merely because it is documented. It is right only if the path to that behavior is well documented and the final behavior matches the intentions, which may change along the way!

Here the intentions were clear from the start, but were not met in the end!

And the specification justifies the behavior, not the man page, which only describes it!

So even a bug fix could cause incompatibilities!

dkf added on 2010-11-11 21:36:34:
The intent was clearly to mirror [regexp -indices]. That it does not is a bug.

avl42 added on 2010-11-11 21:34:33:
The current documentation contradicts itself by doing both: stating the (non-TIP75-conformant) behaviour AND referring to regexp as being alike.

Since the current behaviour is at odds not only with regexp but also with everything else in Tcl that involves ranges, it should be seen as a bug in both the implementation and the documentation, especially as keeping it is likely to create confusion and do more harm than changing this relatively new feature.

ferrieux added on 2010-11-11 20:55:45:
At the end of the day, what counts as the specification is the manpage, not the unwritten intention behind the TIP. Here the manpage says:

    ... will be a two-element list specifying the index of the start
    and index of the first character after the end of the overall
    substring of the input string

Hence, any application that works today and uses the -indexvar option with its current, documented semantics will suddenly fail if the change is applied. That is an API change, hence it needs a TIP.

martinlemburg added on 2010-11-11 19:57:09:
IMHO, software that is wrongly implemented or not implemented as specified is buggy; otherwise the specification must be changed.

And there should really be no need to TIP a change that makes "switch -regexp -indexvar" behave as specified!

ferrieux added on 2010-11-11 00:00:04:
Sure, the only problem is that the code is consistent with the documentation. Bug locked in :(
Ready to TIP?

martinlemburg added on 2010-11-10 23:46:33:
In tclCmdMZ.c:
line 276+277 (Tcl_RegexpObjCmd):

    match = Tcl_RegExpExecObj(interp, regExpr, objPtr, offset, numMatchesSaved, eflags);

line 3740+3741 (Tcl_SwitchObjCmd):

    int matched = Tcl_RegExpExecObj(interp, regExpr, stringObj, 0, numMatchesSaved, 0);

The first call to Tcl_RegExpExecObj has an "offset", that later on in ...

line 341-343:

    if (end >= offset) {
        end--;
    }

... is used to correct the "end" index to point to the last character of the match.

This is missing in Tcl_SwitchObjCmd!

Since my time is limited I was not yet able to think about the offset ... :(

martinlemburg added on 2010-11-10 15:28:13:
This bug affects 8.6beta, too!