Tcl Source Code

Ticket UUID: fe4fca32d988f67704a9fec3372bf340c81c3ce
Title: mixed-greediness regular expressions need better documentation
Type: Bug
Version: 8.6
Submitter: glennj
Created on: 2015-03-24 14:55:43
Subsystem: - New Builtin Commands
Assigned To: nobody
Priority: 5 Medium
Severity: Minor
Status: Open
Last Modified: 2015-03-24 14:55:43
Resolution: None
Closed By: nobody
Closed on:
Description:
Currently, the MATCHING section of the re_syntax man page talks about "preferences", and states:

> A branch has the same preference as the first quantified atom in it which has a preference.

I don't think this is sufficiently clear to illustrate the differences between

    $ perl -e 'if ("1234" =~ /(\d+?)(\d+)/) {print "$& $1 $2\n"}'
    1234 1 234
    $ echo 'puts [regexp -inline {(\d+?)(\d+)} "1234"]' | tclsh
    12 1 2

Can we have an explicit statement? Something like:

> If the first quantifier in a branch of a RE is non-greedy, /all/ quantifiers in the branch will be considered as non-greedy.

However, I just noticed something I can't explain: what's the difference between these?

    $ echo 'puts [regexp -inline {(?:(\d+?)(\d+))} "1234"]' | tclsh
    12 1 2
    $ echo 'puts [regexp -inline {s+?|(?:(\d+?)(\d+))} "1234"]' | tclsh
    1234 1 234
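
For side-by-side comparison, the three Tcl cases above can be collected into one script. This is only a minimal sketch, assuming a `tclsh` 8.6 on the path; the patterns and the subject string are copied unchanged from the commands above, and the `format` column width is an arbitrary choice for readability. The outputs reported above for these three patterns are `12 1 2`, `12 1 2`, and `1234 1 234`, in that order.

```tcl
# Patterns copied verbatim from the report; each is a braced list element,
# so the backslashes reach regexp unchanged.
set patterns {
    {(\d+?)(\d+)}
    {(?:(\d+?)(\d+))}
    {s+?|(?:(\d+?)(\d+))}
}

foreach re $patterns {
    # -inline returns the overall match followed by each capture group.
    set result [regexp -inline $re "1234"]
    puts [format "%-24s -> %s" $re $result]
}
```

Save it as a file and run it with `tclsh`, or paste it into an interactive `tclsh` session.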