Tcl Source Code

View Ticket
Login
Ticket UUID: 2798939
Title: Non-greedy quantifiers revert to greedy when branched
Type: Bug Version: obsolete: 8.5.7
Submitter: bilgexa Created on: 2009-05-30 21:15:37
Subsystem: 43. Regexp Assigned To: aku
Priority: 5 Medium Severity: Minor
Status: Open Last Modified: 2017-10-26 02:10:01
Resolution: None Closed By: nobody
    Closed on:
Description:
Non-greedy quantifiers revert to greedy when used in conjunction with branches (pipe character).

The following branch-free expression matches each letter in the test string and therefore correctly returns 14
regexp -all .+? {this is a test}

The following branched expression matches the entire string only once because the quantifier reverts to greedy and therefore incorrectly returns 1
regexp -all .+?|x {this is a test}

This breaks a lot of complex regular expressions that require non-greedy quantifiers and branches.
User Comments: dgp added on 2017-10-26 02:10:01:
Tcl and postgres make use of (forks of)
the same regexp engine.

dram added on 2017-10-26 01:36:45:
Still true for 8.6.6

But this behaviour is stated clearly in re_syntax(n) as:

  An RE consisting of two or more branches connected by the | operator prefers longest match.

BTW, PostgreSQL has same behaviour. (confirmed in PG 9.6.4)

bilgexa added on 2009-05-31 18:47:22:
Surely you realise that this "fix" cannot possibly apply to complex alternations. The non-greedy quantifiers within the alternation group are all broken and this functionality can never be replaced by a single quantifier on the group as a whole.

dkf added on 2009-05-31 16:55:25:
IIRC, the problem is that the "first" quantifier in the syntax tree is the alternation, which can't be made non-greedy. The fix is to use {1,1}? to force the whole RE to be non-greedy (a non-capturing paren is also needed, but that's pretty trivial).

% regexp -all (?:.+?|x){1,1}? {this is a test}
14

Note that this is different from Perl/PCRE; they use a different model for greedy/non-greedy handling that does this in a way that is more as people intuitively expect (in return for being worse in other cases).