Ticket UUID: | 2798939 | |||
Title: | Non-greedy quantifiers revert to greedy when branched | |||
Type: | Bug | Version: | obsolete: 8.5.7 | |
Submitter: | bilgexa | Created on: | 2009-05-30 21:15:37 | |
Subsystem: | 43. Regexp | Assigned To: | aku | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Open | Last Modified: | 2017-10-26 02:10:01 | |
Resolution: | None | Closed By: | nobody | |
Closed on: | ||||
Description: |
Non-greedy quantifiers revert to greedy when used in conjunction with branches (pipe character). The following branch-free expression matches each letter in the test string and therefore correctly returns 14 regexp -all .+? {this is a test} The following branched expression matches the entire string only once because the quantifier reverts to greedy and therefore incorrectly returns 1 regexp -all .+?|x {this is a test} This breaks a lot of complex regular expressions that require non-greedy quantifiers and branches. | |||
User Comments: |
dgp added on 2017-10-26 02:10:01:
Tcl and postgres make use of (forks of) the same regexp engine. dram added on 2017-10-26 01:36:45: Still true for 8.6.6 But this behaviour is stated clearly in re_syntax(n) as: An RE consisting of two or more branches connected by the | operator prefers longest match. BTW, PostgreSQL has same behaviour. (confirmed in PG 9.6.4) bilgexa added on 2009-05-31 18:47:22: Surely you realise that this "fix" cannot possibly apply to complex alternations. The non-greedy quantifiers within the alternation group are all broken and this functionality can never be replaced by a single quantifier on the group as a whole. dkf added on 2009-05-31 16:55:25: IIRC, the problem is that the "first" quantifier in the syntax tree is the alternation, which can't be made non-greedy. The fix is to use {1,1}? to force the whole RE to be non-greedy (a non-capturing paren is also needed, but that's pretty trivial). % regexp -all (?:.+?|x){1,1}? {this is a test} 14 Note that this is different from Perl/PCRE; they use a different model for greedy/non-greedy handling that does this in a way that is more as people intuitively expect (in return for being worse in other cases). |