Ticket UUID: | 219219 | |||
Title: | greedy vs. non-greedy confusion | |||
Type: | Bug | Version: | final: 8.2.3 | |
Submitter: | nobody | Created on: | 2000-10-26 05:03:45 | |
Subsystem: | 43. Regexp | Assigned To: | aku | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Open | Last Modified: | 2017-10-26 01:40:45 | |
Resolution: | None | Closed By: | nobody | |
Closed on: | ||||
Description: |
OriginalBugID: 4001 Bug Version: 8.2.3
SubmitDate: '2000-01-10'
LastModified: '2000-01-27'
Severity: MED
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
RelatedBugIDs: 2866
OS: BSD
OSVersion: NetBSD-1.4.1/i386
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'
Name: hume smith

ReproducibleScript:
> tclsh
% regexp {x.*?([a-z]+)} {1234x56789word101112} a b
1
% set a
x56789w
% set b
w
% set tcl_patchLevel
8.2.3
%

ObservedBehavior: the + was matched in a nongreedy fashion; shouldn't it be greedy?

DesiredBehavior:
% set a
x56789word
% set b
word
%

Patch:
PatchFiles:

Henry Spencer has noted some problems with mixing greedy and non-greedy quantifiers in the new regexp code. He's cc'ed on this, but in the meantime, the work-around is:
regexp {x[^a-z]*([a-z]+)} {1234x56789word101112} a b
-- 01/10/2000 hobbs

From HS: This is the same old problem: people accustomed to Perl are not grasping the idea that the whole RE is greedy or non-greedy, but *not* some mixture of the two. In this case, it is non-greedy since the first thing in it which cares is non-greedy. The + is being as greedy as it can, within the constraints set by the behavior of the whole RE. In short, the behavior, while surprising, is as documented. It's not an outright bug; it may, however, be a misfeature.
-- 01/10/2000 hobbs
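
For anyone re-checking this report, the following is a minimal sketch that runs the original expression and the suggested work-around side by side. The variable names s, a2 and b2 are introduced here for illustration only. The first result is the one recorded in this ticket for 8.2.3; the second follows from the work-around containing only greedy quantifiers.

```tcl
set s {1234x56789word101112}

# Mixed quantifiers: the non-greedy .*? comes first, so the RE as a
# whole prefers the shortest match and the + stops after one letter.
regexp {x.*?([a-z]+)} $s a b
puts "mixed:       a=$a b=$b"    ;# ticket reports a=x56789w b=w

# Work-around from the ticket: replace .*? with a negated class so
# that every quantifier in the RE is greedy.
regexp {x[^a-z]*([a-z]+)} $s a2 b2
puts "work-around: a=$a2 b=$b2"  ;# a=x56789word b=word
```

The work-around only applies when the text to skip can be described by a character class; it does not change how greediness is chosen for the RE as a whole.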
User Comments: |
dram added on 2017-10-26 01:40:45:
Also encountered this problem recently, with a simpler case:
% regexp -inline {(.+?), (.+)} "foo, bar"
{foo, b} foo b
But this is expected behaviour as stated by re_syntax(n), and PostgreSQL has a more thorough description[1].
[1] https://www.postgresql.org/docs/10/static/functions-matching.html#functions-posix-regexp (in 9.7.3.5. Regular Expression Matching Rules)

segeth added on 2006-10-05 21:29:16:
Logged In: YES user_id=1613941
The problem of mixing greedy and non-greedy quantifiers still exists in Tcl v8.4. In my opinion mixing the two should be possible; as it is, I have to split the RE into one non-greedy part and one greedy part. The re_syntax manual does not explain that you can't create a mixture of greedy and non-greedy quantifiers. I would like to know: is it a documentation bug or a bug in the Tcl interpreter?

nobody added on 2001-11-20 05:45:44:
Logged In: NO
I couldn't find anything to confirm this behavior in 'man re_syntax(n)'. Where has it been documented into semi-legitimacy? Any correlation between familiarity with Perl and noticing this behavior is purely coincidental. I mean, come on... If greediness is meant to be a 'whole expression' behavior, why isn't it implemented as a switch, like with the '-nocase' option? Calling it a misfeature is being too kind, especially considering the amount of grief it causes the people who are affected by it. ;')

dkf added on 2000-11-10 17:48:16:
Perhaps this requires a documentation change? Well, either that or a behaviour change. Would it be possible to have a flag to force greediness or non-greediness instead of guessing it from the first quantifier in the regular expression? (With the default being greedy unless all the "top-level" quantifiers were non-greedy, maybe?)
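
The approach segeth mentions, splitting the RE into a non-greedy part and a greedy part, might look roughly like the sketch below, applied to dram's "foo, bar" case. The variable names (s, prefix, first, rest, second) are made up for this illustration, and this is only one possible way to do the split.

```tcl
set s "foo, bar"

# Step 1: an RE whose only quantifier is non-greedy. It takes the
# shortest prefix that ends at the first ", ".
if {[regexp {^(.+?), } $s prefix first]} {
    # Step 2: a separate, purely greedy RE applied to the remainder.
    set rest [string range $s [string length $prefix] end]
    regexp {^.+} $rest second
    puts [list $first $second]   ;# foo bar
}
```

Because each RE contains quantifiers of only one kind, neither depends on how a mixed expression would be classified.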