Tcl Source Code

Ticket UUID: 700ac47ece488c90836eac2e459ad29e6f329b6e
Title: Bug in regexp matching
Type: Bug
Version: 8.5.17
Submitter: Hagay Garty
Created on: 2014-12-27 09:17:56
Subsystem: 43. Regexp
Assigned To: nobody
Priority: 7 High
Severity: Severe
Status: Closed
Last Modified: 2014-12-31 15:05:05
Resolution: Invalid
Closed By: ferrieux
Closed on: 2014-12-31 15:05:05
Description:
Hi, I have seen a bug in regexp matching. I checked against external regexp engines to confirm that I'm not mistaken.

Steps to reproduce:

set data {0000}
regexp {\n?(0+?)} $data a b

This places '0000' in variable b.
The correct result would be '0' in variable b.

So in this case the '+?' is treated as '+'.

BTW, when the '\n?' is omitted from the beginning of the regexp, the problem goes away and b is correctly assigned '0'.
User Comments:
ferrieux added on 2014-12-31 15:05:05:
Agreeing with the analysis, closing accordingly.

ysch added on 2014-12-29 21:43:14:
> Hi, I have seen a bug in regexp matching. 
No, you didn't. ;)

> I verified VS external regexp engines that indeed I'm not mistaken.
Other engines are irrelevant here.

From man re_syntax:
---
In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, its choice is determined by its preference: either the longest substring, or the shortest.

Most atoms, and all constraints, have no preference. A parenthesized RE has the same preference (possibly none) as the RE. A quantified atom with quantifier {m} or {m}? has the same preference (possibly none) as the atom itself. A quantified atom with other normal quantifiers (including {m,n} with m equal to n) prefers longest match. A quantified atom with other non-greedy quantifiers (including {m,n}? with m equal to n) prefers shortest match. A branch has the same preference as the first quantified atom in it which has a preference. 
---
Your RE starts with a quantified atom with a normal quantifier (\n?), so this atom prefers the longest match. As your RE consists of a single branch, "it has the same preference as the first quantified atom in it which has a preference": longest match.

So it's not a bug. Your RE should have been: \n??(0+?)
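
The three variants discussed in this ticket can be compared side by side. This is a minimal sketch, assuming a tclsh with the standard ARE engine (8.5 or later); the variable names c, d, e and f are illustrative and do not appear in the original report.

```tcl
# The preference of a branch follows its first quantified atom
# that has a preference (see re_syntax, quoted above).
set data {0000}

# Greedy \n? comes first, so the whole RE prefers the longest match.
regexp {\n?(0+?)} $data a b
puts "greedy prefix:     a='$a' b='$b'"

# Non-greedy \n?? comes first, so the whole RE prefers the shortest match.
regexp {\n??(0+?)} $data c d
puts "non-greedy prefix: c='$c' d='$d'"

# No prefix: 0+? is the first quantified atom, so the shortest match wins.
regexp {(0+?)} $data e f
puts "no prefix:         e='$e' f='$f'"
```

By the rule quoted from re_syntax, the first call should leave '0000' in b, as reported, while the second and third should leave '0' in d and f.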