Tcl Source Code

View Ticket
Login
Ticket UUID: d7ea9f9853e8438db20f38c27bd657ecb2d48438
Title: makesearch() can create a useless state
Type: Bug Version: 8.6.4
Submitter: tgl Created on: 2015-09-09 22:39:26
Subsystem: 43. Regexp Assigned To: dgp
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2015-09-21 18:52:12
Resolution: Accepted Closed By: dgp
    Closed on: 2015-09-21 18:52:12
Description:
I noticed that makesearch() contains a minor bug that often causes it to clone a successor of the "pre" state twice not once.  AFAICT the extra state is unreachable and thus there is no visible effect, but it's still a waste of resources.

The problem occurs when the first state seen by the "make a list of the states" search is seen twice because there are two arcs from the "pre" state to the same state.  The first time, we do

s->tmp = slist;
slist = s;

but since slist was previously NULL, s->tmp doesn't change.  The second time, the test for s->tmp == NULL mistakenly thinks we hadn't visited this state, so it does that again.  Now we have slist pointing to a state whose tmp field also points to itself, so we'll visit it twice in the "do the splits" loop.  The second visit makes a second clone of this state, but since we already freed all the state's inarcs from places other than pre, the second clone receives no inarcs and is useless.

The problem can be seen to exist by adding a dumpnfa() call after the call to makesearch().
On a test case like regexp {b*b} {a}, this printout results:


final cleanup:
pre 1, post 0, bos [2], bol [3], eos [4], eol [5]
0@	no out arcs
1>	[0]->3	[1]->3	[2]->3	[3]->3
2.	[0]->0	[1]->0	[4]->0	[5]->0
3.	[1]->3	[1]->2
max 5
# 1( 1): b
# 2(ps): 
# 3(ps): 
# 4(ps): 
# 5(ps): 

after makesearch:
pre 1, post 0, bos [2], bol [3], eos [4], eol [5]
0@	no out arcs
1>	[0]->3	[1]->3	[2]->3	[3]->3	[0]->1
	[1]->1	[2]->1	[3]->1
2.	[0]->0	[1]->0	[4]->0	[5]->0
3.	[1]->2	[1]->4
4.	[1]->2	[1]->4
5.	[1]->4	[1]->2
max 5
# 1( 1): b
# 2(ps): 
# 3(ps): 
# 4(ps): 
# 5(ps): 

Notice the lack of ways to reach state 5.

I think the fix is simple: the test

if (b != NULL && s->tmp == NULL)

needs to be

if (b != NULL && s->tmp == NULL && s != slist)
User Comments: dgp added on 2015-09-21 18:52:12:
Patch accepted for 8.5.19 and 8.6.5

dgp added on 2015-09-21 17:15:26:
Branch tgl-pg-re opened with this patch.

tgl added on 2015-09-10 00:12:17:
Oh, never mind that last: I found dgp's commit log entry of 2003-11-17.  So the previous problem was that overwriting the tmp field could cause earlier-seen states to get forgotten and not split at all, which *would* have an observable effect by breaking the execution-time no-progress detection logic.  That's not true for splitting twice, though.

Also, I did consider the case of cross-links between these states, but AFAICT that does work fine, with or without this patch.

tgl added on 2015-09-10 00:02:10:
btw, there was evidently an earlier attempt to fix this list-wrangling logic, which surprises me because I can't see how there would be a visible effect even if we did clone states multiple times.  The referenced bug numbers are unrelated to any regression test cases.  Is there any info still on-line about what the observed symptoms were?

tgl added on 2015-09-09 23:09:42:
ah, scratch that suggestion: fails for more than one such state.  Attaching a patch I've tested a bit more.

Attachments: