Tcl Source Code

View Ticket
Login
Ticket UUID: 2942697
Title: Pathologically slow regexp execution
Type: Bug Version: obsolete: 8.5.7
Submitter: tgl Created on: 2010-01-30 06:46:24
Subsystem: 43. Regexp Assigned To: dkf
Priority: 7 High Severity:
Status: Closed Last Modified: 2010-02-01 18:20:28
Resolution: Fixed Closed By: dkf
    Closed on: 2010-02-01 11:20:28
Description:
The attached test case "slowregexp.tcl" demonstrates a case that results in O(N^2) slowdown in execution of a regexp match, as a result of nested searches in crevdissect: the outer invocation searches for a suitable midpoint, and for each possible midpoint the inner execution checks all possible locations for its midpoint.
I tested this against 8.5.7 but it doesn't look like the relevant files have changed since then.

There are a number of ways that this might be attacked, but I attach a fairly simple patch that seems to improve matters for me.  The idea is that to verify a midpoint requires a longest() call as well as recursively cdissect()ing the left and right sides --- but we could do the longest() check first since that's the fastest.  This reduces the runtime of the given test case about 500X on my machine.  There are two possible flaws in this solution:

1. It won't work if longest() depends on cdissect() having set up some state --- for instance, establishing backref string boundaries.  As far as I can see, that isn't the case, but I'm no expert on this code so I might've missed something.  I can say that the Tcl regression tests still pass, though.

2. It's conceivable that there are cases where this way is slower rather than faster.  I can't answer that at all.  The regression test cases seem to take just about the same amount of time as before, though.

FYI, this problem was originally submitted to the Postgres project at http://archives.postgresql.org/pgsql-hackers/2010-01/msg02864.php
User Comments: dkf added on 2010-02-01 18:20:28:

allow_comments - 1

Backported to 8.4

dkf added on 2010-02-01 18:12:25:
Backported to 8.5

dkf added on 2010-02-01 07:40:47:
As far as I can see this works, and I can't see any problems with it. (The speedup is nearly 800 times on the sample case as it happens: 94.3s -> 120ms)

Applied to HEAD. Backports still pending.

tgl added on 2010-01-30 13:47:28:

File Added - 360857: regexp.patch

tgl added on 2010-01-30 13:46:25:

File Added - 360856: slowregexp.tcl

Attachments: