Tcl Source Code

Ticket UUID: 1452969
Title: performance: regexp 20sec delay from 8.0 to 8.3
Type: Bug Version: obsolete: 8.4.13
Submitter: bstellar Created on: 2006-03-18 04:51:54
Subsystem: 43. Regexp Assigned To: hobbs
Priority: 5 Medium Severity:
Status: Open Last Modified: 2007-12-04 09:17:38
Resolution: None Closed By: bstellar
    Closed on: 2006-03-18 19:38:17
Description:
Hi, I am seeing a considerable performance regression from 
v8.0 to v8.3 in regexp for the following statement in 
my program.

I have attached the program, which responds within the same 
second on version 8.0, as shown below:
Before if - The time is Fri Mar 17 9:17:12 PM US 
Mountain Standard Time 2006
Inside the if The time is Fri Mar 17 9:17:12 PM US 
Mountain Standard Time 2006
lsWzrdEcnRequestedByCompleteList_choices

and 20 seconds for 8.3:
Before if - The time is Fri Mar 17 9:18:53 PM US 
Mountain Standard Time 2006
Inside the if The time is Fri Mar 17 9:19:22 PM US 
Mountain Standard Time 2006
lsWzrdEcnRequestedByCompleteList_choices

Please help me if this is a known issue.

Thanks,
Balaji
User Comments: hobbs added on 2007-12-04 09:17:38:
Logged In: YES 
user_id=72656
Originator: NO

Oops, never mind the last - it is the RE only, I was actually basing it on the existence of regexp -all, which wasn't intro'd until 8.3, but the RE is bad for 8.2 as well.

hobbs added on 2007-12-04 09:08:11:
Logged In: YES 
user_id=72656
Originator: NO

I just found that this actually was introduced in the 8.3 timeframe, and wasn't an issue in 8.2:

000 VERSIONS:                  1:8.5b3 2:8.4.17  3:8.3.5  4:8.2.3  5:8.0.5
025 RE extract ini file       76083.10 75661.17 75340.00   152.00   312.00
026 RE extract ini file ng      275.60   301.23   277.00   159.00     -=-

nobody added on 2006-03-20 16:21:38:
Logged In: NO 

Another solution is just to use the ? non-greedy quantifier:

regexp {^(.*?)=(.*)$} ...

Nice and quick. Also probably what the user intended (in 
the case where there is a second '=').
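
To illustrate the point about a second '=', here is a minimal sketch comparing the two patterns. The input string "A=B=C" and the variable names are made up for illustration and are not from the attached script; non-greedy quantifiers need the 8.1+ regexp engine.

```tcl
# Hypothetical sample input containing a second '='.
set sEnvPair "A=B=C"

# Greedy: the first (.*) takes as much as it can,
# so the split happens at the LAST '='.
regexp {^(.*)=(.*)$} $sEnvPair -> greedyName greedyValue

# Non-greedy: (.*?) takes as little as it can,
# so the split happens at the FIRST '='.
regexp {^(.*?)=(.*)$} $sEnvPair -> lazyName lazyValue

puts "greedy:     name=<$greedyName> value=<$greedyValue>"
puts "non-greedy: name=<$lazyName> value=<$lazyValue>"
```

With this input the greedy pattern should yield the name "A=B" and the non-greedy one the name "A", which is usually what is wanted for NAME=VALUE pairs.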

dkf added on 2006-03-20 06:04:31:
Logged In: YES 
user_id=79902

The RE engine is pretty close to a black box to me; I know
how to fix a very small subset of the problems it has, and
this is definitely not one of them!

Possible consideration: special-case in the outer [regexp]
code to detect and handle the case where we have:
  ^(.*)<SOME_LITERAL>(.*)$
as that's replaceable by a simple substring search instead of
the much more complex backtracking stuff that the RE engine
does. But I've no time to implement this anyway.
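
As a rough script-level sketch of that idea (not the proposed change to the [regexp] C code, which nobody implemented in this thread), the greedy pattern ^(.*)<SOME_LITERAL>(.*)$ can be emulated with [string last]. The proc name splitAtLastLiteral is hypothetical.

```tcl
# Hypothetical helper: emulate ^(.*)LITERAL(.*)$ with a substring search.
# Returns a two-element list {head tail}, or an empty list when the
# literal does not occur (the case where the regexp would not match).
proc splitAtLastLiteral {str literal} {
    # The greedy first group means the split is at the LAST occurrence.
    set n [string last $literal $str]
    if {$n == -1} {
        return {}
    }
    set head [string range $str 0 [expr {$n - 1}]]
    set tail [string range $str [expr {$n + [string length $literal]}] end]
    return [list $head $tail]
}

# Usage with a made-up NAME=VALUE string.
lassign [splitAtLastLiteral "A=B=C" "="] sEnvName sEnvValue
puts "name=<$sEnvName> value=<$sEnvValue>"
```

Note that lassign needs Tcl 8.5 or later; on older versions foreach {sEnvName sEnvValue} [splitAtLastLiteral ...] break does the same job.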

hobbs added on 2006-03-20 03:14:37:
Logged In: YES 
user_id=72656

The capturing appears to be the real killer.  Remove that
and the slowdown is small.  The match is still accurate, so I
don't know what extra bits the RE is doing when capturing.

msofer added on 2006-03-19 02:51:35:
Logged In: YES 
user_id=148712

bstellar's problem is solved, but the performance bug is
still there ...

bstellar added on 2006-03-19 02:33:20:
Logged In: YES 
user_id=1479322

Excellent, this helps me very much...

msofer added on 2006-03-18 20:50:30:
Logged In: YES 
user_id=148712

Oops: [string last], not [string first]. Still much faster:
% time {
    if {[set n [string last = $sEnvPair]] != -1} {
        set sEnvName [string range $sEnvPair 0 [expr {$n-1}]]
        set sEnvValue [string range $sEnvPair [incr n] end]
    }
} 1000
1040.019 microseconds per iteration

Just by looking at your snippet of code, I'm guessing that
you want [string first], and that your regexp is wrong as it
will produce the wrong results if anybody has an '=' in his
name.

msofer added on 2006-03-18 20:35:13:
Logged In: YES 
user_id=148712

The performance is confirmed bad on modern Tcl (8.3 is already
quite old; if you are migrating you should use 8.4).

Since 8.0 the regexp engine has been completely replaced;
the new unicode awareness does make it slower.

However, for your particular case, the use of regexp is
probably not necessary: a combination of [string last] and
[string range] is much more efficient. Look:

% time {
    if {[set n [string first = $sEnvPair]] != -1} {
        set sEnvName [string range $sEnvPair 0 [expr {$n-1}]]
        set sEnvValue [string range $sEnvPair [incr n] end]
    }
} 1000
168.87 microseconds per iteration
% time {
    if {[regexp {^(.*)=(.*)$} $sEnvPair sDummy sEnvName sEnvValue] == 1} {
    }
}
22609833 microseconds per iteration

BTW: priority 9 is for release-blocking bugs; this is
nowhere near qualifying.

bstellar added on 2006-03-18 11:55:15:
Logged In: YES 
user_id=1479322

This regexp creates delay, please help;
if { [ regexp {^(.*)=(.*)$} $sEnvPair sDummy sEnvName sEnvValue ] == 1 } {

bstellar added on 2006-03-18 11:51:55:

File Added - 171362: jdReproduceIssue.tcl

Attachments: