Ticket UUID: 1452969
Title: performance: regexp 20sec delay from 8.0 to 8.3
Type: Bug | Version: obsolete: 8.4.13
Submitter: bstellar | Created on: 2006-03-18 04:51:54
Subsystem: 43. Regexp | Assigned To: hobbs
Priority: 5 Medium | Severity:
Status: Open | Last Modified: 2007-12-04 09:17:38
Resolution: None | Closed By: bstellar
Closed on: 2006-03-18 19:38:17
Description:
Hi, I am having a considerable performance issue going from v8.0 to v8.3 with regexp, for the following statement in my program. I have attached the program, which responds within the same second on version 8.0, as shown below:

Before if - The time is Fri Mar 17 9:17:12 PM US Mountain Standard Time 2006
Inside the if The time is Fri Mar 17 9:17:12 PM US Mountain Standard Time 2006
lsWzrdEcnRequestedByCompleteList_choices

and 20 seconds for 8.3:

Before if - The time is Fri Mar 17 9:18:53 PM US Mountain Standard Time 2006
Inside the if The time is Fri Mar 17 9:19:22 PM US Mountain Standard Time 2006
lsWzrdEcnRequestedByCompleteList_choices

Please help me if this is a known issue.

Thanks,
Balaji
User Comments:
hobbs added on 2007-12-04 09:17:38:
Logged In: YES user_id=72656 Originator: NO

Oops, never mind the last - it is the RE only. I was actually basing it on the existence of regexp -all, which wasn't intro'd until 8.3, but the RE is bad for 8.2 as well.

hobbs added on 2007-12-04 09:08:11:
Logged In: YES user_id=72656 Originator: NO

I just found that this actually was introduced in the 8.3 timeframe, and wasn't an issue in 8.2:

    000 VERSIONS:                1:8.5b3   2:8.4.17  3:8.3.5   4:8.2.3  5:8.0.5
    025 RE extract ini file      76083.10  75661.17  75340.00  152.00   312.00
    026 RE extract ini file ng   275.60    301.23    277.00    159.00   -=-

nobody added on 2006-03-20 16:21:38:
Logged In: NO

Another solution is just to use the ? non-greedy quantifier:

    regexp {^(.*?)=(.*)$} ...

Nice and quick. Also probably what the user intended (in the case where there is a second '=').

dkf added on 2006-03-20 06:04:31:
Logged In: YES user_id=79902

The RE engine is pretty close to a black box to me; I know how to fix a very small subset of the problems it has, and this is definitely not one of them!

Possible consideration: special-case in the outer [regexp] code to detect and handle the case where we have:

    ^(.*)<SOME_LITERAL>(.*)$

as that's replaceable by a simple substring search instead of the much more complex backtracking stuff that the RE engine does. But I've no time to implement this anyway.

hobbs added on 2006-03-20 03:14:37:
Logged In: YES user_id=72656

The capturing appears to be the real killer. Remove that and the slowdown is small. The match is still accurate, so I don't know what extra bits the RE is doing when capturing.

msofer added on 2006-03-19 02:51:35:
Logged In: YES user_id=148712

bstellar's problem is solved, but the performance bug is still there ...

bstellar added on 2006-03-19 02:33:20:
Logged In: YES user_id=1479322

Excellent, this helps me very much...

msofer added on 2006-03-18 20:50:30:
Logged In: YES user_id=148712

Oops: [string last], not [string first]. Still much faster:

    % time {
        if {[set n [string last = $sEnvPair]] != -1} {
            set sEnvName [string range $sEnvPair 0 [expr {$n-1}]]
            set sEnvValue [string range $sEnvPair [incr n] end]
        }
    } 1000
    1040.019 microseconds per iteration

Just by looking at your snippet of code, I'm guessing that you want [string first], and that your regexp is wrong, as it will produce the wrong results if anybody has an '=' in his name.

msofer added on 2006-03-18 20:35:13:
Logged In: YES user_id=148712

The performance is confirmed bad on modern Tcl (8.3 is already quite old; if you are migrating you should use 8.4). Since 8.0 the regexp engine has been completely replaced; the new unicode awareness does make it slower.

However, for your particular case, the use of regexp is probably not necessary: a combination of [string first] and [string range] is much more efficient. Look:

    % time {
        if {[set n [string first = $sEnvPair]] != -1} {
            set sEnvName [string range $sEnvPair 0 [expr {$n-1}]]
            set sEnvValue [string range $sEnvPair [incr n] end]
        }
    } 1000
    168.87 microseconds per iteration
    % time {
        if {[regexp {^(.*)=(.*)$} $sEnvPair sDummy sEnvName sEnvValue] == 1} {
        }
    }
    22609833 microseconds per iteration

BTW: priority 9 is for release-blocking bugs; this is nowhere near qualifying.

bstellar added on 2006-03-18 11:55:15:
Logged In: YES user_id=1479322

This regexp creates delay, please help;

    if { [ regexp {^(.*)=(.*)$} $sEnvPair sDummy sEnvName sEnvValue ] == 1 } {

bstellar added on 2006-03-18 11:51:55:
File Added - 171362: jdReproduceIssue.tcl
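The alternatives discussed in the comments (greedy regexp, non-greedy regexp, and plain substring search) differ in where they split a pair that contains more than one '='. The following is a minimal sketch of that difference, not code from the ticket: the helper name splitPair, its which argument, and the sample value "A=B=C" are all hypothetical and chosen only for illustration.

```tcl
# Hypothetical helper (not part of Tcl or of jdReproduceIssue.tcl):
# split "name=value" at the first or last '=' using substring search only.
proc splitPair {pair {which first}} {
    if {[string equal $which last]} {
        set n [string last = $pair]
    } else {
        set n [string first = $pair]
    }
    # No separator at all: return an empty list.
    if {$n == -1} {
        return {}
    }
    set name  [string range $pair 0 [expr {$n - 1}]]
    set value [string range $pair [expr {$n + 1}] end]
    return [list $name $value]
}

# Made-up sample with a second '=' in it (assumption, not the ticket's data).
set sEnvPair "A=B=C"

# Greedy pattern from the ticket: the first group runs up to the last '='.
regexp {^(.*)=(.*)$} $sEnvPair sDummy greedyName greedyValue

# Non-greedy variant from the comments: the first group stops at the first '='.
regexp {^(.*?)=(.*)$} $sEnvPair sDummy lazyName lazyValue

puts [list greedy $greedyName $greedyValue]
puts [list lazy $lazyName $lazyValue]
puts [list last [splitPair $sEnvPair last]]
puts [list first [splitPair $sEnvPair first]]
```

With an input like this, the greedy pattern and [string last] should agree with each other, as should the non-greedy pattern and [string first]. Wrapping either call in time { ... } 1000 gives a rough per-iteration cost; the figures will depend on the Tcl version and on the length of the input.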
Attachments:
- jdReproduceIssue.tcl added by bstellar on 2006-03-18 11:51:54.