Ticket UUID: | 496456 | |||
Title: | regexp engine takes too much RAM | |||
Type: | Bug | Version: | obsolete: 8.3 | |
Submitter: | kofeinik | Created on: | 2001-12-24 11:18:10 | |
Subsystem: | 43. Regexp | Assigned To: | dkf | |
Priority: | 3 Low | Severity: | ||
Status: | Closed | Last Modified: | 2002-01-08 23:43:36 | |
Resolution: | Wont Fix | Closed By: | dkf | |
Closed on: | 2002-01-08 16:43:36 | |||
Description: |
TCL version: 8.3 Platform: windows regexp engine takes too much RAM after executing the code from an example (see attached file) the total size of allocated RAM stays on the level of 24 megabytes that too much (IMHO) for this small array. | |||
User Comments: |
dkf added on 2002-01-08 23:43:36:
Logged In: YES user_id=79902 Further examination indicates that the memory taken is approximately proportional to the maximum length of the regexps, and it is not a memory leak since deleting the BlacklistRules array recovers the memory for general use (returning it to the OS depends on the system memory allocator; UNIX ones mostly don't.) Closing; it's almost certainly beyond the ability of all current Tcl maintainers to fix this without breaking something and there are simple workarounds (keeping the RE size down and not using capturing parens seem to have the best effect, though the former increases the time to search all the REs.) dkf added on 2002-01-02 19:00:57: Logged In: YES user_id=79902 It seems that the RE compiler allocates some fairly large objects during its processing (during the coloring of the graph) and in a complex RE these may occupy quite a lot of memory; maybe we need to write a custom allocator for these things? The alternative is to figure out a way to use smaller color arrays, but I don't understand the code anything like well enough to be able to do that at this point. :^( The details can be seen in regcustom.h and regguts.h if you're interested. (My guess is that the problems really stem from the way that support for characters wider than a byte was added, but that's just a guess at this point.) kofeinik added on 2001-12-27 23:01:41: Logged In: YES user_id=286083 With concatenating of RE's i want to reduce time, needed for matching, because TCL compile RE only first time, and next matching works much faster. And i agree, this situation is typical tradeoff, but i can't agree with proportion 23KB->10-17MB. I'm not right? dkf added on 2001-12-27 22:36:04: Logged In: YES user_id=79902 Experiment indicates that replacing ( with (?: shaves around 5MB off the size of the image. Don't know how much memory the RE engine needs to allocate for all that stuff, though I suspect that you don't want to use a single RE for this; the initial compile-time gets a bit humungous. Matching each RE line separately keeps the space required *much* smaller, but makes the length of time to check an unlisted site longer after the initial compilation (which is also much shorter.) Looks like a time-space tradeoff... dkf added on 2001-12-27 19:47:47: Logged In: YES user_id=79902 The problem is probably related to the quantity of capturing parentheses involved. There may be some linkage to the use of arrays (which are *truly* not appropriate for this use; lists are much more like what is needed) and the retention of all the regexps used in building the main one too. Inded, I could rewrite the code to build the regexp quite quickly (though I've no idea how large the single resulting RE would actually be, or whether there would be any performance gain through doing this after the initial loading phase.) There is possibly a need for a flag to say that no capturing at all should be done by a regexp, (making "RE(RE)RE" equivalent to "RE(?:RE)RE") which might be useful in situations such as this. If that's really the case, then this is a FRQ, not a bug per se. I'll have a look at this in detail after the holidays. msofer added on 2001-12-24 21:41:13: Logged In: YES user_id=148712 Bug replicated on linux with tcl8.3.3: VSZ is 17MB in the original version, only 3MB when taking the [string length] version. kofeinik added on 2001-12-24 18:18:16: File Added - 15088: test.zip |
Attachments:
- test.zip [download] added by kofeinik on 2001-12-24 18:18:16. [details]