Tcl Source Code

Ticket UUID: 2960042
Title: high water mark memory not shared for file I/O needs
Type: Bug Version: obsolete: 8.5.8
Submitter: blacksqr Created on: 2010-02-27 08:24:47
Subsystem: 42. Memory Preservation Assigned To: hobbs
Priority: 5 Medium Severity:
Status: Open Last Modified: 2010-05-06 12:30:04
Resolution: None Closed By: dkf
    Closed on: 2010-02-27 17:45:37
Description:
When a string is converted to a list via the split command, the memory used by the list is never released, even if the list variable is unset, even if the conversion takes place in a procedure with no information returned.

To see the leak, execute the following procedure:

proc listleak {} {
    set line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    for {set i 0} {$i < 2000000} {incr i} {
        append largeString $line\n
    }
    set largeList [split $largeString \n]
    unset largeString
    unset largeList
    return
}

===========================================================================

On my computer:

% exec vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  18844 2134300  11572 433516    0    0    71    25  121  378  7  1 92  1
% listleak
% exec vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  18844 1836432  11580 433516    0    0    71    25  121  378  7  1 92  1

A net loss of free memory of 297868 kB.  The memory never comes back until the Tcl interpreter is exited.

ActiveTcl 8.5.8 on Ubuntu Intrepid.
User Comments: blacksqr added on 2010-05-06 12:30:04:
Now that fixing this bug has been shot down as a GSoC project, and dkf has suggested elsewhere it might be a weekend project, is there any chance of getting this issue re-addressed?

ferrieux added on 2010-03-04 18:20:27:
Ah, address arithmetic, right.

But this gives me an idea to circumvent the "iterate over pages" portability issue.
Principle: use a private notion of "page" as an internal granularity inside our Tcl_Obj blocks. This granularity should be a power of two to allow for fast bit-shift arithmetic. Each Tcl_Obj block is then sprinkled with such "grain boundaries" (not necessarily in phase with the beginning of the block). At each position synchronized with this granularity, we reserve the Tcl_Obj slot (i.e. don't put it on the free list) and instead store a pointer to the current block structure there.

Consequence: the housekeeping metadata costs on average one slot out of every sizeof(OURPAGE)/sizeof(Tcl_Obj), which may come down to a few bits per object, without playing dirty bitfield tricks (which would be impossible anyway, because Tcl_Objs don't have a single spare bit).

Also, the block lookup time is roughly that of your OS-page-based scheme, i.e. very small (address bit-shift + 1 indirection).

This "modulo" scheme only implies a bit more work at block allocation time (to skip the metadata slots), and possibly extra care to deal with the fact that sizeof(Tcl_Obj) is not a power of 2 (it's 24 on 32-bit CPUs).

dkf added on 2010-03-04 17:16:50:
Doing GC is... well, not a small project. Best make it out of scope for our own sanity. :-)

My original code for doing this (now rather bitrotted, assuming I can even find a copy) had a scheme for getting memory pages directly from the OS (every modern OS can do that). I'd put a header structure at the start of each page and fill the rest of the page with (initially unused) Tcl_Obj structures. I'd then do some evil bit bashing to go from the address of a Tcl_Obj to the address of the page containing it (using the obvious fact that the page is aligned on a page boundary), so that I could cheaply get at the accounting structure. I'd throw pages back to the OS (possible since I'd allocated them directly instead of via malloc) when a page had no in-use objects on it; I'd keep one spare page around on deallocation so that there wouldn't be disastrous performance effects if allocations oscillated back and forth across a page boundary.

The reason why this strategy usually works? It depends on the fact that most apps have a group of objects that have long lifetimes and many others that have short lifetimes; even though there will be some pages locked down because of the long lifetimes, there will be many others that are able to be freed. Generational garbage collectors use the same set of basic underlying assumptions about the population of object lifetimes, and they seem to work really well in practice.

The main problem is just one of engineering. Getting it all to work really stably across many platforms is quite challenging (at the time I was working on this, I had the advantage of only needing it to work on 32-bit SPARC running Solaris).

ferrieux added on 2010-03-04 15:53:33:
Update: Donal nearly convinced me that the "statistics" (rare/frequent case) regarding free-able blocks are in favour of trying a block-deallocation scheme. "Nearly", because the problem is to do it efficiently. Several issues:

(1) since (as you're right to point out) the block-freeing should not wait for alloc failures (mainly because alien code like extensions could not benefit from reclaimable space), it should happen routinely and hence be reasonably fast. This implies keeping track of which block is freeable instead of doing a sweeping pass just in time. Keeping track implies, for each individual Tcl_Obj allocation/deallocation, knowing to which block the object belongs, which is not currently recorded, and would require extra space in the object (or a sweeping pass with address comparisons, which we've just rejected).

(2) if we go towards object relocation as you suggest, things get even worse. Since objects are reached by single indirection (not double, like old MacOS's relocatable "handles"), moving any of them implies finding all references to it; short of adding a complicated back-reference system, this in turn means a mark-and-sweep pass. And even if wild magic gives us the back references lying in container objects or variables, we still have to worry about the refs sitting on the current C stack (e.g. in [foreach]), so as not to pull the rug out from under our own feet.

Tough.

blacksqr added on 2010-03-04 15:30:53:
After a little googling, it's not clear if any other leading dynamic language does much more in terms of memory management.  A stop-gap solution may be using a package like metakit for manipulating large data sets in tight memory circumstances.

blacksqr added on 2010-03-04 13:52:28:
I'm glad the clouds have parted a bit on what is going on here.  It is somewhat what I thought, but I'm still dismayed.

I would argue that any memory management scheme must have some sort of allocation lifecycle built in, where unused memory is at some point detected, garbage-collected and freed.  Otherwise your memory management method is operationally indistinguishable from a leak, as the present example shows.  Memory management is always tricky, but simply punting on garbage collection "within the universe of Tcl_Objs" seems to me like a disaster in the making.

The interpreter need not wait until a malloc fails before starting a decision process about freeing internal memory.  One might for example set a maximum high water mark of around 10 percent of RAM, after which Tcl_Obj memory is immediately freed after use.  Or one might design an object compactor which moves in-use Tcl_Objs from blocks of otherwise unused objects so the blocks can be freed.  Obviously I'm not familiar enough with Tcl's internals to give a wise and informed opinion, but as a user who wants to make expanded use of Tcl in the coming computer world where data sets and demands for efficiency will grow dramatically, this behavior seems a significant impediment.

ferrieux added on 2010-03-04 04:27:01:
Still not reproduced, but I think I know what's happening. 

Notice that there are two fundamentally different kinds of allocation in the Tcl core: Tcl_Objs and vanilla mallocs. Tcl_Objs are allocated in blocks (with large mallocs), but the blocks themselves are never free()d; instead, freed Tcl_Objs are linked into a free list inside their hosting blocks. Vanilla mallocs are used for everything else, like strings or list storage.

Now this "high water mark" handling of objects doesn't generate actual leaks _within_ the universe of Tcl_Objs, but it competes for available memory with the rest of the vanilla mallocs. Hence, if A is a vanilla malloc, B a block of Tcl_Objs, and A+B exceeds total available heap space, then:
  (1) A and B cannot be simultaneously allocated (obvious)
  (2) alloc(A);free(A);alloc(B);free(B) works
  (3) alloc(B);free(B);alloc(A);free(A) doesn't, simply because free(B) is a no-op as far as the heap is concerned

So here A is large_file, B is hi_water_mark, and I'd predict that with proper sizes you can even observe the failure skipping the first call to large_file.

Hence, your new title is accurate in describing the situation, though we can generalize "I/O needs" to "vanilla mallocs". However, it is not really a bug; rather, it lies in the gray zone of implementation trade-offs.

Indeed, while one could imagine freeing whole blocks of free Tcl_Objs under certain conditions (like after a failure to malloc or realloc), in the general case there is no guarantee that a completely free block exists (a single non-free object per block spoils the method). A typical scenario generating this pattern is when long-lived objects are created at any time, not just at the beginning.

To summarize, I'd say that a switch from a "high-water-mark" to a "non-monotonic" method for Tcl_Obj block allocation could improve *some* cases like the one you've shown, but would leave an overwhelming number of similar cases out of reach.

Hence I'm tempted to freeze this as Won't Fix.
I'd welcome counter arguments, or different evaluations of the statistics.


In other words: a big vanilla malloc (like the big string grown by the first call to [large_file]) can fit once in memory, then (after being freed) leave its space to an irreversible claim by a big block of Tcl_Objs.

blacksqr added on 2010-03-03 07:29:09:
My previous description of the problem as a memory leak was invalid.  I've changed the title of the bug ticket to reflect that.  Please put that concept aside.

How much free memory is left on your computer after you run high_water_mark?  To see what I'm seeing, it should be less than the size of the file created by large_file (about 246MB).  Adjust the number of iterations in the for loop of high_water_mark until it runs to completion but leaves less free memory available than the size of the large file.  Then run large_file.  The interpreter should crash even though almost none of the available memory is being used.

ferrieux added on 2010-03-03 04:09:29:
I've tried various sizes, but never got this behavior. I'm calling large_file and high_water_mark alternately in a loop, and depending on the values I get one of two outcomes:
          - the loop goes on forever, with [memory info] at a fixed point
          - Tcl_Panic in the 1st invocation of either function
None of these is compatible with a leak. Tough.

blacksqr added on 2010-03-03 02:28:29:
Sorry for the oversight: yes, the "unable to alloc" message definitely appears in the terminal from which tclsh was started when the interpreter crashes.

P.S.  The crash doesn't come on the second invocation of high_water_mark, but on the second invocation of large_file, after the first invocation of high_water_mark.

ferrieux added on 2010-03-02 14:55:47:
Two things:

(1) Please please please confirm (or infirm) that when you get a crash on the 2nd invocation of hi-water-mark, you also get the "unable to alloc" message on stderr, meaning Tcl_Panic(). This is important, and you keep failing to mention it.

(2) Of course, IF/WHEN I manage to get the same behavior (crash on 2nd invocation only), it will definitely qualify as a bug. That has not happened yet. But I'm just after hard evidence; no need to go back to c.l.t seeking support for a "minority theory". It will become the majority as soon as the evidence comes.

blacksqr added on 2010-03-02 12:16:57:
P.S.  As I stated in the comments in the attached file, on my computer, the high_water_mark proc runs OK, then Tcl crashes on the second invocation of the large_file proc.  If high_water_mark crashes your interpreter, reduce the number of iterations in the for loop until it completes successfully.  Then increase the size of the file written in large_file if necessary, until it's bigger than the free memory left over after running high_water_mark.  Then you should see the behavior I'm seeing.

blacksqr added on 2010-03-02 12:08:19:
I'm still trying to learn about Tcl's expected behavior in these instances, so I'm not sure what you're asking me for, if anything. My primary naive concern is the difference between the behavior I expect and what I see. After the high_water_mark proc runs, the interpreter is reserving over two GB of memory for itself, almost completely unused as far as I can see. Yet when the large_file proc is called, which requires only a small subset of that memory, Tcl doesn't utilize its reserved memory to complete execution of the proc; instead it tries to allocate even more memory and crashes as a result.

This makes no sense to me.  This is the area where I'm trying to understand even if this is seen as a bug.  To me it sure seems to be, since my bottom line is the program crashes even though there's ample unused memory to complete the given tasks.

ferrieux added on 2010-03-01 17:57:35:
Confirmed, this is a pure OOM condition: in my case, when Tcl_Panic is called the process's VM size is 3G. Note that the limit is not the physical RAM, but rather the maximum heap size, which is bounded by addressable VM; that is 3G under 32-bit Linux (there's 1G of kernel-reserved address space).

Note also that the real memory consumer is [split] rather than [append], since here [split] doesn't take advantage of the repeated substring (no interning, as Donal told you on c.l.t), hence needs an individual malloc block (with its overhead) for each split-chunk.

Your call for counter-evidence now ;-)

ferrieux added on 2010-03-01 16:19:15:
OK, repro on HEAD after doubling the size of the "xxxx" in the hi-water-mark function.
Note that you don't need the large-file func, nor do you get any chance to call anything twice: Tcl not only crashes, but calls Tcl_Panic, and just before the core dump you should see on stderr:

 unable to alloc ... bytes

(with the actual value possibly depending on system setup).

I will dive a bit further, but this new light seems to indicate it is a dup of one of the various bugs related to [append] when reaching the size limit. We admittedly don't fail gracefully in that case, and Tcl_Panic is often the best we can do.

So I'd say this has nothing to do with a leak or hi-water-mark methods; it's simply an atomic, expected Tcl_Panic in a low-memory condition.

If you have counter-evidence, like crashing only on the 2nd call (which I don't see), please provide it.

blacksqr added on 2010-03-01 12:19:25:
My previous understanding of my problem was inadequate, my apologies. I have uploaded demo code which crashes my interpreter and which, I hope, illustrates the problem more clearly. The issue seems to be that when the Tcl interpreter keeps a claim on some memory used for a completed procedure as "high water mark" memory, that memory is not released or shared when it is needed for a large file I/O job. Thus the interpreter crashes with a memory allocation error when it seems that there should be more than enough memory under Tcl's control to complete the job.

Additional comments are in the uploaded file.  It may be necessary to tweak some of the numbers in the script depending on the RAM available on the test machine.  I reported my computer's RAM and usage levels in the file.

blacksqr added on 2010-03-01 12:09:25:

File Added - 364921: high_water_mark.tcl

ferrieux added on 2010-03-01 05:06:52:
Stephen, to help your chase: don't forget the function's result, which may keep a reference to your locally generated data. Also scan [info globals] after the procedure ends.

dkf added on 2010-02-28 00:45:36:
Could just be high-water-mark memory management. To check whether it is really a leak, see if repeating the operation many times over increases the loss (when I try - with shorter strings - I see no change in consumption after the first run).
