Tcl Source Code

Ticket UUID: f97d4ee02082971e4b86a853c18c093fd0b2a45d
Title: quadratic cost of namespace teardown
Type: Bug Version: 8.6.4
Submitter: msofer Created on: 2015-03-22 20:02:10
Subsystem: 21. [namespace] Assigned To: dkf
Priority: 8 Severity: Important
Status: Closed Last Modified: 2016-05-21 09:28:58
Resolution: Fixed Closed By: dkf
Closed on: 2016-05-21 09:28:58
Description:
Reported by emiliano in the chat: the following script shows quadratic (?) behaviour in namespace deletion.

foreach n {1 10 100 1000 10000 100000} {
    set i 0
    puts "--- $n ---"
    puts [time {namespace eval test::ns#$i {}; incr i} $n]
    puts [time {namespace delete test}]
}
User Comments:

dkf added on 2016-05-21 09:28:58:

Tests were only failing in mem-debug mode (the problem was writing to freed memory).

Fixed and merged.


dgp added on 2016-05-20 11:53:25:
I see new tests, but I don't see them fail.

dgp added on 2016-05-17 14:48:02:
Patch is effective and appears correctly coded.

I say merge.

adrianmedranocalvo (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2016-05-17 09:47:27:

The patch fixes [e76f0a6ca52a2ff3c2b1e9cb2585f3de0c44ab5d], which was about namespaces with very many commands. Thank you!


dkf added on 2016-05-16 10:59:53:

Please review commit [20c01f4161d1b69d]. It passes the test suite for me and does not show quadratic behaviour on the motivating test case.


dkf added on 2015-08-02 16:52:36:

The real issue is that Tcl_FirstHashEntry is being used repeatedly, since that does a linear scan from the beginning of the list of buckets. At the very least, the cleanup code should avoid that. It's not the responsibility of the hash table code to make people use its API correctly.


dkf added on 2015-08-02 16:24:04:

Updated the title: this is a namespace problem, not a hash table problem.


dkf added on 2015-07-24 13:58:55:

For an example of why you can get into trouble with callbacks, try this script:

oo::class create Foo {
    variable i
    constructor x {set i [incr x]}
    destructor {Foo create ::abc::$i $i}
}
namespace eval abc {Foo create 1 1}
namespace delete abc

Just don't expect it to return soon!

We ought to do something to stop commands (or namespaces, or sub-namespaces) from being created in namespaces that are being deleted. At least then we'd get a sensible failure.


dkf added on 2015-07-24 13:50:04:

The bucket scan wouldn't matter, except that it's being repeated. It only needs to be repeated if any entry other than the current one was structurally changed. (Well, changes after the current entry are fine too. But that's happenstance.)

I wonder what happens if we put a flag in to detect whether a concurrent non-delete modification occurred? (It's the damn deletion callbacks…)


dkf added on 2015-07-23 13:37:18:

The simplest possible thing would be to try using Tcl_NextHashEntry during the teardown instead of Tcl_FirstHashEntry. The only reason for using the other one is if you are adding elements during the teardown; if we can confirm that that won't happen, we won't hit problems since Tcl_NextHashEntry is safe for the iteration/deletion pattern.

Callbacks are OK provided they don't modify the hash table itself. Some sort of flag in the entries (not the Tcl_HashEntry but the things they point to) and another in the structure containing the hash table might be enough. I'd like to avoid putting the smarts in the Tcl_HashTable system itself as that's already quite complicated enough for a fundamental data structure.


dgp added on 2015-07-22 14:23:55:
The Tcl_HashTable struct does indeed have a numEntries field.

Seems at least we could notice when it holds 0 and escape
without the scan (and perhaps reset the empty table to the
staticBuckets initial configuration?).

dgp added on 2015-07-22 14:13:09:
A related design issue (perhaps mentioned in another ticket somewhere?)
is that hash tables are coded to only grow and never shrink.  So even
when a hash table holds only one entry, if the table was ever big, that
one entry sits in an ocean of empty buckets, and on average half of
them need scanning for Tcl_FirstHashEntry() to find it.

Hmmmm.... It appears that Tcl_FirstHashEntry() called on an empty table
is still required to scan all the buckets?!  We don't have a size value?

dgp added on 2015-07-22 14:08:10:
Open to a patch along those lines, but I have my doubts.

I suspect that most of the cost is repeatedly scanning over
empty buckets since Tcl_FirstHashEntry() always starts looking
at bucket 0.

Would it improve things (enough?) to arrange for a circular
bucket search with a start/stop point determined by the
value of searchPtr->nextIndex first passed to Tcl_FirstHashEntry(),
and then have repeated Tcl_FirstHashEntry() calls remember where
we were last looking?

The snag is that the obvious way to do such a thing appears to
require modifying fields in public structs.

Might be worth a demo on a branch, just to see if it's effective,
and then try to conjure some workaround of the non-opaque struct
corner we've painted ourselves into.

aku added on 2015-07-21 22:37:24:
Given the problem with destructor callbacks ...

Might it make sense to split tear-downs involving hash tables into two phases?

Phase 1: Mark structures as gone/dead/whatever, run callbacks. Linear.

Phase 2: Do the actual tear-down. Also linear.

dgp added on 2015-07-20 12:13:51:
Raising priority.  More folks are noticing this problem.

dgp added on 2015-03-24 14:16:59:
Re-titled to more precisely identify the problem.

Hash table teardown is O(N^2) in the number
of table entries, unless one can guarantee that
the teardown loop is the only thing changing the
table.  That is, no callbacks!

Destructor callbacks are common in much of
the Tcl implementation, so we can expect to
hit this performance issue in many places.

dgp added on 2015-03-23 12:24:16:
Yes, the branch as it is cannot be accepted as a fix.
It is committed only to make it easy for anyone concerned
to satisfy themselves that this design matter is the source
of the bad performance.  There's no need to look for any
additional causes.  This explains it.

sebres added on 2015-03-23 08:48:07:
BTW, namespace-oriented OO systems like Itcl can create a new child namespace (in a destructor), so Tcl_NextHashEntry either segfaults or fails to visit this new child namespace, leaving a memory leak or a later segfault (because the parent hash entry is gone).

sebres added on 2015-03-23 08:41:45:
I'm not sure the suggested "bugfix" with Tcl_NextHashEntry would be good. It is not safe (it can segfault) if the hash table is modified (Tcl_DeleteHashEntry) inside the Tcl_NextHashEntry loop.
In addition, I'm not sure all the entries will be visited (because the table may be rebuilt during iteration).
The "proper" fix could be something like marking the hash table as "will_be_deleted", to prevent rebuilding the bucket chains during deletion.

dgp added on 2015-03-22 22:30:43:
Trouble is in "safe" teardown of big hash tables.

See the bugfix branch for a demo of proper linear
performance, but with at least the risk, if not the
reality, of improper functioning.

msofer added on 2015-03-22 21:20:14:
The problem is not new: the same behaviour is observed in 8.5.0 and 8.4.15.