Tcl Source Code

View Ticket
Login
Ticket UUID: 2318906
Title: order of insertion seems to have large performance impact
Type: RFE Version: None
Submitter: nobody Created on: 2008-11-21 00:33:03
Subsystem: 15. Dict Object Assigned To: dkf
Priority: 3 Low Severity:
Status: Open Last Modified: 2009-06-15 16:17:52
Resolution: None Closed By:
    Closed on:
Description:
See attached tarball which has a script showing the problem.

This seems to be related to nested dicts.

I'm doing something like this in a loop:

dict set $d "key1" $x $y
dict set $d "key2" $a $u
dict set $d "key2" $b $v
dict set $d "key2" $c $w
...

The "key2" inserts can occur multiple times per iterations.

It's much faster to collect the "key2" stuff into a separate dict and copy them in one shot at the end of all the loops into $d instead of interleaving "key1" and "key2".
User Comments: dkf added on 2009-06-15 16:17:52:

data_type - 360894

dkf added on 2009-01-06 18:41:11:
On my machine (using [time] to do the measurements, which uses microsecond precision) I get a relative speed difference of a bit over 15 times with Tcl 8.5.2. That seems to be acceptable to me; we never warranted that nested dicts would be amazingly efficiently updated and accessed, and it is no surprise that there are ways to achieve performance-sensitive stuff that are faster. (You have a speed issue, true, but you have a trivial work around.)

It would be possible to do something more efficient for this case, but working out how and when to enable the efficiency saving mechanism (i.e. caching the fact that a key is used on a path) is difficult and I'm not sure that the gains would be worth it. Keeping open as something to consider.

[email protected] added on 2008-11-22 03:45:02:
Original submitter here. Thanks for looking into this. I'm sorry I did not attach the result my run in the original filing.

Interesting results. On my machine, I see the following after changing to 'clock milliseconds'. Fast method gives: 68 ms, Slow method gives: 1565 ms.

uname -a: Linux hostname 2.6.9-11.ELsmp #1 SMP Fri May 20 18:26:27 EDT 2005 i686 i686 i386 GNU/Linux

Running the pre-built binaries: ActiveTcl8.5.5.0.287690-linux-ix86-threaded.

I'll try building my own Tcl.

fast method.
1000 lines done.
2000 lines done.
3000 lines done.
4000 lines done.
5000 lines done.
lines: 5000
filelines: 3267
commitlines: 1733
elapsed: 68
slow method.
1000 lines done.
2000 lines done.
3000 lines done.
4000 lines done.
5000 lines done.
lines: 5000
filelines: 3267
commitlines: 1733
elapsed: 1565

nijtmans added on 2008-11-21 14:16:27:
Oops, forgot to save the file before running it.

When changing "clock seconds" to "clock milliseconds",
it gives:

$ tclsh85 dict_test.tcl
fast method.
1000 lines done.
2000 lines done.
3000 lines done.
4000 lines done.
5000 lines done.
lines: 5000
filelines: 3267
commitlines: 1733
elapsed: 62
slow method.
1000 lines done.
2000 lines done.
3000 lines done.
4000 lines done.
5000 lines done.
lines: 5000
filelines: 3267
commitlines: 1733
elapsed: 42


When separating the key "filesincommit" in another
"gitinfo3" variable, the elapsed time drops to 40.

The top dict only has 3 entries, that's what causes
this inefficiency.

nijtmans added on 2008-11-21 13:50:59:
I don't think this is a bug. In the code above, every
"dict set" has to search two keys in two hash tables.
Separating the "key2" stuff, means you only have to
seach one key for each "dict set", and only do a
single search for "key2". A hash search depends
much on the size of the hash table, but small
hash-tables are relatively inefficient. It is
simply the nature of hash-tables.

Running your code:

$ tclsh85 dict_test.tcl
fast method.
1000 lines done.
2000 lines done.
3000 lines done.
4000 lines done.
5000 lines done.
lines: 5000
filelines: 3267
commitlines: 1733
elapsed: 0
slow method.
1000 lines done.
2000 lines done.
3000 lines done.
4000 lines done.
5000 lines done.
lines: 5000
filelines: 3267
commitlines: 1733
elapsed: 1

That's fast, less than 1 second. Even
when changing "clock seconds" to "clock milliseconds",
the answer is the same, which means that it is less
than the resolution of the windows clock
(something like 1/15 sec)

Collecting the "key2" stuff into a separate dict
is indeed a good optimization.

Regards,
         Jan Nijtmans

[email protected] added on 2008-11-21 07:33:03:

File Added - 302311: dict_test.tgz

Attachments: