Ticket UUID: | 32c5740a4d3e86b1973915d92f07d5aee6104a14 | |||
Title: | Undesirable [lappend] performance as LIST_MAX is reached | |||
Type: | Bug | Version: | trunk | |
Submitter: | dgp | Created on: | 2015-11-17 18:40:08 | |
Subsystem: | 14. List Object | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Important | |
Status: | Open | Last Modified: | 2015-11-30 18:25:02 | |
Resolution: | None | Closed By: | nobody | |
Closed on: | ||||
Description: |
Phil Brooks presented at the 2015 Tcl Conference, and later posted to comp.lang.tcl a demonstration of [lappend] growth of large lists, and some undesirable performance. Here's his demo script as it appeared in clt: set i 0 set my_list {} set count 0 set t0 [ clock seconds ] # Create a list with a billion elements while {$i < 1000000000} { lappend my_list "List element $i" incr count if { [expr {fmod($count, 100000000)} ] == 0.0} { # Show progress every 100 million elements set tnow [ clock seconds ] puts "elapsed time: [ expr { $tnow - $t0 } ] seconds" puts [llength $my_list] } elseif { $count > 403000000 } { # Show progress more incrementally if { [expr {fmod($count, 1000)} ] == 0.0} { set tnow [ clock seconds ] puts "elapsed time: [ expr { $tnow - $t0 } ] seconds" puts [llength $my_list] } } incr i } | |||
User Comments: |
dgp added on 2015-11-30 18:25:02:
Committed to trunk item 1) from the roadmap. This takes care of the demo script submitted. Leaving this open for the other items, but they're mostly matters of cosmetics. dgp added on 2015-11-23 20:56:49: So, the path forwards appears to be: 1) Add to Tcl_ListObjReplace() the code needed so that attempts to realloc List structs happen where they can be of potential benefit. 2) Then, there may or may not be reason to reconsider the actual growth algorithm. Experience with Tcl_ListObjAppendElement() suggests it won't really matter. 3) Finally, it sure would be nice not to have all this list growth code semi-duplicated multiple times. It seems there ought to be a refactoring that puts these matters into one place. dgp added on 2015-11-23 20:52:27: Looking some more, we find that Tcl_ListObjReplace() is getting called because the instruction INST_LAPPEND_LIST_STK is doing the work. This is a recent invention. Tcl_ListObjReplace() is not the only routine that grows lists. Tcl_ListObjAppendElement() also does so, and is a more direct implementation to use here. The instructions we need to get that routine working for us are INST_LAPPEND_SCALAR* The easy path to that is to wrap the whole test script in an [apply {{} {...}}] Indeed this performs much much better, growing up all the way to the LIST_MAX limit in the time it takes the original demo to fill less than the first 200M elements. It is not hard to make the compiler use INST_LAPPEND_SCALAR instead of INST_LAPPEND_LIST_STK in the demonstrated case. The problem is that the two instructions exhibit different behaviors regarding read traces, and the test suite is not happy with the change. This is a long, sick, sad tale. Explore if you like; it's not pretty. We've apparently decided the least bad option we have is to leave things inconsistent and broken. The next thing to examine, though, is just why Tcl_ListObjAppendElement does so much better than Tcl_ListObjReplace. The answer appears to be that the former takes an extra step in attempting to use the realloc() call of the memory allocation subsystem before moving on to a full copy and extend. The copies are what kills us in the time, and apparently in precisely this demo case, realloc() is able to do what it exists to do -- grow a (virutal) memory buffer while avoiding copies as much as it can. dgp added on 2015-11-19 20:17:52: Some probing finds that the slow growth takes place in the routine Tcl_ListObjReplace(), and examination of the code in that routine starts uncovering other bugs such as: http://core.tcl.tk/tcl/tktview/40f628e8e343f5e3cbef0cd41dff719ef2a11e34 dgp added on 2015-11-19 18:13:03: Next thing I notice is the anti-idiom if {[expr {...}]}... The condition argument of an [if] command is already an expression. There's no need to use [expr] to make it one. It is also strange to make use of fmod() to do integer modulus in light of the % operator. Better code looks like: if {$count % 100000000 == 0} {... Somewhat surprisingly the use of [expr] makes little difference. Bytecode compilation gets to essentially the same instructions. Avoiding the floating point conversion tax helps some. A few more tweaks leads to: set my_list {} set count 0 set t0 [ clock seconds ] # Create a list with a billion elements while {$count < 1000000000} { lappend my_list A set count [llength $my_list] if { $count % 100000000 == 0.0 } { # Show progress every 100 million elements set tnow [ clock seconds ] puts "elapsed time: [ expr { $tnow - $t0 } ] seconds" puts $count } elseif { $count > 402600000 } { # Show progress more incrementally if { $count % 1000 == 0.0 } { set tnow [ clock seconds ] puts "elapsed time: [ expr { $tnow - $t0 } ] seconds" puts $count } } } dgp added on 2015-11-19 17:38:44: With that change, the next thing to note is that the "slow growth" bug cannot be demonstrated using Tcl 8.5, because it still suffers from the predecessor bug 3293874 where at the point where Tcl 8.6 starts slow growth, Tcl 8.5 just aborts and claims the list is already too big. I don't recall why I chose not to backport the fix, but I don't see a strong reason to revise that choice. Folks needing Tcl to scale to it's max list lengths just need to use Tcl 8.6. dgp added on 2015-11-19 16:59:26: The first problem with this demo is the demands it makes. In order to see the demonstration, a list of ~400M elements is needed. As written each element has a unique value which requires 8 bytes for the pointer, 32 bytes for the Tcl_Obj struct, and about 20 bytes for the string value. This means a minimum of ~24G of active RAM is needed just to run the demo. Such systems are not uncommon, but they're not yet the norm, I'd say, and in fact my everyday development system lacks enough memory. Since it's a linux system, all I get is death by OOMKiller. I believe the key demonstration doesn't need unique element values, just a long list, so I edit the lappend command to be: lappend my_list A This makes a list of 400M elements, but they are all the same element, so we now only require ~4G to demo the problem. Many more folks can look at that. |