Tcl Source Code

Ticket UUID: 1772004
Title: objReform
Type: Patch
Version: None
Submitter: msofer
Created on: 2007-08-10 21:43:02
Subsystem: 10. Objects
Assigned To: msofer
Priority: 5 Medium
Severity:
Status: Open
Last Modified: 2007-08-13 04:15:17
Resolution: None
Closed By:
Closed on:
Description:
(All numbers are for 32-bit systems.)

A Tcl_Obj requires 24 bytes; a typical cache line holds 32 bytes. This means that about half of the Tcl_Objs occupy a single cache line and half require two.

Adding 8 bytes at the end of a Tcl_Obj ensures that all Tcl_Objs are cache-aligned. The additional space can be used to store the stringRep for small strings, saving the alloc/free.

Attached is a proof-of-concept patch (hard-wired for 32-bit; for 64-bit, 16 bytes should be added).
User Comments:

msofer added on 2007-08-13 04:15:17:

File Added - 240944: objReform.diff.1.3

Logged In: YES 
user_id=148712
Originator: YES

das added on 2007-08-12 14:03:56:
Logged In: YES 
user_id=90580
Originator: NO

Apropos ckalloc() and malloc() alignment:
I recently committed a change that ensures that ckalloc() returns 16-byte-aligned memory on Darwin (where malloc() is guaranteed to return 16-byte-aligned memory). This has performance benefits on that platform because things like memcpy() are implemented with vector instructions (i.e. 128-bit registers), where misaligned memory access is costly.
The patch could easily be extended to other platforms (if malloc alignment is known):
http://rutherglen.ics.mq.edu.au/fisheye/changelog/Tcl/?cs=MAIN:das:20070629031705

dkf added on 2007-08-12 00:19:43:
Logged In: YES 
user_id=79902
Originator: NO

Not only does malloc() not guarantee that things are cache-aligned, but it pretty much guarantees that things are *not* cache-aligned (that's where it puts its housekeeping data). To be exact, it typically allocates on a 16-byte boundary and then takes the first 8 bytes for housekeeping.

msofer added on 2007-08-11 23:28:05:

File Added - 240847: objReform.diff.1.2

Logged In: YES 
user_id=148712
Originator: YES

Some notes:
(a) The alignment issue is likely a non-issue on P4 (128-byte cache line!?).

(b) The alignment issue is likely a non-issue on 64-bit with 32-byte cache lines: a Tcl_Obj takes 3/2 of a cache line, so it always requires two lines (assuming initial alignment of the Tcl_Obj block, see below).

(c) The original patch did not take care to align the Tcl_Obj block, which is allocated in TclAllocateFreeObjects. If ckalloc returns a cache-aligned pointer, all's well. If not, the patch *ensures* that all objs require two cache lines. But ckalloc/malloc does not guarantee alignment on 32-byte boundaries, just 8 (in general, afaiu, etc.).

Attaching a new (hacky! Proof of concept!) patch that does ensure 32-byte alignment. Again, the patch is HACKY and only for 32-bit archs (both the size of the text area and the rounding macros need to be MUCH better).

msofer added on 2007-08-11 06:52:31:
Logged In: YES 
user_id=148712
Originator: YES

Pat Thoyts produced initial benchmarks: http://www.patthoyts.tk/benchmarks.html

msofer added on 2007-08-11 04:43:04:

File Added - 240762: objReform.diff.1
