Tcl Source Code

Ticket UUID: 3064962
Title: Suboptimal linkage of stubbed functions
Type: Bug
Version: None
Submitter: ferrieux
Created on: 2010-09-12 23:06:56
Subsystem: 53. Configuration and Build Tools
Assigned To: ferrieux
Priority: 9 Immediate
Severity:
Status: Closed
Last Modified: 2010-09-20 14:53:38
Resolution: Wont Fix
Closed By: ferrieux
Closed on: 2010-09-19 15:47:12
Description:
When looking in detail at the native code generated by gcc on x86, one sees that stubbed functions, which get the EXTERN class (extern + __attribute__((visibility("default")))), are _always_ called through the PLT (procedure linkage table), adding an unwanted indirection with potentially costly cache misses.

I don't know why it is so, because the build system uses -fvisibility=hidden, so that all those symbols should stay hidden and avoid the PLT.
But the net result is a very measurable performance hit, as illustrated in the tcl-core discussion http://code.activestate.com/lists/tcl-core/9531/
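
To make the mechanism concrete, here is a minimal sketch of the two linkage classes involved. The names demo_exported, demo_hidden and demo_caller are hypothetical and are not Tcl symbols; the macros only stand in for what EXTERN and a hidden-visibility declaration would expand to with gcc. It is an illustration under those assumptions, not the actual Tcl declarations.

```c
/* Sketch only: all DEMO_* / demo_* names are hypothetical, not part of Tcl. */
#if defined(__GNUC__)
#  define DEMO_EXPORT  __attribute__((visibility("default")))
#  define DEMO_HIDDEN  __attribute__((visibility("hidden")))
#  define DEMO_NOINLINE __attribute__((noinline))
#else
#  define DEMO_EXPORT
#  define DEMO_HIDDEN
#  define DEMO_NOINLINE
#endif

/* Exported from the shared library: the dynamic linker may bind this
 * symbol to another definition, so intra-library calls are typically
 * routed through the PLT when compiled with -fPIC. */
DEMO_EXPORT DEMO_NOINLINE int demo_exported(int x) { return x + 1; }

/* Not exported: the call target is known when the library is linked,
 * so the compiler can emit a direct call. */
DEMO_HIDDEN DEMO_NOINLINE int demo_hidden(int x) { return x + 1; }

/* Calls both functions from inside the same library. */
int demo_caller(int x)
{
    return demo_exported(x) + demo_hidden(x);
}
```

Compiling this with something like "gcc -O2 -fPIC -S demo.c" on x86 and reading the assembly for demo_caller should typically show the call to demo_exported going through the PLT and the call to demo_hidden being direct, though the exact output depends on the gcc version and flags.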
User Comments: nijtmans added on 2010-09-20 14:53:38:
Fully agreed: at this point I wouldn't introduce -Bsymbolic either.

ferrieux added on 2010-09-19 22:46:51:
You're right, PLT diet is no silver bullet :(

The more I dig into perf analysis, the more I realize that individual block offsets and cache associativity effects dominate everything else. It turns out that the large benefit I observed for -Bsymbolic on some tests is completely reversed by an extra 'void foo(void){}' early in the lib (e.g. at the beginning of regcomp.o).

So I'm happily closing this one as Red Herring (Won't Fix). Sorry for the noise.

(as for the OO patch, I'll be posting the results on 3010352)

nijtmans added on 2010-09-14 14:04:39:
The three interesting lines are:
======================================================
tcl86.stat tcl86sym.stat tcl86foo.stat tcl86foosym.stat  000 VERSIONS:
1.00 0.98 1.21 1.19  502 STR match, complex (failure)
1.00 0.98 1.24 1.22  515 STR match, recurse2 (fail)
1.00 0.97 1.19 1.18  516 STR match, recurse2 (success)
=================================================

It shows indeed that -Bsymbolic speeds up Tcl by
about 2%, but does not prevent the slowness that
was the subject of this issue.

Could you please run the same benchmark with
the patch from:
[ tcl-Feature Requests-3010352 ] make all TclOO API functions MODULE_SCOPE

Reducing the number of exported functions is the
advice given by the article referred to by Kevin, so it
is useful to try.

ferrieux added on 2010-09-14 05:39:31:

File Added - 386570: allmin.vs

ferrieux added on 2010-09-14 05:38:38:

File Added - 386569: min.vs

ferrieux added on 2010-09-14 05:37:38:
Done extensive benchmarks, running each 10 times and keeping min time (also looked at median and average, confirmed the min is reliable). Two surprises:
(1) -Bsymbolic does not help with the slow match bench with an odd-foo offset.
(2) -Bsymbolic has the relative effect shown in attachment "min.vs", sorted by ratio (2nd column), <1 meaning -Bsymbolic is faster. Ratios range from 0.85 to 1.04. I suspect a superposition of the net gain of PLT avoidance and remaining cache associativity effects. To confirm this intuition, look at the 2nd attachment, allmin.vs, showing the four combinations (with/out -Bsymbolic)x(with/out foo).

Bottom line: -Bsymbolic is beneficial, but since it also changes the addresses of the symbols (if only by the size of the PLT), the effect is hard to isolate. Work in progress...
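
For readers who want to reproduce the "run 10 times, keep the min" method outside the benchmark suite, here is a rough sketch in C. The actual runs behind min.vs and allmin.vs were not done with this code; min_time_of_runs, timing_ratio and demo_workload are hypothetical names, and the workload is only a placeholder.

```c
/* Sketch only: helper names and the workload are hypothetical. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <time.h>

/* Placeholder workload; a real run would exercise the code under test. */
void demo_workload(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000UL; i++) {
        sum += i;
    }
}

/* Runs fn `runs` times and returns the smallest elapsed wall-clock
 * time in seconds, or -1.0 if runs <= 0. */
double min_time_of_runs(void (*fn)(void), int runs)
{
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = (double)(t1.tv_sec - t0.tv_sec)
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (best < 0.0 || dt < best) {
            best = dt;
        }
    }
    return best;
}

/* Ratio as used in min.vs: a value below 1 means the first build is faster. */
double timing_ratio(double min_with_flag, double min_without_flag)
{
    return min_with_flag / min_without_flag;
}
```

Taking the minimum discards runs disturbed by other system activity, but it does not remove systematic effects such as code placement, which is exactly what makes the -Bsymbolic gain hard to isolate here.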

nijtmans added on 2010-09-14 03:54:13:
Here is a patch, adding -Wl,-Bsymbolic to the shared lib link flags. Please verify the
performance difference, so we can decide whether this is good or not.

If it's ok, tcl.m4 can be copied to Tk and the configure script regenerated, to
get the same effect there.

I don't expect much difference in timing, except for the match benchmark. But we'll see.

nijtmans added on 2010-09-14 03:50:31:

File Added - 386556: bsymbolic.patch

nijtmans added on 2010-09-13 21:47:30:
Relevant here is Kevin's remark:
 <http://code.activestate.com/lists/tcl-core/9542/>
and
 <http://software.intel.com/en-us/articles/performance-tools-for-software-developers-bsymbolic-can-cause-dangerous-side-effects/>
(Thanks, Kevin!)

I would be in favour of adding -Wl,-Bsymbolic to tcl.m4/configure:
- Only in the release build
- Only when linking the shared library, not in static builds.
- Only with the GNU linker

Reason:
- On Windows, this is the default: multiple
  DLLs defining the same symbol result in
  multiple versions of that symbol.

Tcl doesn't have this problem, simply
because it doesn't export data, only
function symbols. And if some application
would use this 'feature', it is non-portable
because it wouldn't work on Windows.

I'll see if I can prepare a patch, before
deciding on this.

(B.T.W., I don't think this is actually a Bug.....)

ferrieux added on 2010-09-13 15:17:47:
Update: it's no longer mysterious, "visibility=default" meaning "not hidden".
But the issue remains, and as kbk suggests, the fix could very well be the -Bsymbolic linker option.

Attachments: