Tcl Source Code

View Ticket
Login
Ticket UUID: 3588687
Title: Tcl_InitStubs() Broken As Designed
Type: Bug Version: None
Submitter: dgp Created on: 2012-11-20 06:41:17
Subsystem: 39. Package Manager Assigned To: dgp
Priority: 8 Severity:
Status: Open Last Modified: 2012-12-07 23:44:01
Resolution: None Closed By:
    Closed on:
Description:
A stubs-enabled extension encounters Tcl in three distinct ways.

1) It compiles against a tcl.h file from some release of Tcl.  With 
    USE_TCL_STUBS active, the particular tcl.h file determines the
    layout of the stubs table that the extension is compiled to assume.

2) It links against some libtclstub.a file.  That libtclstub.a file in turn
    was compiled against some tcl.h file and from it got a value of
    TCL_STUBS_MAGIC compiled into it.  As far as I can tell there is
    nothing in place to make sure that the two tcl.h files so far mentioned
    are the same file, or at a minimum define the same TCL_STUBS_MAGIC
    value.  Until now it's never been an issue since we've never changed
    that value.

3) The extenson [load]s into some Tcl interp.  The extension calls
    Tcl_InitStubs() passing in the interp value.  T_IS pulls a table pointer
    out of the iPtr->stubTable field of that interp for vetting for suitability.

What's important is that the table in iPtr->stubTable has a compatible
layout to the one that was compiled into the extension.  But the
mechanism in place cannot check this.  To illustrate this, consider
compiling an extension containing the call

    Tcl_InitStubs(interp, "9.0", 0);

but compiling it against a Tcl 8 header file, so all of it's other calls
into the Tcl library are coded based on the assumptions of the Tcl 8
stub table layout.

Next imagine linking this against libtclstubs9.a so that the TCL_STUBS_MAGIC
value the Tcl_InitStubs() call will be checking will be the one for the Tcl 9 table.

Then [load] in a Tcl 9 interp.  All of Tcl_InitStubs() checks will pass, but we'll
be left with an extension compiled to Tcl 8 layout expectations making use
of a Tcl 9 stub table, and that's gonna crash hard.

Now, if the extension writer would have just stuck with the recommended
convention of

    Tcl_InitStubs(interp, TCL_VERSION, 0)

he could not get in this trouble.  But that is our fault for giving him that
knob to tweak in the first place.  Tcl_InitStubs ought not have
a "version" argument for the coder to get wrong.
User Comments: dgp added on 2012-12-07 23:44:01:
I've been wanting this resolved before releasing 8.6.0
so that some 8.6.* patch release at least could have
the broken design fixed.  And I did not want some
frozenness of an 8.6.0 release to prevent that.

As we've worked through it, you've made clear that
a design fix can be accompanied by linking support
compatible with earlier tcl.h files.  In fact we can keep
the tcl.h in Tcl 8 releases unchanged as we convert
from versioned to unversioned stub libs at some future point.
So an 8.6.0 release need not be blocked by this.

We can even backport to the 8.5 branch, I believe, so all
extension builders using current releases of Tcl will get onto
the fixes reasonably quickly and reliably.

The remaining reasons to still get this resolved sooner rather
than later (though post 8.6.0) are

1) Just giving more time to extension writers with the tools they
    need to navigate the 8->9 transition.
2) To consider a backport of the same fixups to core-8-4-branch,
     we'd need it done before an 8.4.20 release, which I hope will
     not be delayed too much longer.

dgp added on 2012-12-07 23:35:43:
My goal for Thread 2.7.0 is for one set of source
code to compile into two distinct thread.so files,
depending on whether they were compiled against
Tcl 8 or Tcl 9 headers.  Each would be [load]able
only in an interp of the corresponding version.

Source compat, not stubs compat.

The 'novem-support' branch in thread repos is
intended to be work toward that end.

dgp added on 2012-12-07 23:32:05:
For the sake of recording things, with the new novem tip:
$ make shell
% load path/to/bug-3588987/pkgb.so
version conflict for package "Tcl": have 9.0a0, need 8
% load path/to/trunk/pkgb.so
% pkgb_demo
not supported

That seems a clean illustration of the difference
in our approaches.  I tell you right away this 
extension doesn't work here.  You let it start
working, and then just let each command fail
case by case.

And FWIW, yours is, IMHO, much much harder
to code, read, and maintain.

nijtmans added on 2012-12-07 23:29:12:
> Do you know anyone else who has the
> ability to [load] Tcl 8 compiled stub-enabled
> extensions into Tcl 9 on their list of desirables?
No, it's not a requirement. But - as far as the Tcl 9
plans I know about - I don't see any reason why
Thread 2.7.x would not run on Tcl 9
on 32-bit platforms, just by changing the
Tcl_PkgRequire arguments. I don't think
any API functions that Thread uses  will
be deprecated in Tcl 9.  We will see, as
Tcl 9 advances.

On 64-bit platforms, it's out of the question!

>Taking a different perspective...
I know, still I think that tclversion is useful
to generate a better error-message. It's
good that we agree on the signature of
TclStubInit. There is no hurry to define
what tests should be done and how
the error-message should look like.

Please put the priority at Tcl 8.6,
I will stop this discussion now.
Describing the desired changes in a TIP
would be the proper way to continue,
it's not crucial for Tcl 8.6 any more

Thanks for all your time!

dgp added on 2012-12-07 23:16:33:
Taking a different perspective on this, we can look
at the diffs between our two approaches here:

http://core.tcl.tk/tcl/fdiff?v1=2078de7547e3b152&v2=b7c94ddeea145bea

Mostly cosmetic stuff, I think.  The only real sticking point is
that although we both pass a "tclversion" argument to TclInitStubs(),
mine it the only one using that value to check anything.

dgp added on 2012-12-07 23:10:28:
Please tell me, after your heroic efforts, if
we have a Tcl 8 compiled, stubs-enabled
extension which calls the routine

    Tcl_Eval(interp, "incr foo 08");

and you [load] that into Tcl 9, what should
it do?

An interp cannot be Tcl 8 and Tcl 9 at the
same time unless Tcl 9 is no different
from Tcl 8.

dgp added on 2012-12-07 23:07:36:
That's not a fix.  That's a demonstration
of the wrongheadedness of the goal.

Do you know anyone else who has the
ability to [load] Tcl 8 compiled stub-enabled
extensions into Tcl 9 on their list of desirables?

Everyone else I'm aware of is happy to completely
terminate stub compatibility at the 8 -> 9 transition
and have only source compatibility as the migration
support path.

dgp added on 2012-12-07 23:04:40:
Regarding trunk updates.

Dear God No.

dgp added on 2012-12-07 23:01:25:
I don't expect a mere change of argument type
in Tcl_EvalEx is our Tcl 9 destiny.  Rather, I presume
with the exception of Tcl_PkgRequireEx() the whole
table is going to be torn apart and put back together again.
At least.

nijtmans added on 2012-12-07 22:58:12:
Two remarks:
- your commit broke test load-8.4
- pkgb.c wouldn't even compile with Tcl 9 headers. That's the difference
  with the Tcl_EvalEx() example: with Tcl_EvalEx, the compiler will
  not tell you that the signature is wrong. With Tcl_GetDefaultEncodingDir
  the compiler will tell you all about the problem.
Both problems fixed on trunk now.

I don't know if you really want to keep this on trunk, but it would
be a good example how deprecated functions could be
handled. It's (almost) the same way as Thread 2.7 does it, only
in this case we know the stub entry exists, so we can simply
test for NULL. :->

dgp added on 2012-12-07 22:06:29:
Using 'make shell` on novem branch, here's
the result of [load]ing the updated pkgb.so files
on the trunk and bug-3588687 branches:

% load path/to/bug-3588687/pkgb.so
version conflict for package "Tcl": have 9.0a0, need 8
% load path/to/trunk/pkgb.so
% pkgb_demo
make: *** [shell] Segmentation fault

The [pkgb_demo] command calls
Tcl_GetDefaultEncodingDir() which has been
removed from the novem stubs table, causing
the crash.  The bug fix on bug-3588687 prevents
this crash from happening by preventing the [load]
of the unsupportable extension.

Now you may object Tcl_GetDefaultEncodingDir
isn't in the Tcl 9 API!  So it's a bug for pkgb.c to
be passing "8.5-9.1" to Tcl_InitStubs().  Two replies:

1) I submit that Tcl_EvalEx() with an int numBytes argument
    is likewise not in the Tcl 9 stubs API.  So this is just the same
    bug already demonstrated.  Except this new test makes clear
    this is not some problem that only shows up in a 64-bit scenario.
    It's implicit in the freedom to change the stubs tables incompatibly
    on a new major version of Tcl.
2) Even if it is a bug, the common theme in much of this discussion is
     how much better it is to react to a bug with an error message, or even
     a controlled panic, rather than a crash.

nijtmans added on 2012-12-07 21:29:38:
I would concentrate on Tcl9 getting better error-messages. We have
the 'tclversion' as argument now, which can be used in generating
the error-mesage for Tcl 9 extensions. Let's do it right first
for Tcl9, then see how we can improve things for Tcl 8.

Closed the "novem-demo-bug-3588687" branch now. demo has ended

dgp added on 2012-12-07 21:26:48:
Demonstrating on the tip of novem-demo-bug-3588687:

% load path/to/trunk/pkgb.so
interpreter uses an incompatible stubs mechanism
% load path/to/bug-3588687/pkgb.so
interpreter uses an incompatible stubs mechanism

And in reality, the first result for deployed foo.so will be:

"This interpreter does not support stubs-enabled extensions."

dgp added on 2012-12-07 21:17:26:
Ok, you're just repeating my starting point.

The routine Tcl_InitStubs() and therefore all
deployed binary stubs-enabled extensions -- the
ones linked against the "versioned stubs lib"
-- suffer from a fundamentally broken design.  If any
of them passed arguments requesting or
accepting "Tcl 9", then those deployed, binaries
are going to crash if they are [load]ed into novem.
It's ugly and it's broken, but we can't turn back time.
This was botched, and we're going to suffer for that botch.

All we can do is get a fixed unversioned stubs library
out and people started using it before Tcl 9 loading
attempts get widespread.

If we really must must must do something that will
allows any old foo.so linked to a versioned stubs
library that's been lying around untouched to
[load] into Tcl 9 and not crash, then the only solution
is to change TCL_STUB_MAGIC in Tcl 9 (for all
platforms, which I will demonstrate).  Then the
[load] will not crash, but will instead generate the
error message:

"This interpreter does not support stubs-enabled extensions."

and as a consequence, those new extensions linked to the
new unversioned stubs library would be unable to generate any
better error message than:

"interpreter uses an incompatible stubs mechanism"

Consider, though, which foo.so are going to crash.  Only those
which passed a 'version" string argument that accepted a Tcl 9
interp.  I submit that no one has done this, other than folks
like us who are probing and demonstrating the brokenness
if Tcl_InitStubs().  So, we would be sacrificing the ability to
produce more meaningful error messages like

"version conflict for package "Tcl": have 9.0a0, need 8"

for people actively making extensions today, using the
latest tools and advice for the sake of preventing crashes
in foo.so that do not exist outside the labs of torture testers
like us.

I prefer that we keep TCL_STUBS_MAGIC unchanged.
Get better error messages.  And if any of those mythical
foo.so appear, then they crash.  Sorry.  We limped along
with a misdesigned Tcl_InitStubs for a decade, and that's
the pain.

nijtmans added on 2012-12-07 21:16:33:
Please try again with the current "novem-demo-bug-3588687" branch.
You will get:
% load path/to/trunk/pkgb.so
interpreter uses an incompatible stubs mechanism

This demonstrates that this solution works for
already-built extensions, no need to recompile
and link to a newer stub library.

nijtmans added on 2012-12-07 20:54:44:
The challenge is: Some person A build an extension foo.so.
Another person B, which doesn't have the source code of
that extension, tries to load it in Tcl9, and it crashes.
So, the challenge is, how can Tcl9 detect that without
making modifications to foo.so. We don't want to
backport tclStubLib.c modifications to the field everywhere,
giving additional restrictions to people. Joe English
already expressed his reluctance to modify
tclStubLib.c in current releases <= 8.6. I tried it already -
before I found the alternative solution in tclLoad.c -
which was put on-hold on your request.....

It became clear to me that the solution must be
found in Tcl9, not by modifying the 8.x stub library!

nijtmans added on 2012-12-07 20:42:51:
>I submit this demonstrates that bug-3588687 branch
>is a fix for the problem demonstrated by pkgb.so.
But for the fix you had to recompile pkgb.so and link
to a modified stub library. So this doesn't work
with already-compiled pkgb.so binaries.

dgp added on 2012-12-07 20:35:03:
Ok, two things are becoming clear.
1) How I was misunderstanding/forgetting how
    Tcl_PkgRequireEx() had expanded in function,
    and how that impacts *all deployed* stubs-enabled
    extensions. (eeeuuuuwwww!!!)
2) Those areas where we still suffer from fundamental
     disagreement.

I'm going to work on code demonstrations.  That seems to
be the way we're most successful clearly demonstrating
points to one another.

While I work on that... Thanks for the test case.  I've merged
it over to the bug-3588687 branch.  On that branch, build the
pkgb.so for testing.  Built another on the trunk.

Then over on the novem-demo-bug-3588687 branch, `make shell`:
% load path/to/bug-3588687/pkgb.so
version conflict for package "Tcl": have 9.0a0, need 8
% load path/to/trunk/pkgb.so
% pkgb_unsafe
Crash!
make: *** [shell] Aborted

I submit this demonstrates that bug-3588687 branch
is a fix for the problem demonstrated by pkgb.so.

nijtmans added on 2012-12-07 17:54:57:
My idea of an unversioned stub library is now in the "novem-unversioned-stub"
branch. The only thing missing is the MAGIC value change when the API
changes in an incompatible way (novem-64bit-sizes)

nijtmans added on 2012-12-07 17:12:10:
I now turned pkgb.so in core-8-5-branch and trunk into a
Tcl9 interoperability test. And closed the "novem-support" branch.

Tcl_InitStubs(interp, "8.6 9", 0) unfortunately doesn't work, it leads
to a crash. That must be a bug somewhere in tclPkg.c. Luck
enough, "8.6-9.1" works, and that's good enough for this test.

Now the challenge is that pkgb.so loaded in any Tcl interpreter
should either give an error message, either succeed, but
never crash. On 32-bit platforms that's no problem: The
signature of the used API will at most change some parameters
from int to size_t, but they all are the same size. On 64-bit
platforms I see no other solution than to change the MAGIC
value.

nijtmans added on 2012-12-07 15:48:04:
>we agree that Tcl_InitStubs(interp, "8.6-", 0)
>is *never* a sensible thing to do, right
Thinking more about it, I agree! It would promise
that it will work with Tcl 10 as well, which
is nonsence, of course.
Tcl_InitStubs(interp, "8.6 9", 0) should be used.

nijtmans added on 2012-12-07 15:26:34:
>we agree that
    Tcl_InitStubs(interp, "8.6-", 0)
>is *never* a sensible thing to do, right

On 64-bit platforms we agree on that, but
on 32-bit platforms I see no problem.

> Compiling against Tcl 8 headers, but passing
> "9.0" as an argument to Tcl_InitStubs() cannot ever
> work and is always a mistake.)
There is no reason at all to do that: It can only work
when the extension uses the common subset of
Tcl 8.6 and 9.0 API's, but then it makes no sense
to specify "9.0". Specifying "8.6-" would make more sense.


The consensus should be:
    Tcl_InitStubs(interp, "8.6", 0)
means that this extension uses the 8.6 API, so compiles and
runs with any 8.x release with x >= 6.
    Tcl_InitStubs(interp, "9.0", 0)
means that this extension uses the 9.0 API, so compiles and
runs with any 9.x release.
    Tcl_InitStubs(interp, "8.6-", 0)
means that when compiled with Tcl 8.6 headers it will run with
Tcl 8.6, and when compiled with Tcl 9.0 headers it will run
with any 9.x release. In Tcl9 some API's will be removed,
others will be added. Any extension specifying "8.6-" can
only use the common subset API's of 8.6 and 9.0. (
assuming the stub entrie positions are kept equal)

This also means that no extension can claim "8.6-"
during the alpha/beta period of Tcl 9. One method
to enforce that is in tcl.h 9.0 specifying:
#if TCL_RELEASE_LEVEL == TCL_FINAL_RELEASE
#define Tcl_InitStubs(interp, version, exact) \
    TclInitStubs(interp, version, exact, TCL_VERSION, TCL_STUB_MAGIC)
else
#define Tcl_InitStubs(interp, version, exact) \
    TclInitStubs(interp, TCL_PATCH_LEVEL, 1, TCL_VERSION, TCL_STUB_MAGIC)
#endif

But - as proven - claiming "8.6-" compatibility cannot ever work
across the 8.6/9.0 border on 64-bit platforms. That's what the
MAGIC change is needed for.

Regards,
          Jan Nijtmans

dgp added on 2012-12-07 10:40:50:
Just so I can start from an agreed assumption in the morning, we agree
that

    Tcl_InitStubs(interp, "8.6-", 0)

is *never* a sensible thing to do, right?  Any code including that is
wrongheaded and buggy from the get-go.  

(In that sense it's like my example in the original ticket here.
Compiling against Tcl 8 headers, but passing "9.0" as an argument
to Tcl_InitStubs() cannot ever work and is always a mistake.)

There's no compat issue with making Tcl_InitStubs refuse to follow
broken orders. We just need to redesign it so that such mistakes are
reported errors, not panics or crashes.

dgp added on 2012-12-07 10:32:29:
I'm getting a clearer picture now.  Hope to have more extended remarks
tomorrow.  Meanwhile, just for fun you might give this one a try:

    Tcl_InitStubs(interp, "8.6 9", 0);

:)

nijtmans added on 2012-12-07 06:08:06:
> >Tcl_PkgRequireEx() doesn't accept arguments like "8.6-"
> Somehow it works, so we might have hit an unrelated bug in tclPkg.c
> somewhere here .......
TIP #268 specifies that ''The existing functions Tcl_PkgRequire(Ex) are
re-implemented in terms of this function (=Tcl_PkgRequireProc)",
so Tcl_PkgRequireEx() accepts "8.6-" just fine. That's what I
see and it fits with the TIP description (and with my expectations).
I don't see a bug here, after all.

dgp added on 2012-12-07 04:57:39:
I've got trains and family to content with, so may be until tomorrow until
I can reply well, but if you haven't checked what the latest bug-3588687
code would do in this case, that might be worth trying.

nijtmans added on 2012-12-07 04:55:12:
>Tcl_PkgRequireEx() doesn't accept arguments like "8.6-"

Somehow it works, so we might have hit an unrelated bug in tclPkg.c
somewhere here .......

dgp added on 2012-12-07 04:07:07:
I'm only starting, but wanted to quickly observe
that Tcl_PkgRequireEx() doesn't accept arguments
like "8.6-".  It's only built for single version numbers,
not requirement ranges.  For those you have to call
Tcl_PkgRequireProc(), which TclInitStubs doesn't do.

nijtmans added on 2012-12-07 03:27:40:
Let's do an experiment, to show what I want to tell.

Please check out the branche "novem-support". It is a
modified 8.6, mainly pkgb.so is modified to use
    Tcl_PkgRequireEx(..., "8.6-". ...), and uses the
Tcl_EvalEx() function. Clearly this is wrong: It would
allow pkgb.so to be loaded by both Tcl 8.6 and Tcl 9.0,
but Tcl_EvalEx will have a different signature, so that will
lead to a crash. I'll show that.

Then there is the branch "novem-demo-bug-3588687",
which is a modified "novem" with a Tcl_EvalEx signature
change: the numBytes parameter is changed to a
Tcl_WideInt (will be size_t in Tcl9, but this way I can
demonstrate the crash on 32-bit platforms as well)

Let's run this tclsh9.0 and load the pkgb.so from
the "novem-support" 8.6 build:

$ tclsh9.0
% load ../../tcl8.6/unix/dltest/pkgb.so
% info commands pkgb_*
pkgb_unsafe pkgb_sub
% pkgb_unsafe
Crash!
Aborted (core dumped)
$ 

This pkgb.so was compiled and linked against
the unmodified Tcl 8.6 tcl.h and libtclstub86.a.
How come that this was not detected? Whatever
test in tclStubLib.c cannot detect that: pkgb.so
was compiled with a legacy stub library.

So, the only way I see to make Tcl_StubInit give
a nice error message is change the TCL_STUB_MAGIC
magic value for 64-bit builds in Tcl9. Then, even
though pkgb.so was linked with a legacy stub
library it would detect that and give:
   "interpreter uses an incompatible stubs mechanism"
in stead of crashing.

If you have another idea how to prevent a crash
in this case, I'm all ear. That's why my conclusion
is that TCL_STUB_MAGIC should change on
64-bit platforms, it can stay the same on 32-bit
platforms. Do you get the point from this?

Regards,
          Jan Nijtmans

dgp added on 2012-12-07 02:39:03:
We may have different visions of what the
TCL_STUB_MAGIC value tells us.  For me,
if the TCL_STUB_MAGIC value stays the same,
that means:

1) The interp has a stubs table.
2) The first entry in the table is a routine
     with the same signature and compatible
     functioning as Tcl_PkgRequireEx().

and that's it!  Other compatibility checks get
deferred to what T_PRE judges and that
gets controlled by the usual version numbering
scheme.

In that vision, the only reason to change TCL_STUB_MAGIC
is either

1) A stubs table rethink so radical it no longer allows the
    first slot to be assigned to T_PRE(); or
2) An incompatibility that cannot be sorted out by T_PRE()
    using Tcl version numbers.

An example of 2) would be some kind of development fork
where the forks had incompatible stub tables, but were making
claim to the same set of version numbers.  I don't fully grok your
64-bit claims yet, but it's possible they'd be in this category.

dgp added on 2012-12-07 02:25:30:
I must be slow or something (Don't answer that! :) )

If the tclStubLibCompat.c gets left out, then you do not
get a libtclstubs.a that's version-free.  You get a libtclstubs.a
that will not link against an extension compiled with Tcl 8
headers.  Right?

If you commit to keeping including tclStubLibCompat.c
in the stubs library, though, seems you could end up with
a libtclstubs.a that's truly version-free, and sensitive only
to changes in TCL_STUB_MAGIC.  But if we need to keep
including it, then the value of  multiple source files no longer
makes sense to me.

That said, please have a look at my new checkin on the
bug-3588687 branch.  I think it moves things closer to
achieving our increasingly shared goals.

nijtmans added on 2012-12-07 02:02:36:
The value of a separate tclStubLibCompat.c is, that it can be left
out when the stub library is linked using Tcl 9 headers. I am trying
to reach the goal of a version-independant stub library, the difference
is in tcl.h only.

The minimum requirement of Tcl_PkgRequireEx to work is that:
- the entry in the stub table stays the same
- the signature of Tcl_PkgRequireEx stays the same.
I see no reason to change the entry or the signature,
so there is no magic check needed to make it work for
sure. The additional checks are meaningless IMHO.

The real problem is the signature change in other
functions, which are not used in the stub library,
such as Tcl_EvalEx(). Or in the Tcl_Obj structure,
which will (I assume) change the length field
from int to size_t. On 32-bit platforms that doesn't
make a difference (all those types are 32-bit) but
on 64-bit platforms it totally breaks. Therefore I
propose to change the magic value from
     (int) 0xFCA3BACF
to
     (int) (0xFCA3BACB + sizeof(size_t))
Then the magic value on 32-bit platforms stays
the same, on 64-bit platforms it changes. Then
the current check suffices to detect the problem.

So, I agree with you - as you stated earlier:
>Since I'd prefer not to change the TCL_STUBS_MAGIC value
I prefer not to change the TCL_STUBS_MAGIC
as well. But striving for a version-independant
stub library, on 64-bit platforms I don't see
another possibility. On 32-bit platforms I
don't see a reason to change it.

dgp added on 2012-12-07 01:31:44:
The calls to Tcl_PkgRequireEx() within TclInitStubs
are themselves relying on the stubs mechanism.
In order to call them with certainty they will work,
we need to check that the magic value of the stub
table in the loading interp matches the 
TCL_STUB_MAGIC value with which the stubs
library was compiled.  We must check for three-way
agreement.  Two-way is not enough.

dgp added on 2012-12-07 00:37:51:
Found time to examine this again.  First a simple Q.
What's the value of partitioning some of the code into
another new file, tclStubLibCompat.c ?

nijtmans added on 2012-12-05 16:56:53:
Updated the "bug-3588687" branch, trying to get closer to our common goal.
Two remarks:
- I don't think tcl.h should be modified, for binary compatibility with
  older 8.6.x releases. So, in stead defined TclInitStubs() in tclInt.h,
  and let the linking occur through Tcl_InitStubs(). We have it, so why
  not use it?  In Tcl 9.0 we can then make the step to change
  Tcl_InitStubs to a macro.
- The usage of TCL_MAJOR_VERSION and TCL_VERSION in tclStubLib.c
  hinders the adoption of a version-less stub library. I was under the
  impression that the tcl version should be brought in through tcl.h and
  that tclStubLib.c should not reference TCL_VERSION nor
  TCL_MAJOR_VERSION. So I suggest to remove the check in
  HasStubSupport(), in stead check for tclversion[0] == version[0]
  in TclInitStubs. Still that would not be 100% as desired, but
  at least the stub library then becomes completely independant
  from tcl.h. In Tcl 9.0 we could rename tclstub9.0.a to tclstub.a
  (and change tcl.h to use a macro), and  this stub library would
  be usable for Tcl 8.6 (using Tcl_InitStubs), 9.0, 9.1 ... (using
  TclInitStubs)

Wouldn't that be better (although we aren't there yet)? Feedback appreciated!

dgp added on 2012-12-01 00:03:28:
Merge completed.  Thanks.

We're closing in on agreement what the reformed
TclInitStubs() should look like and do.  Just not quite
there yet.  That can continue without the merge held
back, and that's a good thing.

nijtmans added on 2012-11-30 22:48:59:
Merged the (modified) "novem-review" branch to "novem"
Whatever check we will do in HasStubSupport still has
to be decided, it now has a full tclversion and a magic
parameter. so anything can be put there as need arises.
For now, I kept the check the same as in Tcl 8.6.

I don't see the need to remove the Tcl_InitStubs()
function itself. It is fully hidden, but allows the future
(unversioned) tclstub.a to function for Tcl 8 as well.

Further on, the win32 build was broken due to
other changes, fixed that.

Don, Is that sufficient for you to merge the
novem-remove-string-result branch?

dgp added on 2012-11-30 00:29:12:
I've created a branch "novem-review" that rolls back
the 'premature' change to the TCL_STUB_MAGIC value.
If this is acceptable for merging to novem, please do so.
Then I will be able to merge the novem-remove-string-result
branch in and declare that project complete.

If the 64 bit work, or something else, ends up requiring that
we change the TCL_STUB_MAGIC value, then we can do
that when the need arises.  I think the revised Tcl_InitStubs()
routines in novem, and pending for 8.6 will handle that
transition if/when it happens.

dgp added on 2012-11-29 22:30:51:
Update committed to branch.  Now I have TclInitStubs()
checking both TCL_MAJOR_VERSION and TCL_STUB_MAGIC
agreement between extension and stubs library.

I had been thinking about TCL_STUB_MAGIC as a value that flagged
incompatible stubs mechanisms over time as new Tcl major versions
would arrive.  In that line of thought, it seems to duplicate what's already
known from TCL_MAJOR_VERSION.  However, the TCL_STUB_MAGIC
value is also good for flagging incompatibilities that are not temporal
in nature.  In particular, should any forks arise in Tcl development that get
reflected in incompatible stub tables, we'll want this value to make those
distinctions.  Such forks might well be temporary as we undertake various
radical experiments in separate fossil branches, that only get resolved
when merged.  Another possibility that comes to mind is the ability to
flag our known problems with needing a TCL_MEM_DEBUG interp to
successfully [load] a TCL_MEM_DEBUG extension.

dgp added on 2012-11-29 21:47:18:
Correcting the misleading error message is
an obviously correct and risk-free thing to
do in all active branches.  The stubs-failure
world will soon be larger than 8.0 interps,
and the message should reflect that.  Committing....

nijtmans added on 2012-11-29 18:23:38:
B.T.W.: The current error message in tclStubLib.c is:
      "This interpreter does not support stubs-enabled extensions."
Why don't we simply change that? Suggestion:
      "Detected incompatible Tcl version: expected 8.x"
That can still be done in 8.6.0 without any risk.

nijtmans added on 2012-11-29 17:59:35:
Joe English wrote:
>Is there a need to change the 8.6.* stub mechanism at all?
No, I don't see the need now, I would prefer to wait until
Tcl 9.0a1 is out (or just before that, with the Tcl 8.6.x
release just before that) so the changes can be
tested against the 9.0 signature changes. Now we are
just guessing.....

> But before I work on those (a boring boring process) I'd like to know
> whether what I'm proposing is the right thing to try to do
I think it is the right thing to do. We might discuss about
individual functions, whether some parameter or return
values should be int, size_t or ssize_t. I know that
"novem-64bit-sizes" is not ready yet to be merged
to trunk, I thrust dkf to decide when it is. Still
it would be good to discuss and vote for
TIP #115 before doing the merge.

As long as signatures don't change, there is no need to
change the TCL_STUBS_MAGIC value ( I was a
little bit premature doing that in "novem" already..)

Regards,
        Jan Nijtmans

dkf added on 2012-11-29 16:10:24:
Re the 64 bit work, it works on my machine (i.e., clean build, clean test suite pass) but I could believe the existence of problems on other systems. There are a lot of signed/unsigned hazards in the code (which gcc only flags with extra warnings turned on) but many of those are problems in Tcl 8.* too; working through them all is a huge job.

But before I work on those (a boring boring process) I'd like to know whether what I'm proposing is the right thing to try to do. I guess that requires me to update TIP #115 and get talk going in tcl-coreā€¦

dgp added on 2012-11-29 09:20:58:
I think the answer to that question depends on how
Jan and I resolve our difference about whether the
TCL_STUBS_MAGIC value needs to change on the
8->9 transition.  If it does not change, then the scenario
in the original report can still happen.  Since I'd prefer not
to change the TCL_STUBS_MAGIC value, I'd want to have
a redesigned stubs initialization in Tcl 8.6 for protection
against that scenario.  And yes, it offers much nicer error
message too.

I would not make the change for Tcl 8.5.14, however, because
that would create the situation where extensions compiled against
some 8.5 header files would not be able to link against some 8.5
stubs libraries, and that's just too messy to explain.  Extensions
compiling against 8.5 and older headers will just continue to
suffer from this broken design. The redesign would neatly arrive
with Tcl 8.6.

jenglish added on 2012-11-29 08:06:11:
Is there a need to change the 8.6.* stub mechanism at all?

No matter what else, if a novem tclsh tries to [load] an extension buitl against 8.4.19 or 8.5.13, or conversely if an 8.4.19 or 8.5.13 tclsh tries to [load] a novem-compatible extension, it must fail gracefully.  "Fails gracefully with a slightly better error message" might be a worthwhile goal for 8.6.0, but I don't think it justifies a whole lot of effort or additional complexity.

dgp added on 2012-11-29 06:03:39:
On other matters, Jan, the value of checking TCL_STUBS_MAGIC
is dawning on me.  I'll comment after updating the bug-3588687 branch.

dgp added on 2012-11-29 05:59:50:
Although the action on novem is a lot freer than on trunk,
it's still quite useful to let novem be a branch with some stability.
Best if checkins there are build-able and don't panic attempting
the test suite, which novem-64bit-sizes did the last I checked.

dgp added on 2012-11-29 05:57:55:
Please not yet.

dkf added on 2012-11-29 05:37:05:
If someone wants to merge novem-64bit-sizes back to novem, I wouldn't object in principle, but we probably need to take some decisions about whether what I've done there is The Right Thing before we do.

That's orthogonal to the stub table stuff or Tcl_PkgRequireEx; they are parts we don't need to touch for 64-bit fixing (if the stub tables ever get so they have as many as 1M entries - which will still fit in an int just fine - we've got *much* bigger problems). After all, 64-bit architectures already have 64-bit pointers; it is their defining feature.

nijtmans added on 2012-11-29 04:21:47:
>I don't see the need to change the TCL_STUBS_MAGIC value
>to make the 8->9 transition
As soon as your work and dkf's is on "novem" I will construct
a test-case which proves that changing TCL_STUBS_MAGIC
is necessary for 64-bit platforms (not for 32-bit platforms)

> I believe that
> the dlopen() or equivalent routine will detect that
dlopen will detect the difference between a 32-bit
and 64-bit platform on the same machine, e.g.
win32/win64. It will not detect whether
Tcl_NewStringObj() has a size_t or an int as last
argument.....

But let's concentrate on the things we agree
on. The change from Tcl_InitStubs to a
macro which calls TclInitStubs which has
additinal aguments, that we agree on.
The desire for an unversioned stub library,
that we agree on.
What checks the TclInitStubs() function
should do can be decided later, after
there is more clarity on the stub changes,
so let's put that aside for now.

dgp added on 2012-11-29 01:04:01:
Ah, thanks.  Something else to clarify.

I'm aiming this patch at bridging the transition to
the novem-remove-string-result branch. I'm not aiming
at the stubs changes that are currently on the tip of
the novem branch.

I don't see the need to change the TCL_STUBS_MAGIC value
to make the 8->9 transition.  The functioning of the
Tcl_PkgRequireEx() routine in the first stubs slot is
enough to manage the version checking.  The constraint
that Tcl 9 has to keep that entry, and keep it first in the
stubs table is a reasonable one, I think.

I also see no need to have a TCL_STUBS_MAGIC value
that depends on sizeof(size_t).  If a [load]ing program and
a [load]ed extension don't agree at that level, I believe that
the dlopen() or equivalent routine will detect that and fail long
before our stubs checking mechanisms are ever called on.

That said, yes I agree that if we keep a Tcl_InitStubs() function
to offer linkability of all libtclstubs8*.a libraries, we should use
a literal value 8 for the argument instead of TCL_MAJOR_VERSION.

nijtmans added on 2012-11-28 15:21:59:
I only have two additional suggestions:
>Suggest simply leaving Tcl_InitStubs out of novem tclStubLib.c. That way
>the extension will get a linker error if there is a mismatch
My suggestion would be to leave Tcl_InitStubs in there, but don't use
TCL_MAJOR_VERSION, but simply hardcoded '8': We already know
that Tcl 9 doesn't use Tcl_InitStubs, and in Tcl 7 it didn't even
exist yet, so we know it must be 8. Then we again have no
reason for versioning libstubs.a

I would pass TCL_STUB_MAGIC as argument as well. 
For Tcl 8 we know it's a fixed number, but then we could
keep the signature for TclInitStubs the same in Tcl8
and Tcl9. Another little bit of more compatibility.

Thanks!

dgp added on 2012-11-28 09:49:05:
The branch/patch is off the trunk, not off novem,
This is a fix proposed to go into libtclstub8.6.a.

When I merge it forward to the novem branch, I
would do just what you suggest, and omit the
Tcl_InitStubs() call completely.  An extension
compiled against 8.* headers will simply refuse to
link to a Tcl 9 stubs library. 

Ok, so the patch would be improved if we just
force that linking error to show up at the 8.5->8.6
transition and start using the runtime error detection
right away.  I shied away from that because you've
observed many times that we version the libstubs.a
file for no apparent reason, and now we would at last
have one.

jenglish added on 2012-11-28 07:52:43:
I don't think that this part does what it claims to do:

90      /*   
91       * Detect whether the extension and the stubs library were built
92       * against Tcl header files from different major versions.  That's
93       * seriously broken.
94       */ 

If an extension was compiled against 8.5.13 headers with -DUSE_TCL_STUBS (where Tcl_InitStubs is declared as a symbol with external linkage) but mistakenly linked against novem libtclstub.a, it'll get novem's definition of Tcl_InitStubs(), which passes novem's TCL_VERSION_MAJOR to the worker routine TclInitStubs().

Suggest simply leaving Tcl_InitStubs out of novem tclStubLib.c.   That way the extension will get a linker error if there is a mismatch (old headers+new libtclstub => Tcl_InitStubs not found; new headers+old libtclstub => TclInitStubs not found).

The change could be backported to core-8-5-branch and core-8-6-branch, but that would just change a link-time error into a runtime one -- probably not worth doing (if there's a mismatch, it's better to detect it earlier).

dgp added on 2012-11-28 04:11:59:
Branch bug-3588687 has a patch to address these
problems.  Would appreciate your comments if you
can spare the time.  I think it's not as radical a change
as you might choose, but I think it fixes the worst
problem identified above.

dkf added on 2012-11-21 03:01:09:
To further concur, the 9.0 series does not need to load any extension compiled against 8.* at all *and* the stub table can be considered to be wholly in a state of flux right now. For sure, there won't be stabilization until *at least* 9.0a1 is released, and probably not until 9.0.0 (with things slushy after 9.0b1).

Now is not the time to consider the impact on extensions. Now is not the time to put in lots of compatibility goop.

andreas_kupries added on 2012-11-21 01:07:56:
Regarding "This means that 8.x extensions cannot be used in a 9.0 interpreter."

IMHO this is a good thing, and should be kept.

8 and 9 are different major versions, and we expect incompatibility at C API level, i.e. stubs tables.

We expect that an extension must be re-compiled against 9.x to pick up on these changes and be loadable in 9.x.

A clean break between the versions of the core.

Putting code into 9.x to shoehorn an extension build against 8.x into it, that is a source of complexity in runtime checks and/or ifdeffery we can, should!, do without, at this point.

We are currently in _compatibility breaking_ mode, where we are not even have fully worked out what we are breaking and how. But we know that we want to be free of constraints in our thinking, and 8.x compat work is such a constraint. IOW trying to re-create/shoehorn compatibility now is IMHO much to early in the dev cycle for 9.x. Given the moving target that is 9.x it could possibly be called a 'fool's errand' also (with apologies if that is considered insulting).

nijtmans added on 2012-11-20 15:43:55:
Please consider:

<http://core.tcl.tk/tcl/info/16299cb2af>

I think 2 of the 3 points you raised are solved now
in the "novem" branch.

Feedback welcome!

nijtmans added on 2012-11-20 15:10:19:
Thanks, Don, for this clear description.

Regarding the second point, the TCL_STUBS_MAGIC value built-in
the stubs library, that's completely right. I already did work on
this in the "novem" branch which IMHO corrects that, but feedback
is welcome.

>What's important is that the table in iPtr->stubTable has a compatible
>layout to the one that was compiled into the extension. But the
>mechanism in place cannot check this
Yes, that's a problem. So, let's put a mechanism in place to check this.

Your last point, the version argument, the problem now is that
Tcl_InitStubs(interp, "9.0", 0) returns false when called in a "8.6"
interpreter. This means that 8.x extensions cannot be used
in a 9.0 interpreter. I don't have an immediate solution for that.
Easiest is to let extesions use Tcl_InitStubs(interp, "8.6+", 0)
if they want to indicate that they can run under both 8.6
and 9.0, but that could - at best - only work on 32-bit platforms.
64-bit platforms should have a different MAGIC value,
given the big reform that is being worked on now by dkf.

Regards,
        Jan Nijtmans