Tcl Source Code

View Ticket
Login
Ticket UUID: 2023112
Title: oo-18.3 fails with -singleproc 1
Type: Bug Version: obsolete: 8.6a1
Submitter: das Created on: 2008-07-21 00:13:35
Subsystem: 35. TclOO Package Assigned To: dkf
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2012-05-20 16:22:09
Resolution: Fixed Closed By: dkf
    Closed on: 2012-05-20 09:22:09
Description:
oo-18.3 fails when running the testsuite with -singleproc 1.

I've tracked this down to a conflict between the tests NRE-oo.4, namespace-42.8 and oo-18.3, all three are needed to get the failure...

i.e. running the testsuite with the following options gives the output below:
-singleproc 1 -verbose "bet" -file "NRE.test namespace.test oo.test" \
-match "NRE-oo.4 namespace-42.8 oo-18.3"


Tests running in interp:  build/tcltest
Tests located in:  tcl/tests
Tests running in:  build
Temporary files stored in build
Test files sourced into current interpreter
Running tests that match:  NRE-oo.4 namespace-42.8 oo-18.3
Skipping test files that match:  l.*.test
Only running test files that match:  NRE.test namespace.test oo.test
Tests began at Mon Jul 21 01:57:46 CEST 2008
NRE.test
---- NRE-oo.4 start
namespace.test
---- namespace-42.8 start
oo.test
---- oo-18.3 start


==== oo-18.3 OO: define command support FAILED
==== Contents of test case:

    list [catch {oo::class create foo {error bar}} msg] $msg $errorInfo

---- Result was:
1 bar {bar
    while executing
"error bar"
    (in definition script for object "::foo" line 1)
    invoked from within
"return -options $opt $msg"
    (class "::oo::class" constructor line 6)
    invoked from within
"oo::class create foo {error bar}"}
---- Result should have been (exact matching):
1 bar {bar
    while executing
"error bar"
    (in definition script for object "::foo" line 1)
    invoked from within
"oo::class create foo {error bar}"}
==== oo-18.3 FAILED


Tests ended at Mon Jul 21 01:57:47 CEST 2008
all.tcl:Total419Passed2Skipped416Failed1
Sourced 3 Test Files.
Files with failing tests: oo.test
User Comments: dkf added on 2012-05-20 16:22:09:

allow_comments - 1

Hack level reduced. No more DEADBEEF or UpCatch...

dkf added on 2012-05-18 06:54:17:
Remaining task: reduce "hack factor" in class constructors

dkf added on 2009-02-22 23:21:59:
Note that issue 2 could also be resolved through the use of a suitably-coded C version of the class constructor.

dgp added on 2008-08-19 05:32:38:
Logged In: YES 
user_id=80530
Originator: NO


Issue 1) got re-reported as Tcl
Bug 2037727, and has been fixed
in the HEAD, but the fix could
use some review.  See 2037727
for more on that topic.

Issue 2) is the remaining issue
here.  I made one quick attempt
to reduce the hack-factor, and
failed.  I'm leaving this report
open in hopes that can still be
accomplished, but it's less of
a priority now.

dkf added on 2008-08-19 05:27:25:
Logged In: YES 
user_id=79902
Originator: NO

1) would seem to indicate that there is a mistake in the copying of the epoch change detection code from the code to compile a procedure body to the code to compile a method body. That's a pure bug; my intention was that they should detect changes identically.

2) is a hack. Non-hack replacements that are valid even with arbitrary traces are welcome. (I don't like using string processing for this because I worry about newlines. I suppose I could instead have written the constructor using a C-implemented method; the APIs for that are now much stronger than they were.)

dgp added on 2008-08-04 23:22:42:
Logged In: YES 
user_id=80530
Originator: NO

see 2037727.

dgp added on 2008-08-04 11:35:54:
Logged In: YES 
user_id=80530
Originator: NO

see 2037338

dgp added on 2008-08-03 11:54:14:

File Added - 286867: 2023112.patch

Logged In: YES 
user_id=80530
Originator: NO


The original prescription no longer
reveals a bug, since NRE.test was
renamed nre.test, and even sticking
in the new names has the three tests
in a different sequence that shows no
test failure.

Attached patch adds a new test
oo-18.6 to replace the lucky trio.

Attached patch also includes a fix
for one of the bugs contributing to
the test failure.  When an in-progress
TEBC detects a bump of the compile
epoch it calls on Tcl_EvalEx to
evaluate (possibly redefined) commands
rather than continue executing invalid
bytecode.  Tcl_EvalEx does its own
logging of errors to errorInfo.  When
the Tcl_EvalEx returns though, TEBC
still routes exceptions through its
own handling machinery which here
attempt to log to errorInfo a second
time, and also replace a carefully
crafted errorline value.  Fixing that
is sufficient to make the test pass.

There are other bugs discovered on
this adventure.

1) This scenario should never get into
the mid-execution invalid bytecode
situation to begin with.  The bytecode
is known invalid before execution begins,
but OO method body dispatch is currently
failing to make the epoch check.

2) The trickery with a magic errorline
value truly is an ugly hack, and at least
to my eyes should be easy enough to
replace with something cleaner and more
robust.

I'll try to craft additional tests to
reveal these other bugs independently.
Once I get those filed, I'll be able
to close this one.

File Added: 2023112.patch

dgp added on 2008-08-02 04:05:28:
Logged In: YES 
user_id=80530
Originator: NO


In the buggy scenario,
Tcl_LogCommandInfo is getting
called between the time the
magic errorLine is set, and
the time it is checked.

dgp added on 2008-08-02 03:18:19:
Logged In: YES 
user_id=80530
Originator: NO


Somehow the "Horrible hack" at
line 930 of tclOOMethod.c is
being disrupted.  Indicates
the value of iPtr->errorLine
is being overwritten when not
expected.

dgp added on 2008-08-02 02:40:33:
Logged In: YES 
user_id=80530
Originator: NO


Confirming that theory, replace

    proc demo args {}; rename demo {}

with

    testcmdtrace tracetest {}

and see the same failure.  What those
have in common is a bump of the
compileEpoch.

dgp added on 2008-08-02 02:33:53:
Logged In: YES 
user_id=80530
Originator: NO


Oops.  More likely is that the

    proc demo args {}; rename demo {}

sequence has the effect of bumping
the compileEpoch.

dgp added on 2008-08-02 02:28:54:
Logged In: YES 
user_id=80530
Originator: NO


Comment out line 343 of tclProc.c,

        ((Command *) cmd)->compileProc = TclCompileNoOp;

and the test failure goes away (as well
as no-op proc optimization).

Suggests ugly possibility of memory
corruption at the root of the problem?

dgp added on 2008-08-02 01:35:02:
Logged In: YES 
user_id=80530
Originator: NO


The proc name "demo" is irrelevant,
but the argument spec and body matter.

There must be some interaction with
the special "compile to no instructions"
procs.

dgp added on 2008-08-02 01:23:18:
Logged In: YES 
user_id=80530
Originator: NO


So this interactive session
demo's the bug, which appears
to be within TclOO:

% oo::class create foo {error bar}
bar
% set errorInfo
bar
    while executing
"error bar"
    (in definition script for object "::foo" line 1)
    invoked from within
"oo::class create foo {error bar}"
%
%
% oo::class create foo
::foo
% foo destroy
% proc demo args {}
% rename demo {}
% oo::class create foo {error bar}
bar
% set errorInfo
bar
    while executing
"error bar"
    (in definition script for object "::foo" line 1)
    invoked from within
"return -options $opt $msg"
    (class "::oo::class" constructor line 6)
    invoked from within
"oo::class create foo {error bar}"

dgp added on 2008-08-02 01:18:09:
Logged In: YES 
user_id=80530
Originator: NO


Probing reveals that the role
of namespace-42.8 is just the
command sequence:

    proc demo args {}
    rename demo {}

And the role of NRE-oo.4 is
just the command sequence

    oo::class create foo
    foo destroy

msofer added on 2008-07-30 04:58:02:
Logged In: YES 
user_id=148712
Originator: NO

Confirm reproducible, concur hard to explain. 

Relieved that it exists pre-nre, so that that's not where to look. Specifically, it exists since the very first commit of TclOO to the Tcl tree on 2008-05-31 (on my machine setting 'ulimit -s unlimited' allows the NRE tests to run to completion).

dkf added on 2008-07-21 15:13:53:
Logged In: YES 
user_id=79902
Originator: NO

Huh!? That's bizarre! (I confirm that this is reproducible.)

OK, initial suspicion is something to do with representation caching and literals that is ending up allowing the error to escape the [catch] in the class constructor script. Handing off to an expert on such things for a second opinion. (I've no idea what to do about this!)

Attachments: