Ticket UUID: | 2023112 | |||
Title: | oo-18.3 fails with -singleproc 1 | |||
Type: | Bug | Version: | obsolete: 8.6a1 | |
Submitter: | das | Created on: | 2008-07-21 00:13:35 | |
Subsystem: | 35. TclOO Package | Assigned To: | dkf | |
Priority: | 5 Medium | Severity: | ||
Status: | Closed | Last Modified: | 2012-05-20 16:22:09 | |
Resolution: | Fixed | Closed By: | dkf | |
Closed on: | 2012-05-20 09:22:09 | |||
Description: |
oo-18.3 fails when running the testsuite with -singleproc 1. I've tracked this down to a conflict between the tests NRE-oo.4, namespace-42.8 and oo-18.3, all three are needed to get the failure... i.e. running the testsuite with the following options gives the output below: -singleproc 1 -verbose "bet" -file "NRE.test namespace.test oo.test" \ -match "NRE-oo.4 namespace-42.8 oo-18.3" Tests running in interp: build/tcltest Tests located in: tcl/tests Tests running in: build Temporary files stored in build Test files sourced into current interpreter Running tests that match: NRE-oo.4 namespace-42.8 oo-18.3 Skipping test files that match: l.*.test Only running test files that match: NRE.test namespace.test oo.test Tests began at Mon Jul 21 01:57:46 CEST 2008 NRE.test ---- NRE-oo.4 start namespace.test ---- namespace-42.8 start oo.test ---- oo-18.3 start ==== oo-18.3 OO: define command support FAILED ==== Contents of test case: list [catch {oo::class create foo {error bar}} msg] $msg $errorInfo ---- Result was: 1 bar {bar while executing "error bar" (in definition script for object "::foo" line 1) invoked from within "return -options $opt $msg" (class "::oo::class" constructor line 6) invoked from within "oo::class create foo {error bar}"} ---- Result should have been (exact matching): 1 bar {bar while executing "error bar" (in definition script for object "::foo" line 1) invoked from within "oo::class create foo {error bar}"} ==== oo-18.3 FAILED Tests ended at Mon Jul 21 01:57:47 CEST 2008 all.tcl:Total419Passed2Skipped416Failed1 Sourced 3 Test Files. Files with failing tests: oo.test | |||
User Comments: |
dkf added on 2012-05-20 16:22:09:
allow_comments - 1 Hack level reduced. No more DEADBEEF or UpCatch... dkf added on 2012-05-18 06:54:17: Remaining task: reduce "hack factor" in class constructors dkf added on 2009-02-22 23:21:59: Note that issue 2 could also be resolved through the use of a suitably-coded C version of the class constructor. dgp added on 2008-08-19 05:32:38: Logged In: YES user_id=80530 Originator: NO Issue 1) got re-reported as Tcl Bug 2037727, and has been fixed in the HEAD, but the fix could use some review. See 2037727 for more on that topic. Issue 2) is the remaining issue here. I made one quick attempt to reduce the hack-factor, and failed. I'm leaving this report open in hopes that can still be accomplished, but it's less of a priority now. dkf added on 2008-08-19 05:27:25: Logged In: YES user_id=79902 Originator: NO 1) would seem to indicate that there is a mistake in the copying of the epoch change detection code from the code to compile a procedure body to the code to compile a method body. That's a pure bug; my intention was that they should detect changes identically. 2) is a hack. Non-hack replacements that are valid even with arbitrary traces are welcome. (I don't like using string processing for this because I worry about newlines. I suppose I could instead have written the constructor using a C-implemented method; the APIs for that are now much stronger than they were.) dgp added on 2008-08-04 23:22:42: Logged In: YES user_id=80530 Originator: NO see 2037727. dgp added on 2008-08-04 11:35:54: Logged In: YES user_id=80530 Originator: NO see 2037338 dgp added on 2008-08-03 11:54:14: File Added - 286867: 2023112.patch Logged In: YES user_id=80530 Originator: NO The original prescription no longer reveals a bug, since NRE.test was renamed nre.test, and even sticking in the new names has the three tests in a different sequence that shows no test failure. Attached patch adds a new test oo-18.6 to replace the lucky trio. Attached patch also includes a fix for one of the bugs contributing to the test failure. When an in-progress TEBC detects a bump of the compile epoch it calls on Tcl_EvalEx to evaluate (possibly redefined) commands rather than continue executing invalid bytecode. Tcl_EvalEx does its own logging of errors to errorInfo. When the Tcl_EvalEx returns though, TEBC still routes exceptions through its own handling machinery which here attempt to log to errorInfo a second time, and also replace a carefully crafted errorline value. Fixing that is sufficient to make the test pass. There are other bugs discovered on this adventure. 1) This scenario should never get into the mid-execution invalid bytecode situation to begin with. The bytecode is known invalid before execution begins, but OO method body dispatch is currently failing to make the epoch check. 2) The trickery with a magic errorline value truly is an ugly hack, and at least to my eyes should be easy enough to replace with something cleaner and more robust. I'll try to craft additional tests to reveal these other bugs independently. Once I get those filed, I'll be able to close this one. File Added: 2023112.patch dgp added on 2008-08-02 04:05:28: Logged In: YES user_id=80530 Originator: NO In the buggy scenario, Tcl_LogCommandInfo is getting called between the time the magic errorLine is set, and the time it is checked. dgp added on 2008-08-02 03:18:19: Logged In: YES user_id=80530 Originator: NO Somehow the "Horrible hack" at line 930 of tclOOMethod.c is being disrupted. Indicates the value of iPtr->errorLine is being overwritten when not expected. dgp added on 2008-08-02 02:40:33: Logged In: YES user_id=80530 Originator: NO Confirming that theory, replace proc demo args {}; rename demo {} with testcmdtrace tracetest {} and see the same failure. What those have in common is a bump of the compileEpoch. dgp added on 2008-08-02 02:33:53: Logged In: YES user_id=80530 Originator: NO Oops. More likely is that the proc demo args {}; rename demo {} sequence has the effect of bumping the compileEpoch. dgp added on 2008-08-02 02:28:54: Logged In: YES user_id=80530 Originator: NO Comment out line 343 of tclProc.c, ((Command *) cmd)->compileProc = TclCompileNoOp; and the test failure goes away (as well as no-op proc optimization). Suggests ugly possibility of memory corruption at the root of the problem? dgp added on 2008-08-02 01:35:02: Logged In: YES user_id=80530 Originator: NO The proc name "demo" is irrelevant, but the argument spec and body matter. There must be some interaction with the special "compile to no instructions" procs. dgp added on 2008-08-02 01:23:18: Logged In: YES user_id=80530 Originator: NO So this interactive session demo's the bug, which appears to be within TclOO: % oo::class create foo {error bar} bar % set errorInfo bar while executing "error bar" (in definition script for object "::foo" line 1) invoked from within "oo::class create foo {error bar}" % % % oo::class create foo ::foo % foo destroy % proc demo args {} % rename demo {} % oo::class create foo {error bar} bar % set errorInfo bar while executing "error bar" (in definition script for object "::foo" line 1) invoked from within "return -options $opt $msg" (class "::oo::class" constructor line 6) invoked from within "oo::class create foo {error bar}" dgp added on 2008-08-02 01:18:09: Logged In: YES user_id=80530 Originator: NO Probing reveals that the role of namespace-42.8 is just the command sequence: proc demo args {} rename demo {} And the role of NRE-oo.4 is just the command sequence oo::class create foo foo destroy msofer added on 2008-07-30 04:58:02: Logged In: YES user_id=148712 Originator: NO Confirm reproducible, concur hard to explain. Relieved that it exists pre-nre, so that that's not where to look. Specifically, it exists since the very first commit of TclOO to the Tcl tree on 2008-05-31 (on my machine setting 'ulimit -s unlimited' allows the NRE tests to run to completion). dkf added on 2008-07-21 15:13:53: Logged In: YES user_id=79902 Originator: NO Huh!? That's bizarre! (I confirm that this is reproducible.) OK, initial suspicion is something to do with representation caching and literals that is ending up allowing the error to escape the [catch] in the class constructor script. Handing off to an expert on such things for a second opinion. (I've no idea what to do about this!) |
Attachments:
- 2023112.patch [download] added by dgp on 2008-08-03 11:54:14. [details]