Tcl Source Code

Ticket UUID: 833d14d19c129802bb7724be5cfde31d7d962a67
Title: Segfault or corrupted double-linked list errors on exit
Type: Bug Version: 8.6.3rc1
Submitter: sbron Created on: 2014-10-26 16:50:45
Subsystem: 20. [interp] Assigned To: dgp
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2014-11-21 15:37:11
Resolution: Fixed Closed By: dgp
    Closed on: 2014-11-21 15:37:11
Description:
I sometimes get different errors when exiting my application, especially after the application has been creating and destroying slave interpreters.

One error is: *** Error in `./tclsh8.6': corrupted double-linked list: 0x00000000028c8a40 ***
In that case the application doesn't actually exit.

Other times I get a segfault. I will attach a backtrace.

I'm not sure if these two errors are different symptoms of the same problem or if there are multiple problems. For the moment I'm assuming one problem because I get these results after the exact same sequence of actions, which may also work without a problem.

Platform: Linux x86_64 3.11.6-4
User Comments:

sbron added on 2014-11-19 16:10:32:
Confirmed. The invalid read/write reports from valgrind are gone. The other issues appear to be Mk4tcl related, so I rewrote my code to not use Mk4tcl for that anymore. I'm happy to call this problem resolved.

dgp added on 2014-11-13 14:53:28:
Did I hear correctly that the Tcl/Tk 8.6.3 release
has taken care of this?

dgp added on 2014-11-06 18:35:08:
Patched Tk 8.5 and 8.6 to stop that invalid read.

Hope that cleans things up to make the search easier.

sbron added on 2014-10-31 22:52:46:
As concluded earlier, the valgrind output points to a Tk issue. A fairly minimal script to reproduce the issue is:

package require Tk
. configure -menu [menu .main]
.main insert 0 cascade -label Menu1 -menu [menu .main.m1]
.main insert 1 cascade -label Menu2 -menu [menu .main.m2]
destroy .main.m2
.main delete 1

To be clear: This doesn't segfault, it just produces invalid read/write reports when run in a symbols-enabled tclsh under valgrind.

dgp added on 2014-10-30 12:55:38:
From chat:

"schelte    dgp, 8.6.1 also segfaults for me if I stress it enough. So my report is probably not a new issue and shouldn't be blocking for the release of 8.6.3."

Re-opened ticket at regular priority. There is still something
to fix here.

dgp added on 2014-10-30 11:51:29:
Since we stopped posting comments here, ferrieux may not
be fully aware of the great efforts you were making to
track this down, sbron, which I greatly appreciate.

There were strong reasons to suspect that recent changes
to TclOO and Itcl were implicated in the symptoms, and
chasing down that possibility as far as we could, even
if not to full resolution, was of great value.

I still want to help, because I want Tcl + its bundled
packages to continue to be useful to you as released.
As I understood the last state of things, though, we
could no longer identify a "last good" combination of
released packages that did not exhibit the problem.
The presence or absence of symptoms was connected to
whether or not your app was wrapped up into a "kit"
format.  Do I have that correct?

The existence of invalid reads in Tk is useful info too.
I will be looking into the feasibility of something like
a valgrind run to check things out.  I fear what I will find.

Please keep helping us as you can, and we will do likewise
as we can.

sbron added on 2014-10-30 10:38:43:
I thought the purpose of the release candidates was to find issues before the final release. That's why I gave a heads-up as soon as possible. But if you will only accept a fully analyzed problem with a convenient small script to reproduce it, then by all means go ahead and release and I won't spend my time testing release candidates anymore.

ferrieux added on 2014-10-30 10:11:39:
When it is the case that the bug needs a big setup with many extensions, it is a good idea to state it in the initial report. That avoids wasting maintainers' time on wrong hypotheses...

Given your latest report, the likelihood that the Tcl core is responsible is vastly reduced. I therefore suggest closing this ticket as Works For Me and posting a new one (in Tk, not Tcl) about the valgrind error in tkMenu.c. That second report should, of course, include:

   - a clear statement, in the description, that it happens on a vanilla wish ("compiled with -DPURIFY" or any other relevant non-default config items)
   - a minimal reproducing script

sbron added on 2014-10-30 09:51:12:
I didn't forget to attach the full script. The problem is that I only get the errors from a big application that I have been unable to reduce to a smaller version that still reproduces the issue.

The application can run with a regular tclsh, but it does load several binary extensions, such as Tk, Itcl, Mk4tcl, vfs::zip, tktray, and dbus.

I ran the program under valgrind and saw several Invalid read and Invalid write reports. Most came out of Mk4tcl.so, but there were also two from tkMenu.c.

Next, I wrote enough of a Tcl-only replacement for Mk4tcl to take the program through the steps that originally led to the problem. I also stubbed out all calls to the other binary packages. With that version of the program I don't see the errors anymore, but valgrind still reports the invalid read/write from tkMenu.c.

I am now not sure anymore if there is a problem in Tk that only reveals itself under very obscure circumstances, or if Mk4tcl is really to blame. It is strange though that even reduced to a pure Tcl/Tk application, valgrind still reports issues.

msofer added on 2014-10-26 19:16:47:
Also please
a) tell us if there is any C-extension, or if this is pure script
b) compile Tcl with -DPURIFY and --enable-symbols, run under valgrind, and attach the valgrind report. Backtraces are usually not that informative in cases of memory corruption, because they pinpoint not where the corruption was caused but where it has a disastrous effect (which may or may not be its first effect).

dgp added on 2014-10-26 19:13:18:
Any chance that valgrind or something like it might
point to where the memory corruption takes place?

ferrieux added on 2014-10-26 17:17:58:
You forgot to attach the full script, and to specify whether this is a vanilla or extended tclsh/wish.

Before posting the script, you might also want to reduce it to a minimal reproducing setup.
