Tcl Source Code

View Ticket
Login
Ticket UUID: 21000b5d407f330efaf8e6207fe8d025b0633ebb
Title: Problem with Tcl_DeleteInterp() in 8.4.9.1_Activetcl
Type: Bug Version: 8.4.9.1
Submitter: sakthivp Created on: 2014-02-17 09:31:05
Subsystem: 20. [interp] Assigned To: dgp
Priority: 7 High Severity: Critical
Status: Closed Last Modified: 2015-05-11 17:54:15
Resolution: Out of Date Closed By: dgp
    Closed on: 2015-05-11 17:54:15
Description:
I have some problem with Tcl_DeleteInterp() API (on solaris). I have used this Tcl_DeleteInterp() in my application to delete interpreter. The strange thing i found is the process which uses this API hangs about 20 minutes. I attached the process with truss command and found that the process continuously executing the below functions for almost 20 min for almost 13000(~) times.

15.6541  0.0002 -> libtcl8.4:Tcl_DeleteInterp(0x41748, 0x0, 0x0, 0x0)
15.6580  0.0039   -> libc:free(0x288f8, 0x0, 0x0, 0x0)
15.6583  0.0003   -> libc:mutex_unlock(0xff0424e8, 0x0, 0x0, 0x0)
15.6584  0.0001   <- libc:free() = 0
15.6588  0.0004   -> libc:free(0x474b0, 0x0, 0x0, 0x0)*
...
...
I have also attached the process with truss command when creating the Tcl Interpreter, but there were only about 800 times malloc() function has been called from Tcl_CreateInterp().

Can someone help me on this why the function Tcl_DeleteInterp calling these functions? I have also checked the implementation done for Tcl_DeleteInterp() , but didn't find any clue.

Thanks
User Comments: sakthivp added on 2014-02-26 04:40:16:
Thanks for your reply.
Before pasting the program i edited some part of it as the original program had few lines of commented code to enable memory command.
Did you get the truss command output of the sample program ?
Of course with this code i could not see the hang,It would be really helpful for me to understand why the Tcl_DeleteInterp() internally calls many free() functions.

Though my sample code didn't have any cached string, cashed objects (hope i am correct ), what was the reason behind calling free() and mutex_unlock() functions?

If the same code is compiled with Tcl8.6 libraries I couldn't see the free() and mutex_unlock() function calls. So my question is would the free() and mutex_unlock() function calls are due to the implementation done in tcl ?

dgp added on 2014-02-25 13:43:23:
I found a Solaris system.

$ uname -a
SunOS <<REDACTED>> 5.10 Generic_148888-05 sun4u sparc SUNW,Sun-Blade-1000

On it I built and installed a thread-enabled Tcl 8.4.9 from sources.
I used the "Forte Developer 7 C 5.4 2002/03/09" compiler.

The small program you submitted doesn't produce a working executable
of course.  Here's the modified source I used:

$ cat hang.c
#include <tcl.h>
int main()
{
 int x=0;
 Tcl_Interp* a;
 a =  Tcl_CreateInterp();
 if (a == NULL)
 printf("Interpreter creation Failed \n");
 Tcl_Init(a);
 Tcl_Eval(a, "puts \"Tcl [info patchlevel] thread:$tcl_platform(threaded)\"");
 Tcl_DeleteInterp(a);
 printf("DeleteInterpreter \n");
}

$ cc -I/path/to/installed/Tcl/include -L/path/to/installed/Tcl/lib -ltcl8.4 hang.c

$ a.out
Tcl 8.4.9 thread:1
DeleteInterpreter

It does not hang.  It returns quickly, with no delay that I can
perceive.

In short, I cannot reproduce the problem you report from the
description you've offered.  dkf may have different thoughts,
but I'm not interested in seeing logs, I'm interested in the
ability to demonstrate the problem.

If I cannot demonstrate the problem, I see no hope that I can
offer any help.

dgp added on 2014-02-25 13:05:07:
If you are happier with the results when you use your
program with Tcl 8.6, then that's just another reason
you ought to be using it, and giving up using a non-updated
patchlevel of an End Of Life release series.

sakthivp added on 2014-02-25 09:09:16:
I have compiled the tcl program with Tcl8.6 library, i see only the pthread_mutex_unlock and pthread_XXXX function calls. With Tcl8.6 library i didn't see any free() function call while deleting the Tcl Interpreter.
FYI I compiled and ran this program on the same server. I did not do any configuration changes on this server. 
Does it proves the problem is with Tcl?

sakthivp added on 2014-02-25 07:15:36:
I wrote a small program , which just create the Tcl interpreter and delete it.
-------------------------------------------------------------
int main()
{
 int x=0;
 Tcl_Interp* a;
 a =  Tcl_CreateInterp();
 Tcl_Init(a);
 if (a == NULL)
 printf("Interpreter creation Failed \n");
 delete_interpr(a);
 printf("DeleteInterpreter \n");
}
--------------------------------------------------------------
I compiled this program using Tcl8.4.9.1 libraries and executed it with truss command. Still there are about 577 free() function calls i can see from the Tcl_DelteInterp() function.
Attached the log file. I dont see any cashed object or string in the given program but still could see many free() function calls including mutem_unlock function call.
If it is really something relate to the Server configuration (tuning). Please help me on the specify configuration  parameter.
I have attached the log files for this program as well.

dgp added on 2014-02-24 18:18:16:
So, do I understand that there's no actual hang
demonstrated (infinite loop), but just an unwelcome
long time spent doing a large amount of work that
would be better done by something else, or in some
other way?

Does moving to Tcl 8.5 or 8.6 improve matters?

dkf added on 2014-02-23 18:16:08:

I've looked through the log and I don't see anything particularly problematic; there's simply a lot of things being freed. Since an interpreter is the major context object in Tcl, it's not surprising that deleting it will cause a great many things to cease to be. Many of these things are allocated during the running of scripts (e.g., cached objects, cached strings, cached compilations) so we would not in general expect that the amount of work done in the two cases would be the same. It looks like there's only actually 6540 calls to free in there; plenty, but not really very excessive.

Looking through the trace, some of the operations appear to be taking a long time, but that's probably just the OS preempting to allow some other thread or process to execute. (I'm guessing that mutex lock acquisition is done in a macro inside libc, but that's nothing to do with Tcl.) I don't see any double-frees; well, not without a malloc in-between (I can't tell why we're allocating during the release of the interpreter, but it does seem to be a very small number). I'm also puzzled as to why so many of Tcl's internal calls are not showing up in the trace; I'm guessing inadequacy in the tooling.

Maybe the OS is pulling pages back in from a stressed disk (“swapping”)? That can have mysterious symptoms if you're not expecting it. If that's the case (I don't know how to diagnose for sure; back when I used Solaris, I just put my ear to the machine and listened to the disk head working very hard! Low tech, I know!) there's absolutely nothing we can do about it from a Tcl perspective, as all that software can ever do is assume that it is adequately provisioned. Tuning how much physical memory to allow a system to use for a particular set of tasks is a bit of a black art, but over-allocating is typically cheaper than the developer effort to create a better estimate. The amount of space required appears to be at least 4MB (on the basis of the addresses in the log) and is probably much more than that.

General Tips

In general, we don't advise building Tcl with TCL_MEM_DEBUG enabled as that's much slower. Tcl has its own built-in layer on top of the system memory allocator that avoids most of the locking (making things much faster) but it is disabled when doing mem-debugging as it confuses the heck out of some third-party tools. We also do not advise deallocating on exit (known by us as “finalization”) as the cheapest way of cleaning up the memory held by the process is to just let the OS do it for us by throwing the process away directly. This is very different to how most C++ code works, but we've measured this and know it is definitely true. We have separate hooks to handle deletion of things that need special treatment on exit (e.g., DB connection handles) but they're really rather rare.

In short, I advise trying to avoid deleting the interpreter unless you want to throw away the context and yet keep the process going.

In short, there's nothing obviously wrong in Tcl as far as I can see. There probably is something horribly wrong elsewhere. My current guess is that there's too much instrumentation (yes, it has a major impact!), insufficient hardware resources, or that the hardware is is undergoing some kind of nasty slow failure. I really hope for your sake that it isn't the last of those options, but I've had them happen to me in the past.


sakthivp added on 2014-02-19 05:29:42:
Thank you so much for your support

sakthivp added on 2014-02-19 05:28:53:
I have attached the output of truss command (edited head and tail part of the trace to comply with size). Please have a look at the function call that goes into Tcl_DeleteInterp() which in turn calls multiple number of mutex_unlock() and free() functions.

Please let me know that the problem is in Tcl library or not. I have also used the purify tool to check "memory leak" and "memory in use" problems. But purify didn't report any on my application and in Tcl as well. 

Looks this is normal behavior from Tcl. Please confirm me the same.

dgp added on 2014-02-18 16:20:02:
I don't see any way to make progress diagnosing this
problem without code that demonstrates the hang
against a current release of Tcl.

dkf added on 2014-02-18 13:55:29:

You don't need to assign it to anyone, but we're not yet sure what bug is present that we can fix. Until we get closer to seeing what the failure is, we're going to find it hard to take concrete actions or who might do so (i.e., who to assign it to).

This isn't commercial application support, you understand; we do other things too.


I'm guessing that the problem with the TCL_MEM_DEBUG flag is that you're linking against the wrong built instance of the library. I don't know/remember how to fix that sub-issue.


sakthivp added on 2014-02-18 09:18:59:
Please correct me am i used this page correctly or not( do i have to assign to someone or it will be done from TCL developers ?

sakthivp added on 2014-02-18 09:16:29:
Yes i used purify (purify_all_in_use()(name could be wrong) in my application and checked the memory blocks that were allocated from Tcl_CreateInterp() function. Compare to the free() and mutex_unlock() function calls ( from Tcl_DeleteInterp() truss output), the malloc() function call were very less in Tcl_CreateInterp() function.

I have also used TCL_MEM_DEBUG flag (enabled the mem flag), but not sure of how to use the debug symbol enabled libraries. I have used 'memory active <file>' command on my application to see the active memory allocations. But my application is failed to recognize the memory command.

dkf added on 2014-02-18 07:14:04:

Sounds like memory corruption. That causes all sorts of things to go wrong.

Have you tried doing a build (from the Tcl sources, instead of using someone else's build) with memory debugging turned on (--enable-symbols=mem option during configure)? Or have you tried using a tool like purify/efence (whichever you've got) to spot where things are going haywire?


sakthivp added on 2014-02-17 09:35:30:
Then i wrote a simple program and used both Tcl_CreateInterp() and Tcl_DeleteInterp(). When i attached the program execution with truss command on solaris environment i see there were inconsistency in the number of malloc() function calls(during Tcl_createInterp()) and the number of free() and mutex_unlock() function calls (during Tcl_DeleteInterp()).

Attachments: