Tcl Source Code

View Ticket
Login
Ticket UUID: 3475569
Title: Unexpected behavior / Panic / Tcl_Obj deletion management
Type: Bug Version: obsolete: 8.5.11
Submitter: sebres Created on: 2012-01-18 16:37:33
Subsystem: 10. Objects Assigned To: dgp
Priority: 8 Severity:
Status: Closed Last Modified: 2012-01-26 23:54:49
Resolution: Fixed Closed By: dgp
    Closed on: 2012-01-26 16:54:49
Description:
Platform: ANY;
Area: Tcl core, DeletePending;

I have an unexpected behavior, that can be reproduced only if an Tcl_Obj will be destroyed (in a freeIntRepProc of this) - also ObjDeletePending(contextPtr) is true. 
By call of TclFSNormalizeAbsolutePath through "link = Tcl_FSLink(retVal, NULL, 0);", the retVal will be shared obj (refCount = 2), hereafter by call "Tcl_SetObjLength(retVal, ...);" a PANIC ("Tcl_SetObjLength called with shared object"), hereafter folowing Exception, access vio etc.
The problem is Tcl_FSJoinToPath(pathPtr,...), that incr refCount of object "pathPtr" after that called. I have attached a patch, that i created to work around problem.

In addition the simple code, that will be correctly executed "normally" and perfect unexpected panic in "DeletePending" mode:
<code>
 Tcl_Obj * obj;
 Tcl_Obj * list;
 Tcl_Obj * ret;
  // brand new obj 
  obj = Tcl_NewListObj(0, NULL);
  Tcl_IncrRefCount(obj);
  // brand new parent 
  list = Tcl_NewListObj(1, &obj);
  Tcl_IncrRefCount(list);
  // ... here anything else ... //
  // delete parent ...
  Tcl_DecrRefCount(list);
  // here obj->refCount should be 1 (and normally also 1), but in "DeletePending" mode it is 2 ...
  // also here BOOM if DeletePending :
  Tcl_SetObjLength(obj, 0);
  // ... here anything else ... //
  // delete object
  Tcl_DecrRefCount(obj);
</code>

In my opinion, the DeletePending mechanism must be changed, for example using "Epoch" in Obj struct (the objects created in DeletePending mode could be deleted directly without PushObjToDelete) ... 
Or completelly removed from core.
On every case, all tcl tests should be extended to check it in DeletePending mode.
User Comments: dgp added on 2012-01-26 23:54:49:

allow_comments - 1

fixed for 8.5.12 and 8.6b3

dgp added on 2012-01-26 22:43:21:
ok, the stringObj-10.1 test is just generally not
reliable.  It's fixed on the trunk, and the fix likely 
ought to be backported.

sebres added on 2012-01-26 05:15:20:
you way is more aggressive :)
with a memory leak is understandable (nothing will be deleted, never).
stringObj test kan fail, because literal works very different as always - so any literal Obj could be a none (instead of string). It's normal.
But have you seen two my error (from my previous comment)?
Or you have also patched a Tcl_FSJoinToPath before tests?

dgp added on 2012-01-26 04:44:45:
With that patch applied and a --enable-symbols=mem build,
I see all the memory leak tests fails as they should since
this mod make memory leak all over the place.

Other than that, the only failing test is:

==== stringObj-10.1 Tcl_GetRange with all byte-size chars FAILED
==== Contents of test case:

    set x "abcdef"
    list [testobj objtype $x] [set y [string range $x 1 end-1]]  [testobj objtype $x] [testobj objtype $y]

---- Result was:
string bcde string string
---- Result should have been (exact matching):
none bcde string string
==== stringObj-10.1 FAILED

which for now I assume is a bug in the [testObj]
testing command.

dgp added on 2012-01-26 04:36:57:
Here's what I've applied to do the sort of test
you are suggesting:

Index: generic/tclObj.c
==================================================================
--- generic/tclObj.c
+++ generic/tclObj.c
@@ -382,10 +382,13 @@
  */

 void
 TclInitObjSubsystem(void)
 {
+    ObjInitDeletionContext(context);
+    ObjDeletionLock(context);
+
     Tcl_MutexLock(&tableMutex);
     typeTableInitialized = 1;
     Tcl_InitHashTable(&typeTable, TCL_STRING_KEYS);
     Tcl_MutexUnlock(&tableMutex);

@@ -483,10 +486,13 @@
  */

 void
 TclFinalizeObjects(void)
 {
+    ObjInitDeletionContext(context);
+    ObjDeletionUnlock(context);
+
     Tcl_MutexLock(&tableMutex);
     if (typeTableInitialized) {
        Tcl_DeleteHashTable(&typeTable);
        typeTableInitialized = 0;
     }

sebres added on 2012-01-26 03:03:41:
it was the same bug with Tcl_FSJoinToPath ... (i have tested with not fixed 8.5.11).
as i tests this in patched tcl, there were no exceptions.

But i'm not sure, cause many test scripts was called from child shells, "remotely".
Now i rewrite my code, to start "delete pending" immediately in "init.tcl".
I rise again later.

sebres added on 2012-01-26 02:37:11:
i test it now (not yet all tests pass):

>>>> test script ++++
load test; ## to load testDP
package require tcltest; ## to load tcltest
namespace import tcltest::*
configure \
  -singleproc 1 \
  -debug 3 \
  -testdir {D:\Projects\tcl8.5\tests}

set _\1_ [testDP {source {tcl8.5/tests/all.tcl}}]; unset _\1_
#set _\1_ [testDP {source {tcl8.5/tests/cmdAH.test}}]; unset _\1_
<<<< test script ----

First results are not good:
- test 1 BOOM by "cd .."
- test 2 BOOM by "list [file type abc2.link] [file tail [file link abc2.link]]"

>>>> test 1 ++++
cmdAH.test
.........
Running cmdAH-2.2 {
    file delete -force $foodir
    file mkdir $foodir
    cd $foodir
    set result [file tail [pwd]]
    cd ..
    file delete $foodir
    set result
}
--- file delete -force $foodir
--- file mkdir $foodir
--- cd $foodir
--- set result [file tail [pwd]]
--- cd ..
Tcl_SetObjLength called with shared object

>>>> test 2 ++++
Running fCmd-28.15.2 {
    file link abc.link abc.dir
    file copy abc.link abc2.link
    list [file type abc2.link] [file tail [file link abc2.link]]
}
--- file link abc.link abc.dir
--- file copy abc.link abc2.link
--- list [file type abc2.link] [file tail [file link abc2.link]]
Tcl_SetObjLength called with shared object
-----------------

The problem is - at the moment I must be active (comment the code) in every boom (code change).
I try to rewrite tcltests multiprocessed, so each code will be called in separate tclsh.
I rise again...

sebres added on 2012-01-26 01:14:22:
> Do you know of something else that's 
> unsafe regarding surprise sharing of values?
Unfortunately not. But as I said earlier, to be sure, all tcl tests should be executed also under "delete pending" on. for example - with my testDP call.
But it's also not realy sufficient (you've seen):
 >  file normalize "/a/b/c/d" - is safe
but 
 >  file normalize "/a/b/../c/d" - is NOT safe

dgp added on 2012-01-26 00:44:19:
Related bug 3479689.

dgp added on 2012-01-26 00:23:45:
Do you know of something else that's
unsafe regarding surprise sharing of values?
Not hypothetical, but actual code in Tcl that
fails to check for unshared values where it
would need to do so in this circumstance?
I did an audit yesterday looking over all calls
to routines demanding unshared values and
these issues in tclPathObj.c were the only
problems I found.

sebres added on 2012-01-26 00:08:56:
Why not? In my case was called Tcl_GetEncoding(NULL, "cp1252"), that has called Tcl_FS*, etc; But I have already said, it could be also something else, which is also not refCount safe. Or is forbidden to call tcl-core aus freeIntRepProc? 
Can i close tcl channel, destroy tcl mutex, etc?

dgp added on 2012-01-25 23:55:57:
What need do you have to be calling Tcl_FS*() routines
from a freeIntRepProc ?  Usually all that goes in there
is reference decrements and ckfree() calls tearing down
data structure allocations.

sebres added on 2012-01-25 22:54:36:
So far, so good, but I had not used VFS. The sharing comb by delete panding alone (cause Tcl_DecrRefCount of any fresh created list in Tcl_FSLink, has not delete this list - was a pending.)

dgp added on 2012-01-25 22:37:28:
A refactoring of the routines Tcl_FSJoinToPath()
and Tcl_FSJoinPath() might also be helpful.

Currently Tcl_FSJoinToPath() is implemented
as a call to Tcl_FSJoinPath() after wrapping
values into a list to get the interface requirements
met.

It might be better if both these routines were wrappers
and they both called a common routine "JoinPath" that
accepted only an objv, objc pair.  That would avoid the
submitted scenario where shimmer to path causes
unexpected sharing, though it wouldn't change the
argument about needing extra caution about shared
values when any VFS driver might create sharing.

dgp added on 2012-01-25 22:31:38:
The branch bug-3475569 contains a patch that
adds more checking of the unshared status of
path values before doing things that require unshared
status.  The demonstration submitted is one way to
come across this, but since much operations on paths
may involve calls to VFS drivers, it's wise not to assume
that any unshared path value will remain unshared after
passing it into any Tcl routine.

dgp added on 2012-01-24 07:46:22:
If demonstration of the bug requires C code, that's fine,
but then we have to keep in mind that encountering a
panic might not be a bug at all but an indication that the
routines aren't being called properly.  All your examples
so far give us a first impression of things that are at least
unwise, if not forbidden.  

It does look like TFSNAP is on potentially shaky ground.  Thank you
for pointing that out.  I'll keep looking to see what might be
needed.  Whether or not there's an actual bug to fix, we are
aware that refcount management is difficult and needs at least
better API documentation on what callers can assume about
what Tcl routine do with Tcl_Obj they receieve as arguments.

Gotta go put the kids to bed.

sebres added on 2012-01-24 06:36:50:
How could i produce in pure tcl a "delete pending", or create my own tcl obj type?
Today all heuristics cells in my brain have died... :)

I try to build pure C code later, but it's something like this:
in function testDP_FreeIntRep :
>>>>
Tcl_Obj * newObj = Tcl_NewStringObj("/a/b/../c/d", -1);
Tcl_IncrRefCount(newObj);
TclFSNormalizeAbsolutePath(..., newObj, ...);
// BOOM ...
<<<<

or 

>>>>
// but first time only (encoding does not yet loaded, 
// tcl_library should be like "/a/b/../c/d" - also with ".."
Tcl_GetEncoding(NULL, "cp1251");
<<<<

dgp added on 2012-01-24 05:58:51:
Thanks all for the followups, but the one thing that
would be best for me is a Tcl script that demos the
panic.  Doesn't anyone have that?

WIthout that, we can go round and round about
"bug in Tcl" vs. "bug in the calling C code".

sebres added on 2012-01-24 05:52:59:
> sebres: could you provide a sane example? First you were editing an obj
> without checking if it is shared, now you are evaling it from the
> freeIntrepProc ...
Don't thing about this - it's safe (for test)...
Of course you can call pure C from "file normalize" - it's the same.

> First you were editing an obj without checking if it is shared...
I can not more - last try:
How you write this code:

>>>>>>>>>>>>>>>>>>>>>
Tcl_Obj * objPtr = Tcl_NewObj();
Tcl_IncrRefCount(objPtr);
Tcl_AppendToObj(objPtr, "test", -1);
Tcl_AppendToObj(objPtr, "test2", -1);
<<<<<<<<<<<<<<<<<<<<<

or so (I have not yet seen it in tcl core) :

>>>>>>>>>>>>>>>>>>>>>
Tcl_Obj * objPtr = Tcl_NewObj();
Tcl_IncrRefCount(objPtr);
if (Tcl_IsShared(objPtr)) {
TclDecrRefCount(objPtr);
retVal = Tcl_DuplicateObj(objPtr);
TclIncrRefCount(objPtr);
}
Tcl_AppendToObj(objPtr, "test", -1);
if (Tcl_IsShared(objPtr)) {
TclDecrRefCount(objPtr);
retVal = Tcl_DuplicateObj(objPtr);
TclIncrRefCount(objPtr);
}
Tcl_AppendToObj(objPtr, "test2", -1);
if (Tcl_IsShared(objPtr)) {
TclDecrRefCount(objPtr);
retVal = Tcl_DuplicateObj(objPtr);
TclIncrRefCount(objPtr);
}
<<<<<<<<<<<<<<<<<<<<<

How can you be sure that function Tcl_AppendToObj is a refCount safe?

msofer added on 2012-01-24 05:43:43:
sebres: could you provide a sane example? First you were editing an obj without checking if it is shared, now you are evaling it from the freeIntrepProc ...

msofer added on 2012-01-24 05:41:45:
test.cpp uploaded to http://paste.tclers.tk/2589

sebres added on 2012-01-24 05:41:14:
> When you see the problem, is the path name that
> triggers it in the native filesystem, or some VFS?
native, and not windows only - debian also. I have already said - platform independent.

> Mmhhh ... that is a weird thing to be doing: the code calls Tcl_EvalObjEx
> on an object from the object's freeIntRepProc?
This is for your test only - with tcl it ' s easier and more convenient! Don't thing about this. In real environment it can occur accurately by pure C only.

> I can't get the .cpp file to download. SF gives me some
> HTML/Javascript mess instead of anything resembling source
> code. 
Is a sourceforge bug (have sometimes in other projects also): try it again and again, until you have a cpp-file.

dkf added on 2012-01-24 05:38:30:
And I thought those download failures were just me yesterday. Hah!

(Still no insights though; I prefer to not think about the pathname object type's bizarre incestuousness.)

nijtmans added on 2012-01-24 05:32:29:
Nope, the other test.cpp with UNIX line-endings has the same problem ;-(

nijtmans added on 2012-01-24 05:30:32:
>I can't get the .cpp file to download. SF gives me some
>HTML/Javascript mess instead of anything resembling source
>code
I am having the same problem. Clicking on the link gives this
problem, but saving it to a directory first then it works. I suspect
that the CR-LF line endings are somehow causing this, so here
is the same test.cpp but then with UNIX line-endings.

nijtmans added on 2012-01-24 05:28:24:

File Added - 433845: test.cpp

msofer added on 2012-01-24 05:28:08:
Mmhhh ... that is a weird thing to be doing: the code calls Tcl_EvalObjEx on an object from the object's freeIntRepProc?

dgp added on 2012-01-24 05:19:03:
When you see the problem, is the path name that
triggers it in the native filesystem, or some VFS?

dgp added on 2012-01-24 05:18:06:
I can't get the .cpp file to download.  SF gives me some
HTML/Javascript mess instead of anything resembling source
code.  DLLs do nothing for me since I don't develop on windows.

Do you have a Tcl script that demos the problem?
Does the problem only occur on the Windows platform?

sebres added on 2012-01-24 05:14:29:
> The patches seem to indicate that there are paths
> through the routine TclFSNormalizeAbsolutePath()
> that will cause a panic.
Yes, bot IMHO not only (it could impact the whole tcl core by delete pending): see my comment from today...
> Do you have a demo script that causes such a panic? 
Yes, see 2 last attached files (test.cpp or test.dll)

dgp added on 2012-01-24 04:58:55:
The patches seem to indicate that there are paths
through the routine TclFSNormalizeAbsolutePath()
that will cause a panic.  Do you have a demo script
that causes such a panic?

dgp added on 2012-01-24 01:35:23:
If the problem is not in "Path Management" then it is
very confusing that the patches submitted only make
changes in the "Path Management" area.  This report
as a whole is confusing because seems to be alleging
errors in two different sections with two different maintainers.

I'll eventualy (hopefully soon) get back to examining the
patches and see what is wrong with the tclPathObj stuff this time.

If there really is a problem with Tcl_Obj deletion, I suggest opening
another Tracker item dealing with that alone and assign to dkf.

sebres added on 2012-01-23 17:49:24:
spoon-feeding the issue to you:

IMHO it's not a problem of "Path Management", but a problem of "Delete Pending" feature (so i move that again in the category "10. Objects"). 
Although I have found it in calls such "file normalize", but it's possible anywhere, if delete pending on.
And IMHO my attached path was a workaround, the solution would be a rewriting of delete pending functionality:
> the DeletePending mechanism must be changed, for example using "Epoch" in Obj struct 
> (the objects created in DeletePending mode could be deleted directly without PushObjToDelete) ...
> Or completelly removed from core.

I have attached the example C code (with example TCL code within comment bottom) and a ready dll,
with which you could reproduce/test any tcl function with delete pending.

Now spoon-feed the problem:
If you write following, it is legal (i know that refCount = 1): 
  Tcl_Obj * objPtr = Tcl_NewObj();
  Tcl_IncrRefCount(objPtr);
  Tcl_AppendToObj(objPtr, "test", -1);
  Tcl_AppendToObj(objPtr, "test2", -1);
But if tomorrow Tcl_AppendToObj will be modified, so it attaches objPtr to any "list", increments refCount of "list" and hereafter decrements refCount of this "list" again.
This means, that this code will not work properly by "Delete Pending", cause the decrementing refCount of "list" does not destroy it, and refCount of objPtr remains 2.

sebres added on 2012-01-23 17:48:02:

File Added - 433788: test.dll

sebres added on 2012-01-23 17:47:27:

File Added - 433787: test.cpp

dkf added on 2012-01-22 23:43:51:
I can't make head or tail of this, other than that it is almost certainly down to the odd messing around inside the filename/path type that is responsible. Since all the patches are in that area but none of the code snippets describe how to trigger the problem (from Tcl or "obviously correct" C using public APIs) then I can't figure out what is actually wrong or where things might change.

Assigning back to dgp, but you (i.e., sebres) might wish to take some time to come up with some code that we could use to replicate the problem, preferably in the form of a test so we can verify when the problem is fixed.

sebres added on 2012-01-19 06:45:12:
I'm trying to describe it with other words. 
In my own tcl obj type X, i have call in the freeIntRepProc of type X somthing like this: 
   file normalize {D:\test\bin\..\modules\tcl\lib\tcl8.5\encoding\cp1252.enc}. 
Indirectly, cause something like following was called:
   encoding convertto $enc "text here"
This was very hard to find and reproduce, cause error occured only when "$enc" was not yet loaded, and by NOT normalized "tcl_library" path.
(In debugger was normalized path).
The problem was a loading of encoding file, and ultimately, the bug is TclFSNormalizeAbsolutePath / Tcl_FSLink / Tcl_FSJoinToPath.
And of course only if OnDeletePending is true.
See my both attached patches.

dgp added on 2012-01-19 02:50:25:
Passing to dkf in case he's free to look sooner than I am.

I'll pick it up again when I'm free if nobody else gets it done by then.

msofer added on 2012-01-19 01:47:53:
Indeed - ANY in-place modification of a Tcl_Obj requires inquiring about the shared status. 

The problem about "knowing" the refCount is that you don't always know, as in the present case where you knew something that as a matter of fact is not true. Note that your // ... here anything else ... // could contain refCount changes too, like storing the thing in a variable or ... anything else.

dgp: reassigning to you as a request for a second independent opinion on the matter.

sebres added on 2012-01-19 01:30:48:

File Added - 433391: tclPathObj.c.patch

sebres added on 2012-01-19 01:29:27:
please read carefully my example code. i know what shared object is.
the exception resp. panic occured standalone in tcl core also, like call of this in freeIntRepProc (by DeletePending) :
   file normalize {D:\test\bin\..\modules\tcl\lib\tcl8.5\init.tcl}. 
in my example code (if you read carefully) ... before call of Tcl_SetObjLength(obj, 0) the refCount should be "1", and is "1" normally,
but by DeletePending it is "2".
Or you want to tell me - you write before each modification call this :
<code>
if (Tcl_IsShared(retVal)) {
 TclDecrRefCount(retVal);
 retVal = Tcl_DuplicateObj(retVal);
 TclIncrRefCount(retVal);
}
</code>
I do it so, but only by object, which refCount i don't know. In my example refCount of "obj" should be definitely "1".
Otherwise the tcl8.5 code should be also completelly revised.

msofer added on 2012-01-18 23:56:16:
Looks like a bug in your own code, PANIC ("Tcl_SetObjLength called with shared object") is the clue: you should never try to modify a shared object. The proper way is to check using Tcl_IsShared(), and obtain if necessary your own copy with Tcl_DuplicateObj().

Now: if this is being done within Tcl and not by your code, it is definitely Tcl's bug. In that case please provide sample code that triggers the panic.

sebres added on 2012-01-18 23:37:38:

File Added - 433386: tclFileName.c.patch

Attachments: