Tcl Source Code

Ticket UUID: 1034337
Title: recursive file/dir delete broken on OSX
Type: Bug Version: obsolete: 8.4.11
Submitter: nobody Created on: 2004-09-24 21:18:35
Subsystem: 37. File System Assigned To: hobbs
Priority: 8 Severity:
Status: Closed Last Modified: 2005-12-05 15:32:37
Resolution: Fixed Closed By: hobbs
    Closed on: 2005-10-07 22:25:07
Description:
Submitted by: Sue LoVerso, [email protected]

Recursive, forced delete of a directory fails on OS X.
TraverseUnixTree should be fixed in tclUnixFCmd.c (or a
macosx-specific version written). Apple has this to say:

http://docs.info.apple.com/article.html?artnum=107884

There may be other places using the same readdir construct;
I didn't look for them. TraverseUnixTree is the one
affecting me. This simple script illustrates the problem:

# Repro: create $nfiles files in ./TESTDIR, then delete it recursively.
# Expects a file ./x.tcl (e.g. this script itself) in the current directory.
proc x { } {
        set nfiles 250
        puts "Creating $nfiles files"
        file delete -force ./TESTDIR
        file mkdir ./TESTDIR
        for { set i 0 } { $i < $nfiles } { incr i } {
                file copy ./x.tcl ./TESTDIR/x.tcl.$i
        }

        puts "Deleting $nfiles files in TESTDIR"
        set stat [catch {file delete -force ./TESTDIR} ret]
        if { $stat != 0 } {
                puts "Error: stat $stat, ret $ret"
                set filelist [exec ls ./TESTDIR]
                puts "FILES: $filelist"
        } else {
                puts "Delete successful"
        }
}

x

Multiple calls to 'file delete -force ./TESTDIR' will
eventually work.
User Comments: hobbs added on 2005-12-05 15:32:37:

File Added - 158678: tcl-fts-HEAD.diff

hobbs added on 2005-12-05 15:32:36:
Logged In: YES 
user_id=72656

Attaching a variant patch by Daniel Steffen, added for 8.5a4
and 8.4.12+, that uses the fts APIs.

hobbs added on 2005-10-08 05:27:48:

File Added - 151716: 1034337-hobbs.patch

hobbs added on 2005-10-08 05:27:42:
Logged In: YES 
user_id=72656

Attaching patch I used (to 8.4) for posterity.

hobbs added on 2005-10-08 05:25:07:
Logged In: YES 
user_id=72656

I have reimplemented this solution by reverting to the
original 8.4.7 code and adding a threshold limit before
a rewind, as inspired by this message:

http://lists.gnu.org/archive/html/bug-coreutils/2005-05/msg00113.html

I have chosen a threshold of 150, as I found OS X Panther to
fail at 172 (unlike the 254 the post above claims).  OS X
Tiger did not have the issue at all (at least to 400 files).

We may still be affected if you have a directory with > 150
(threshold) special NFS files, in which case we'd need to
keep a hash table of what's been deleted.  In any case, I
believe the new solution is incrementally better, avoiding
the NFS issue while still rewinding in support of some funky
readdir implementations.

Applied to 8.4.12 and 8.5a4.
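A minimal C sketch of the threshold idea described above. It is not the attached patch. It assumes a directory holding only non-directory entries, deletes them in a single readdir() pass, and calls rewinddir() only after every 150 consecutive deletions, with no final rewind. The names `delete_entries_threshold` and `REWIND_THRESHOLD` are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical name; 150 is the threshold chosen in this ticket
 * (OS X Panther was observed to fail at 172 deletions). */
#define REWIND_THRESHOLD 150

/* One readdir() pass over dirpath that unlinks each entry (assumed to
 * be a non-directory), rewinding after every REWIND_THRESHOLD deletions
 * so readdir() never continues past too many removals. The loop ends
 * when readdir() returns NULL; there is no final rewind, so a few
 * recreated .nfsXXXX files make a later rmdir() fail instead of looping.
 * Returns 0 on success, -1 on error. */
static int delete_entries_threshold(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int since_rewind = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dirpath, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof path || unlink(path) != 0) {
            closedir(dir);
            return -1;
        }
        if (++since_rewind >= REWIND_THRESHOLD) {
            rewinddir(dir);         /* restart before readdir() goes astray */
            since_rewind = 0;
        }
    }
    closedir(dir);
    return 0;
}
```

A directory with 150 or more entries that are recreated on deletion could still trigger repeated rewinds, which is the residual NFS case mentioned in the comment above.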

hobbs added on 2005-10-04 01:23:32:
Logged In: YES 
user_id=72656

This is from a discussion with DAS on the problem (my text
is quoted with '>', DAS's replies are unquoted):

> So I'm curious whether this should be considered "correct" across
> platforms or not. It certainly seems undesirable that a 'file delete
> -force' can effectively hang due to NFS file recreation.

I basically just implemented in TraverseUnixTree() the change described
by the Apple note, i.e. the need to do a rewind after any readdir loop
involving deletion of files in the directory being iterated. As the
note explains, it is unspecified whether readdir() iterates over all
files in that case, and for HFS file systems this is indeed not always
the case (cf. test fCmd-20.2). Note that this issue may occur on systems
other than Darwin if they have an HFS filesystem mounted via NFS, as
mentioned in the note.

> In the NFS world, if a file is open and gets deleted by some other
> process, the file will be renamed to .nfsXXX (and will keep the
> contents of the deleted file). Any directory containing these hidden
> files cannot be deleted.

Clearly this behaviour will cause an infinite loop with the rewind
code as implemented, since such a non-empty directory can never be
emptied, even though all the calls to TclpDeleteFile() during the
readdir loop return no errors... (of course, IMHO returning no error
when deleting such a special .nfsXXX file should probably not be
considered correct behaviour...)

The simplest solution would be to special-case those .nfsXXX names at
the top of TraverseUnixTree(), where we already check for . and ..
names. Of course, if similar issues exist with other filesystems that
use other magic names, this is not a good general solution...
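The special-casing idea can be sketched as a C predicate consulted for each readdir() entry. `skip_entry` is a hypothetical name. Matching on the `.nfs` prefix is an assumption based on the names seen in this ticket (.nfsB93A, .nfs95701).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: nonzero if a readdir() entry should be skipped
 * during recursive deletion, i.e. "." and ".." (already skipped by
 * TraverseUnixTree()) plus NFS "silly rename" files named .nfsXXXX. */
static int skip_entry(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 1;
    return strncmp(name, ".nfs", 4) == 0;
}
```

This check is purely name-based. A user file whose name happens to start with `.nfs` would also be skipped, and magic names used by other filesystems would not be.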

Otherwise, it would be possible to reimplement recursive deletion by
first enumerating all directory entries and only then deleting them
all. This would avoid the unspecified readdir() behaviour, but could
present memory issues for large directory trees.
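A minimal C sketch of the enumerate-then-delete alternative, again assuming a directory of plain files. `delete_entries_two_phase` is a hypothetical name. The saved name list grows with the directory, which is the memory cost mentioned above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical two-phase variant: copy every entry name of dirpath into
 * memory, close the directory stream, then unlink the saved names. No
 * deletion happens while readdir() is iterating, so its unspecified
 * behaviour cannot cause skipped entries. Returns 0 on success, -1 on
 * error. */
static int delete_entries_two_phase(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    char **names = NULL;
    size_t count = 0, cap = 0;
    int rc = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {          /* phase 1: enumerate */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (count == cap) {
            size_t newcap = cap ? cap * 2 : 64;
            char **tmp = realloc(names, newcap * sizeof *tmp);
            if (tmp == NULL) { rc = -1; break; }
            names = tmp;
            cap = newcap;
        }
        if ((names[count] = strdup(ent->d_name)) == NULL) { rc = -1; break; }
        count++;
    }
    closedir(dir);

    for (size_t i = 0; i < count; i++) {            /* phase 2: delete */
        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dirpath, names[i]);
        if (rc == 0 && (n < 0 || (size_t)n >= sizeof path || unlink(path) != 0))
            rc = -1;                                /* stop deleting, keep freeing */
        free(names[i]);
    }
    free(names);
    return rc;
}
```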

Darwin and other BSDs also have the more efficient fts_ family of
functions for directory traversal (cf. man 3 fts on OS X), which do not
have this issue AFAICT, so TraverseUnixTree() could be reimplemented
using them on such platforms, but that will not solve the issue
generally either...
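The following is a minimal sketch of recursive deletion with fts(3). It follows the spirit of the fts-based patch attached to this ticket but is not taken from it. `remove_tree_fts` is a hypothetical name. The flags (FTS_PHYSICAL, FTS_NOCHDIR) and entry types (FTS_D, FTS_DP, FTS_DNR, FTS_ERR, FTS_NS) come from the fts(3) manual page, available on Darwin/BSD and glibc.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fts.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: delete the tree rooted at root using fts(3).
 * FTS_PHYSICAL: do not follow symlinks; FTS_NOCHDIR: keep the cwd, so
 * fts_path can be passed straight to unlink()/rmdir(). Returns 0 on
 * success, -1 on error (errno set). */
static int remove_tree_fts(const char *root)
{
    assert(root != NULL);
    char *paths[] = { (char *)root, NULL };
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return -1;

    int rc = 0;
    for (;;) {
        errno = 0;
        FTSENT *ent = fts_read(fts);
        if (ent == NULL) {          /* end of walk, or error if errno set */
            if (errno != 0)
                rc = -1;
            break;
        }
        switch (ent->fts_info) {
        case FTS_D:                 /* pre-order visit: children come next */
            break;
        case FTS_DP:                /* post-order visit: children are gone */
            if (rmdir(ent->fts_path) != 0)
                rc = -1;
            break;
        case FTS_DNR:               /* unreadable dir, error, or no stat */
        case FTS_ERR:
        case FTS_NS:
            errno = ent->fts_errno;
            rc = -1;
            break;
        default:                    /* regular files, symlinks, others */
            if (unlink(ent->fts_path) != 0)
                rc = -1;
            break;
        }
        if (rc != 0)
            break;
    }
    int saved_errno = errno;
    fts_close(fts);
    errno = saved_errno;
    return rc;
}
```

fts returns each directory twice: once pre-order (FTS_D) and once post-order (FTS_DP). Removal is therefore deferred to the post-order visit, by which point the directory's children have already been unlinked. Implementations of fts generally read a directory's entries in full before handing them out, which is presumably why it sidesteps the readdir() problem.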

nobody added on 2005-09-30 01:04:52:
Logged In: NO 

Issue:
Side effect of this bug fix:
The "file delete -force" command hangs in Tcl 8.4.9 when removing a
file/dir if the directory in question contains a temporary NFS file
.nfsXXXX.

unix> lr
total 2
drwxr-xr-x  18 arvin    eng7        1024 Sep 23 11:50 ..
-rw-r--r--   1 arvin    eng7           0 Sep 23 11:50 .nfsB93A
drwxr-xr-x   2 arvin    eng7          96 Sep 23 11:51 .


An NFS temp file is created and the contents have been copied to
this file.

unix> tclsh
% info patchlevel
8.4.6
% file delete -force tail-f_issue
error deleting "tail-f_issue": file already exists
% exit
unix> tclsh
% info patchlevel
8.4.9
% file delete -force tail-f_issue
<hangs>.......

This file cannot be deleted. If deleted, it gets re-created
by NFS with a different number.

Trussing the Tcl process revealed that Tcl spins its cycles
in a futile attempt to delete this file.

truss -fae -p 5589
<snip>
5589:   lstat64("tail-f_issue/.nfs95701", 0xFFBEDD30)   = 0
5589:   unlink("tail-f_issue/.nfs95701")                = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 0
5589:   llseek(3, 0, SEEK_CUR)                          = 96
5589:   llseek(3, 0, SEEK_SET)                          = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 80
5589:   lstat64("tail-f_issue/.nfsA5701", 0xFFBEDD30)   = 0
5589:   unlink("tail-f_issue/.nfsA5701")                = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 0
5589:   llseek(3, 0, SEEK_CUR)                          = 96
5589:   llseek(3, 0, SEEK_SET)                          = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 80
5589:   lstat64("tail-f_issue/.nfsB5701", 0xFFBEDD30)   = 0
5589:   unlink("tail-f_issue/.nfsB5701")                = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 0
5589:   llseek(3, 0, SEEK_CUR)                          = 96
5589:   llseek(3, 0, SEEK_SET)                          = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 80
5589:   lstat64("tail-f_issue/.nfsC5701", 0xFFBEDD30)   = 0
5589:   unlink("tail-f_issue/.nfsC5701")                = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 0
5589:   llseek(3, 0, SEEK_CUR)                          = 96
5589:   llseek(3, 0, SEEK_SET)                          = 0
5589:   getdents64(3, 0x0004F038, 1048)                 = 80
5589:   lstat64("tail-f_issue/.nfsD5701", 0xFFBEDD30)   = 0

<snip>


das added on 2004-11-11 08:27:53:

File Added - 108377: 1034337-core-8-4-branch.diff

das added on 2004-11-11 08:26:54:

File Added - 108376: 1034337-HEAD.diff

Logged In: YES 
user_id=90580

Fix committed to HEAD and core-8-4-branch and attached. Also added a
testcase for this bug.

vincentdarley added on 2004-09-27 19:27:27:
Logged In: YES 
user_id=32170

Daniel is probably best placed to suggest the appropriate
fix (or take that from the apple.com webpage cited).

Attachments: