Ticket UUID: | 1034337 | |||
Title: | recursive file/dir delete broken on OSX | |||
Type: | Bug | Version: | obsolete: 8.4.11 | |
Submitter: | nobody | Created on: | 2004-09-24 21:18:35 | |
Subsystem: | 37. File System | Assigned To: | hobbs | |
Priority: | 8 | Severity: | ||
Status: | Closed | Last Modified: | 2005-12-05 15:32:37 | |
Resolution: | Fixed | Closed By: | hobbs | |
Closed on: | 2005-10-07 22:25:07 | |||
Description: |
Submitted by: Sue LoVerso, [email protected]

Recursive, forced delete of a directory fails on OSX. TraverseUnixTree should be fixed in tclUnixFCmd.c (or a macosx-specific version written). Apple has this to say:

http://docs.info.apple.com/article.html?artnum=107884

There may be other places using that same construct for readdir, I didn't look for them. TraverseUnixTree is the one affecting me. This simple script illustrates the problem:

    proc x { } {
        set nfiles 250
        puts "Creating $nfiles files"
        file delete -force ./TESTDIR
        file mkdir ./TESTDIR
        for { set i 0 } { $i < $nfiles } { incr i } {
            file copy ./x.tcl ./TESTDIR/x.tcl.$i
        }
        puts "Deleting $nfiles in TESTDIR"
        set stat [catch {file delete -force ./TESTDIR} ret]
        if { $stat != 0 } {
            puts "Error: stat $stat, ret $ret"
            set filelist [exec ls ./TESTDIR]
            puts "FILES: $filelist"
        } else {
            puts "Delete successful"
        }
    }

Multiple calls to 'file delete -force ./TESTDIR' will eventually work. | |||
User Comments: |
hobbs added on 2005-12-05 15:32:37:
File Added - 158678: tcl-fts-HEAD.diff

hobbs added on 2005-12-05 15:32:36:
Logged In: YES user_id=72656

Attaching a variant patch by Daniel Steffen that was added for 8.5a4 and 8.4.12+ that uses the FTS APIs.

hobbs added on 2005-10-08 05:27:48:
File Added - 151716: 1034337-hobbs.patch

hobbs added on 2005-10-08 05:27:42:
Logged In: YES user_id=72656

Attaching patch I used (to 8.4) for posterity.

hobbs added on 2005-10-08 05:25:07:
Logged In: YES user_id=72656

I have reimplemented this solution by reverting to the original 8.4.7 code and adding a threshold limit before a rewind, as inspired by this message:

http://lists.gnu.org/archive/html/bug-coreutils/2005-05/msg00113.html

I have chosen a threshold of 150, as I found OS X Panther to fail at 172 (unlike the 254 the post above claims). OS X Tiger did not have the issue at all (at least to 400 files). We may still be affected if you have a directory with > 150 (threshold) special NFS files, in which case we'd need to keep a hash table of what's been deleted. In any case, I believe the new solution is incrementally better, avoiding the NFS issue while still rewinding in support of some funky readdir implementations. Applied to 8.4.12 and 8.5a4.

hobbs added on 2005-10-04 01:23:32:
Logged In: YES user_id=72656

This is from a discussion with DAS on the problem (me > quoted, DAS bare):

> So I'm curious whether this should be considered "correct" across
> platforms or not. It certainly seems undesirable that a file delete
> -force can effectively hang due to NFS file recreation.

I basically just implemented in TraverseUnixTree() the change described by the apple note, i.e. the need to do a rewind after any readdir loop involving deletion of files in the directory being iterated. As the note explains, it is unspecified if readdir() iterates over all files in that case, and for HFS file systems this is indeed not always the case (c.f. test fCmd-20.2).
Note that this issue may occur on systems other than Darwin if they have an HFS filesystem mounted via NFS as mentioned in the note.

> In the NFS world, if a file is open and gets deleted by some other
> process, the file will be renamed as .nfsXXX (and will have the same
> contents as the deleted file). Any dir which has these types of hidden
> files cannot be deleted.

Clearly this behaviour will cause an infinite loop problem with the rewind code as implemented, as any non-empty dir cannot ever be emptied even with all the calls to TclpDeleteFile() during the readdir loop returning no errors... (of course, IMHO returning no error when deleting such a special .nfsXXX file should probably not be considered correct behaviour...)

The simplest solution would be to special case those .nfsXXX names at the top of TraverseUnixTree(), where we already check for . and .. names. Of course if similar issues exist with other filesystems that use other magic names, this is not a good general solution...

Otherwise, it would be possible to reimplement recursive deletion by first enumerating all directory entries and only then deleting them all. This would avoid the unspecified readdir() behaviour, but could present memory issues for large directory trees.

Darwin and other BSDs also have the more efficient fts_ family of functions for directory traversal (c.f. man 3 fts on OSX) which do not have this issue AFAICT, so TraverseUnixTree() could be reimplemented using these on such platforms, but that will not solve the issue generally either...

nobody added on 2005-09-30 01:04:52:
Logged In: NO

Issue: Side effect of this bug fix: Removing a file/dir using the "file delete -force" command hangs in Tcl 8.4.9 if the directory in question contains a temporary NFS file .nfsXXXX.

    unix> lr
    total 2
    drwxr-xr-x 18 arvin eng7 1024 Sep 23 11:50 ..
    -rw-r--r-- 1 arvin eng7 0 Sep 23 11:50 .nfsB93A
    drwxr-xr-x 2 arvin eng7 96 Sep 23 11:51 .
An NFS temp file is created and the contents have been copied to this file.

    unix> tclsh
    % info patchlevel
    8.4.6
    % file delete -force tail-f_issue
    error deleting "tail-f_issue": file already exists
    % exit
    unix> tclsh
    % info patchlevel
    8.4.9
    % file delete -force tail-f_issue
    <hangs>.......

This file cannot be deleted. If deleted, it gets re-created by NFS with a different number. Trussing the tcl process revealed that Tcl spins its cycles in a futile attempt to delete this file.

    truss -fae -p 5589
    <snip>
    5589: lstat64("tail-f_issue/.nfs95701", 0xFFBEDD30) = 0
    5589: unlink("tail-f_issue/.nfs95701") = 0
    5589: getdents64(3, 0x0004F038, 1048) = 0
    5589: llseek(3, 0, SEEK_CUR) = 96
    5589: llseek(3, 0, SEEK_SET) = 0
    5589: getdents64(3, 0x0004F038, 1048) = 80
    5589: lstat64("tail-f_issue/.nfsA5701", 0xFFBEDD30) = 0
    5589: unlink("tail-f_issue/.nfsA5701") = 0
    5589: getdents64(3, 0x0004F038, 1048) = 0
    5589: llseek(3, 0, SEEK_CUR) = 96
    5589: llseek(3, 0, SEEK_SET) = 0
    5589: getdents64(3, 0x0004F038, 1048) = 80
    5589: lstat64("tail-f_issue/.nfsB5701", 0xFFBEDD30) = 0
    5589: unlink("tail-f_issue/.nfsB5701") = 0
    5589: getdents64(3, 0x0004F038, 1048) = 0
    5589: llseek(3, 0, SEEK_CUR) = 96
    5589: llseek(3, 0, SEEK_SET) = 0
    5589: getdents64(3, 0x0004F038, 1048) = 80
    5589: lstat64("tail-f_issue/.nfsC5701", 0xFFBEDD30) = 0
    5589: unlink("tail-f_issue/.nfsC5701") = 0
    5589: getdents64(3, 0x0004F038, 1048) = 0
    5589: llseek(3, 0, SEEK_CUR) = 96
    5589: llseek(3, 0, SEEK_SET) = 0
    5589: getdents64(3, 0x0004F038, 1048) = 80
    5589: lstat64("tail-f_issue/.nfsD5701", 0xFFBEDD30) = 0
    <snip>

das added on 2004-11-11 08:27:53:
File Added - 108377: 1034337-core-8-4-branch.diff

das added on 2004-11-11 08:26:54:
File Added - 108376: 1034337-HEAD.diff
Logged In: YES user_id=90580

Fix committed to HEAD and core-8-4-branch and attached. Also added a testcase for this bug.
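
For readers following the thread, the fix discussed in the comments above (a rewind after deleting entries during a readdir loop, later bounded by a threshold of 150) can be illustrated roughly as follows. This is a minimal sketch in C and not the code from any attached patch: the function name `delete_entries_with_rewind`, its `threshold` parameter, and the fixed 4096-byte path buffer are assumptions made for illustration, and it handles only non-directory entries directly inside one directory.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative sketch only (hypothetical helper, not the Tcl patch).
 * Unlinks every non-directory entry directly inside `path`. After each
 * `threshold` successful unlinks the directory stream is rewound, so a
 * readdir() that skips entries after deletions gets a fresh listing.
 * Returns the number of entries removed, or -1 on the first error.
 */
int delete_entries_with_rewind(const char *path, int threshold)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    char child[4096];
    int since_rewind = 0;
    int removed = 0;

    if (dir == NULL) {
        return -1;
    }
    while ((ent = readdir(dir)) != NULL) {
        int n;

        /* Skip the self and parent entries, as the ticket mentions. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }
        n = snprintf(child, sizeof(child), "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(child) || unlink(child) != 0) {
            closedir(dir);
            return -1;
        }
        removed++;
        /* Rewind only once enough deletions have accumulated. */
        if (++since_rewind >= threshold) {
            rewinddir(dir);
            since_rewind = 0;
        }
    }
    closedir(dir);
    return removed;
}
```

Because the stream is never rewound at end-of-directory, a directory holding a single self-recreating .nfsXXX file cannot trap this loop the way the truss output above shows; as hobbs notes, more such files than the threshold could still be a problem.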
vincentdarley added on 2004-09-27 19:27:27:
Logged In: YES user_id=32170

Daniel is probably best placed to suggest the appropriate fix (or take that from the apple.com webpage cited). |
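
One alternative raised in the discussion above is to enumerate all directory entries first and only then delete them, which sidesteps the unspecified behaviour of readdir() during deletion. A rough C sketch of that idea follows; the function name `delete_entries_after_listing` and the fixed-size path buffer are hypothetical, it is not taken from any patch on this ticket, and it covers a single directory level only.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative sketch only (hypothetical helper).
 * Phase 1 copies every entry name out of the directory stream.
 * Phase 2 unlinks the collected names after the stream is closed,
 * so no deletion happens while readdir() is iterating.
 * Returns the number of entries removed, or -1 on error.
 */
int delete_entries_after_listing(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    char **names = NULL;
    size_t count = 0, cap = 0, i;
    char child[4096];
    int failed = 0;

    if (dir == NULL) {
        return -1;
    }
    /* Phase 1: snapshot the names. */
    while (!failed && (ent = readdir(dir)) != NULL) {
        size_t len;
        char *copy;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }
        if (count == cap) {
            size_t new_cap = cap ? cap * 2 : 64;
            char **grown = realloc(names, new_cap * sizeof(*names));
            if (grown == NULL) {
                failed = 1;
                break;
            }
            names = grown;
            cap = new_cap;
        }
        len = strlen(ent->d_name) + 1;
        copy = malloc(len);
        if (copy == NULL) {
            failed = 1;
            break;
        }
        memcpy(copy, ent->d_name, len);
        names[count++] = copy;
    }
    closedir(dir);

    /* Phase 2: delete from the snapshot, freeing names as we go. */
    for (i = 0; i < count; i++) {
        if (!failed) {
            int n = snprintf(child, sizeof(child), "%s/%s", path, names[i]);
            if (n < 0 || (size_t)n >= sizeof(child) || unlink(child) != 0) {
                failed = 1;
            }
        }
        free(names[i]);
    }
    free(names);
    return failed ? -1 : (int)count;
}
```

The cost is the one already pointed out in the thread: every name is held in memory at once, which may matter for very large directory trees.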
Attachments:
- tcl-fts-HEAD.diff added by hobbs on 2005-12-05 15:32:37.
- 1034337-hobbs.patch added by hobbs on 2005-10-08 05:27:42.
- 1034337-core-8-4-branch.diff added by das on 2004-11-11 08:27:53.
- 1034337-HEAD.diff added by das on 2004-11-11 08:26:54.
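
The comments describe tcl-fts-HEAD.diff as a variant that uses the FTS APIs (see man 3 fts). The sketch below is not that patch; it only illustrates, under stated assumptions, what a recursive delete built on `fts_open`, `fts_read` and `fts_close` might look like. The function name `remove_tree_fts` is hypothetical, and the code assumes a platform that ships `<fts.h>` (the BSDs, macOS, glibc).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Illustrative sketch only (hypothetical helper, not the attached patch).
 * Walks the tree rooted at `root` and removes it bottom-up: files and
 * symlinks are unlinked when visited, directories are removed on their
 * post-order visit (FTS_DP), after their contents are gone.
 * Returns 0 on success, -1 if any entry could not be removed.
 */
int remove_tree_fts(const char *root)
{
    char *paths[2];
    FTS *fts;
    FTSENT *ent;
    int result = 0;

    paths[0] = (char *)root;
    paths[1] = NULL;

    /* FTS_PHYSICAL: do not follow symlinks; FTS_NOCHDIR: keep the cwd. */
    fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL) {
        return -1;
    }
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_D:
            /* Pre-order directory visit: wait for the FTS_DP visit. */
            break;
        case FTS_DP:
            if (rmdir(ent->fts_accpath) != 0) {
                result = -1;
            }
            break;
        case FTS_DNR:
        case FTS_ERR:
        case FTS_NS:
            /* Unreadable directory, traversal error, or failed stat. */
            result = -1;
            break;
        default:
            if (unlink(ent->fts_accpath) != 0) {
                result = -1;
            }
            break;
        }
    }
    fts_close(fts);
    return result;
}
```

A call such as `remove_tree_fts("./TESTDIR")` would correspond loosely to `file delete -force ./TESTDIR` in the reproduction script, without the `-force` permission handling.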