Ticket UUID: | 971976 | |||
Title: | Inconsistent naming in [file exists] | |||
Type: | Bug | Version: | obsolete: 8.5a2 | |
Submitter: | coldstore | Created on: | 2004-06-13 05:16:20 | |
Subsystem: | 37. File System | Assigned To: | vincentdarley | |
Priority: | 5 Medium | Severity: | ||
Status: | Open | Last Modified: | 2004-10-07 22:11:24 | |
Resolution: | Remind | Closed By: | vincentdarley | |
Closed on: | 2004-06-30 14:44:43 | |||
Description: |
Statement of problem:

[file exists ${file}/] succeeds when [file type $file] eq "file"

This seems inconsistent and unnecessary, and probably stems from [file split a/b/] eq {a b} and not {a b {}}

Demonstration:

# A directory:
% file type wikit.vfs
directory
% file exists wikit.vfs
1
% file exists wikit.vfs/
1

# A file:
% file type wikit
file
% file exists wikit
1
% file exists wikit/
1
| |||
User Comments: |
vincentdarley added on 2004-10-07 22:11:24:
Logged In: YES user_id=32170

This should now be easier to address, given better splitting of platform code into platform directories. We need to ensure that normalization preserves a single trailing '/' (on Win & Unix), and then we need to work around the win32 APIs, which will tell us 'foo/' exists even if foo is not a directory (so that Win & Unix are consistent). This means changes to the platform normalization code (on Win & Unix), and win32 workarounds to TclpObjAccess and friends.

dgp added on 2004-07-17 23:04:06:
Logged In: YES user_id=80530

I'm probably missing something.

Tcl makes use of the normalized path to help it decide which Tcl_Filesystem the path belongs to, I agree.

By the time NativeCreateNativePath is called, it's already been determined that the path belongs to the "native" Tcl_Filesystem. At that point, there's no need to use a normalized path any more, the Tcl_Filesystem has been chosen. All that remains to do is produce the correct "nativePath" value in the intrep. I claim it is incorrect for the procedure that creates that value to be dropping trailing slashes. (At least for the Unix "native" filesystem, whose underlying system calls do want to distinguish between the path "file" and the path "file/".)

I still think NativeCreateNativePath needs re-examination, and perhaps more commenting. (While we're there, it should also be split into two TclpCreateNativePath routines, one each in the "win" and "unix" areas, so that we can get the #ifdef WIN32 out of a generic file. Then we can more easily manage any other platform differences in the "native" paths.)

vincentdarley added on 2004-07-17 18:49:00:
Logged In: YES user_id=32170

dgp - I think you're missing something there. Of course the string rep will still have '/'. The issue is that pretty much by definition Tcl operates on normalized paths (reqd by vfs), so it's not that we're converting the string rep to native, it's that we're converting string rep to normalized to native.
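The workaround described in the 2004-10-07 comment above (report 'foo/' as existing only when foo is a directory, whatever the platform API says) can be sketched roughly as follows. This is a minimal illustration using POSIX stat(), not Tcl's TclpObjAccess code; the helper name path_exists_strict is hypothetical. On Unix the explicit directory check is normally redundant, because stat() already fails for "file/"; it is written out to show the rule a win32 workaround would have to enforce.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of Tcl.
 * Returns 1 if `path` exists, 0 otherwise. A path that ends in '/' is
 * reported as existing only if it names a directory. stat() follows
 * symbolic links, so a link to a directory also counts.
 */
int path_exists_strict(const char *path)
{
    struct stat sb;
    size_t len;

    assert(path != NULL);
    len = strlen(path);
    if (len == 0) {
        return 0;                       /* the empty path never exists */
    }
    if (stat(path, &sb) != 0) {
        return 0;                       /* the OS says it does not exist */
    }
    if (path[len - 1] == '/' && !S_ISDIR(sb.st_mode)) {
        return 0;                       /* trailing '/' requires a directory */
    }
    return 1;
}
```

With this rule, a call with "wikit/" is rejected when wikit is a plain file, while "wikit.vfs/" is accepted when wikit.vfs is a directory, which is the behaviour the original report asks for.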
dgp added on 2004-07-15 23:26:55:
Logged In: YES user_id=80530

Look back at the original report, it appears the root cause is a flaw in the NativeCreateNativeRep() routine.

The access() system call that actually does the work is found in the TclpObjAccess() routine in, say, tclUnixFile.c. At that point the string rep of the pathPtr value still contains the trailing '/'. However, the routine Tcl_FSGetNativePath() is called to get the string to pass to access(). One thing that routine is supposed to do is the system encoding conversion. Something else it *is* doing, and this bug report claims it should not do, is dropping that trailing '/'.

The logic of NativeCreateNativeRep needs examination.

vincentdarley added on 2004-07-09 15:58:10:
Logged In: YES user_id=32170

Yet another difference between Windows and Unix!

Anyway, it seems we have a coherent proposal, (which might require an additional check on Windows to ensure it's consistent with Unix, given that the OS is quite happy to say "fileName.txt/" exists when it shouldn't).

Now we just need an implementation, tests, docs.

dkf added on 2004-07-08 16:10:54:
Logged In: YES user_id=79902

This is *different* from the underlying API on Unix where foo/ does not exist unless foo is a directory or a symlink to a directory.

FWIW, here's the C program (I'm not using Tcl to test this because I do not trust Tcl's behaviour here entirely at the moment) I used to test this:

    #include <unistd.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        /* F_OK only asks whether the path exists */
        if (access(argv[1], F_OK) == 0) {
            printf("%s exists\n", argv[1]);
        } else {
            fprintf(stderr, "%s ", argv[1]);
            perror("non-existent");
        }
        return 0;
    }

vincentdarley added on 2004-07-08 15:50:23:
Logged In: YES user_id=32170

Just to clarify, it appears as if the underlying OS API (on Windows, ActiveTcl 8.3, at least) is quite happy to say that 'file exists tcl83.dll/' and 'file exists tcl83.dll/.'
are true (as I demonstrated below, and Tcl 8.3 has no normalization), so while I agree this might be surprising to the user, your arguments about how the underlying OS API works are not quite correct. Of course we can always just say this is a MSFT bug that we need to work around. But perhaps there are windows script programmers who do expect this behaviour?

Anyway, it appears as if the current proposal is that 'file norm' should preserve a trailing '/' (internally as well as externally), and that no other changes are needed beyond tests and docs.

dkf added on 2004-07-07 15:40:52:
Logged In: YES user_id=79902

I said that (can't be bothered to quote) because it is apparently true (i.e. it is the only way to get the behaviour that general script programmers would expect.) I am *categorically* *not* (now :)) proposing any alteration to the behaviour of [file norm].

They *expect* [file exists foo/] to fail if foo is not a directory or a link to a directory as the underlying OS API works exactly like that. They don't care about normalization; they don't believe they explicitly asked for it. What is wrong (in their world-view and I find it very hard to argue with them on this) is that [file exists] normalizes its argument first.

vincentdarley added on 2004-07-07 00:25:28:
Logged In: YES user_id=32170

The file normalize documentation says "Returns a unique normalised path representation for the file-system object (file, directory, link, etc), whose string value can be used as a unique identifier for it. ", so I think this says pretty clearly that $file and [file norm $file] must refer to the same thing, so it looks like we both agree with Tcl on that.

>It's just that people very often
>don't want to feed normalized paths
>to the underlying OS because they are really seeking to gain
>information about how the OS interprets that denormalized form.

Hold on! That's a very strong statement. Why do you say that?
When I use Tcl I want to operate on files, directories or whatever, and have no interest at all in gaining any information about how the OS interprets anything. If you really believe that's what 'people very often want', then Tcl 8.4 took a huge swerve in the wrong direction for those people - pretty much by _definition_ Tcl operates on normalized paths now.

Anyway, assuming I've got the wrong end of the stick on this last bit, it suggests we've homed in on the proposal that 'file norm' should preserve a single trailing '/', and therefore that 'file exists fileName/' will return 0, but that we have the possibility that there are two distinct names which might refer to the same filesystem entity, if that entity is a directory:

    [file norm dirName] != [file norm dirName/]

and yet both refer to the same thing, so we'll need to document this interesting quirk.

I don't know whether such a change requires a TIP or not. It certainly requires a bunch of new tests and docs.

dkf added on 2004-07-06 21:33:37:
Logged In: YES user_id=79902

I agree that $file and [file normalize $file] should generally refer to the same thing (in some suitable sense, of course, but I think we're in agreement there). It's just that people very often don't want to feed normalized paths to the underlying OS because they are really seeking to gain information about how the OS interprets that denormalized form.

vincentdarley added on 2004-07-06 21:27:53:
Logged In: YES user_id=32170

Just one more datapoint:

    (ActiveTcl8.3) 5 % file exists bin
    1
    (ActiveTcl8.3) 6 % file exists bin/
    1
    (ActiveTcl8.3) 7 % file exists bin/.
    1
    (ActiveTcl8.3) 8 % cd bin
    (bin) 9 % file exists tcl83.dll
    1
    (bin) 10 % file exists tcl83.dll/
    1
    (bin) 11 % file exists tcl83.dll/.
    1
    (bin) 12 %

So, this behaviour (including the "odder bug" below) has been there since Tcl 8.3 (i.e. pre 'file normalize'), at least on Windows.
This probably means it has been there since even earlier on Windows (in possible contradiction to the behaviour on Solaris 9/tclsh 7.4p3).

I'm completely in two minds about which of the two anomalies below is preferable. I agree that:

    file exists someFile/. == 1

is just wrong. The easiest way to ensure the above result changes is to have 'file normalize someFile/.' be 'someFile/', but that then has the anomaly I first raised. On balance I tend to prefer this anomaly to the one where 'file norm $path' refers to a different filesystem object to $path -- that just seems to contradict the documentation of 'file normalize'.

dkf added on 2004-07-06 19:44:32:
Logged In: YES user_id=79902

We're going to end up with an anomaly whatever we do, but that can't be helped. Change it and it will "do what the programmer wants" virtually all the time.

FWIW, tclsh 7.4p3 on Solaris 9 has this behaviour:

    % file exist someFile
    1
    % file type someFile
    file
    % file exist someFile/
    1
    % file exist someFile/.
    0

vincentdarley added on 2004-07-06 18:10:22:
Logged In: YES user_id=32170

Dkf's latest suggestion does make sense (and resolves the issue I had raised), but it now has the following peculiarity:

    file exists $path
    file exists [file normalize $path]

may not be equal (when $path is 'foo/' and 'foo' is only a file).

dkf added on 2004-07-06 15:25:56:
Logged In: YES user_id=79902

Hmm. Perhaps the right thing to do (for [file exists] and its friends) is to detect *before* normalization whether the filename has a separator at its end, and if so, add that back onto the filename immediately before firing it into the code that actually determines if the file exists (an alternative would be to add '/.' of course).
The point is that the normalized filename is indeed free of separators at the end, but the presence of the separator at the end is important for the operation being invoked, as modelling the behaviour of [file exists] after the shell command 'test -e' would be strongly preferred by many users.

Does that make sense?

In fact I see an even odder bug:

    % file exist ChangeLog/.
    1
    % info patch
    8.4.6
    % file type ChangeLog
    file

I find it very hard to believe that that response can be regarded as correct; I suspect that normalization is not being our friend here... :^(

coldstore added on 2004-07-06 10:55:25:
Logged In: YES user_id=19214

I believe the situation is that / is a path separator, not a path terminator, so if file is a directory, file is its name as a file, and file/ is a synonym, as are file/., file/../file etc. file is the shortest name.

vincentdarley added on 2004-07-06 02:07:44:
Logged In: YES user_id=32170

Thanks for all the clarifications. The only thing that really concerns me now is this:

In Tcl 8.4/8.5 at present 'file normalize $path' is a unique identifier for that filesystem entity. If 'file normalize $path1 != file normalize $path2', then the two filesystem entities are different.

With the behaviour explained here, this will no longer be true:

    file normalize foo//// == foo/
    file normalize foo == foo

but these actually refer to the same filesystem entity, assuming there is a directory called 'foo'.

At the very least this goes against the spirit of 'file normalize', and this is what makes me uneasy.

NB: There's a contradiction between the two answers given for (iii). I'm assuming the trailing '/' is the correct answer?
dkf added on 2004-07-05 20:32:28:
Logged In: YES user_id=79902

(i) No '/' added if none was present at end of input path
(ii) Cannot test; no VFS installed on dev machine
(iii) Reduce to 'wikit.vfs/'
(iv) This is a good thing IMHO
(v) Atoms of a directory name may not contain '/' so question moot, but both 'file isdir foo' and 'file isdir foo/' return identical answers (which depend on whether the directory exists or not.)
(vi) Returns 'foo'

coldstore added on 2004-07-05 20:26:46:
Logged In: YES user_id=19214

A hopefully clarifying statement: '/' is a path element separator, not a path element terminator.

It follows, then, that a path x/ is a path which consists of x followed by nothing. If x is a directory, then x/ exists. If x is a file, then x/ can't exist, nor can x/. or any other similar path.

(i) What does 'file normalize wikit.vfs' do when the file doesn't exist but the directory does? Is a '/' added? Remember that the definition of a normalized path is the _unique_ representation.

I don't think it would add a trailing '/', the unique representation is presumably .../wikit.vfs. wikit.vfs/ is an alias for the file's name iff it is a directory.

(ii) What does it do when (with a vfs) in some sense both exist?

That's an interesting question. I suggest that the difference between a vfs directory which is also a file is purely that it can be [open]ed and read, written, etc. Does the 'file/dir' fred/ exist? Yes, if there is a directory, because fred/ fred// fred/../fred fred/. are all synonyms, I think.

(iii) What does 'file normalize wiki.vfs//////' do with your patch?

Seems to reduce it to wiki.vfs, which seems right.

(iv) The patch needs to add lots of tests for the new behaviour:

    glob -dir foo/ *
    glob -path foo/ *
    file isdir foo
    file isdir foo/
    etc.

(v) What does 'file isdir foo' return, when the directory 'foo/' exists?

The directory 'foo/' or the directory 'foo\/' ?

(vi) What does 'file dirname foo/bar' return? Just 'foo' or 'foo/'?
It should return 'foo', as it does now, IMO.

vincentdarley added on 2004-07-05 20:06:24:
Logged In: YES user_id=32170

Here are some questions that need answering:

(i) What does 'file normalize wikit.vfs' do when the file doesn't exist but the directory does? Is a '/' added? Remember that the definition of a normalized path is the _unique_ representation.

(ii) What does it do when (with a vfs) in some sense both exist?

(iii) What does 'file normalize wiki.vfs//////' do with your patch?

(iv) The patch needs to add lots of tests for the new behaviour:

    glob -dir foo/ *
    glob -path foo/ *
    file isdir foo
    file isdir foo/
    etc.

(v) What does 'file isdir foo' return, when the directory 'foo/' exists?

(vi) What does 'file dirname foo/bar' return? Just 'foo' or 'foo/'?

...that's it for now. (Note: (i) above is the most important, but I think that it has further implications).

dkf added on 2004-07-05 01:22:58:
File Added - 92845: tfn.diff
Logged In: YES user_id=79902

Thanks Colin for spotting that!

The attached patch (against the HEAD and really just a minor variation on what Colin suggests) causes two new failures in filename.test (and no others), which appear to be (in part) specifying the old behaviour. But that means we'll end up detecting the change. :^)

We'll also want to add some tests that check the behaviour detailed in the bug report, i.e. that [file exists $someRealFile/] fails.

Reopening for maintainer assessment.

coldstore added on 2004-07-04 11:31:30:
Logged In: YES user_id=19214

I *know* it shouldn't be this simple, but:

generic/tclFileName.c line 768 reads:

    if (p[1] != '\0') {
        if (needsSep) {
            *dest++ = '/';
        }
    }

Should read:

    if (needsSep) {
        *dest++ = '/';
    }

fixes the problem under unix and seems to pass cmdAH.test

Could it be as simple as the fact that we explicitly decided to remove trailing slashes?

What other regression/other tests need to pass?
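The effect of the change coldstore proposes (stop suppressing the separator when it is the last character of the path) can be illustrated outside Tcl with a small separator-collapsing routine. This is a sketch under stated assumptions, not the code in generic/tclFileName.c: the function name collapse_separators and the keep_trailing flag are hypothetical, and only '/' is treated as a separator.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch, not the code in generic/tclFileName.c.
 * Copies src to dest, collapsing every run of '/' into a single '/'.
 * A trailing run becomes one '/' when keep_trailing is non-zero and is
 * dropped otherwise. A leading '/' (the root) is always kept.
 * dest must have room for strlen(src) + 1 bytes.
 */
void collapse_separators(const char *src, char *dest, int keep_trailing)
{
    const char *p = src;
    char *d = dest;

    assert(src != NULL && dest != NULL);
    while (*p != '\0') {
        if (*p == '/') {
            while (p[1] == '/') {
                p++;                    /* move to the last '/' of the run */
            }
            /* Emit the separator unless it ends the path and the caller
             * asked for trailing separators to be dropped. */
            if (p[1] != '\0' || d == dest || keep_trailing) {
                *d++ = '/';
            }
        } else {
            *d++ = *p;
        }
        p++;
    }
    *d = '\0';
}
```

With keep_trailing set to 0 the routine mirrors the old behaviour discussed in this ticket ('foo////' becomes 'foo'); with keep_trailing set to 1 it gives the 'foo/' result that the later comments converge on.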
vincentdarley added on 2004-07-03 23:39:20:
Logged In: YES user_id=32170

Fair enough -- it might well be a good idea to change this. Having said that, from my knowledge of the filesystem code, it's not a five-minute task. Assuming the low-level OS-routines behave nicely wrt a trailing slash (and treat such a thing as a directory as the bug reporter would like), then it shouldn't be that difficult. If the OS-routines don't behave in that way, then this is a big task.

I'm more than happy to test any patches to accomplish this functionality.

dkf added on 2004-07-01 22:26:11:
Logged In: YES user_id=79902

Hmm. 8.0 has the same behaviour as 8.4, but this differs from external existence testing schemes (e.g. 'test -e file' in Unix shells). Even if we've always done things this way, it's probably not a good idea to keep on doing them this way because it is minimally "surprising"... :^/

dgp added on 2004-07-01 21:08:53:
Logged In: YES user_id=80530

How far back in history does this behavior go? What did Tcl 7 do, for example?

Might be nice to call this a bug and fix it.

vincentdarley added on 2004-06-30 21:44:42:
Logged In: YES user_id=32170

Clarified this behaviour in the filename.n man page. Any changes to this behaviour would require a TIP, given the potential compatibility issues.

vincentdarley added on 2004-06-28 16:14:40:
Logged In: YES user_id=32170

I assume this has the same behaviour in Tcl 8.3? The only thing we can really do here (ensuring backwards compatibility) is to clarify these details in the 'filename' man page. Tcl will ignore any number of trailing '/'s.

coldstore added on 2004-06-13 12:30:25:
Logged In: YES user_id=19214

A further note: this probably wouldn't bother anyone in normal usage but in virtual filesystem work there can be a world of difference between a file "a", a directory "a" and a file "a/". |
Attachments:
- tfn.diff added by dkf on 2004-07-05 01:22:58.