Tcl Source Code

Ticket UUID: 971976
Title: Inconsistent naming in [file exists]
Type: Bug
Version: obsolete: 8.5a2
Submitter: coldstore
Created on: 2004-06-13 05:16:20
Subsystem: 37. File System
Assigned To: vincentdarley
Priority: 5 Medium
Severity:
Status: Open
Last Modified: 2004-10-07 22:11:24
Resolution: Remind
Closed By: vincentdarley
Closed on: 2004-06-30 14:44:43
Description:
Statement of problem:

[file exists ${file}/] succeeds when [file type $file]
eq "file"

This seems inconsistent and unnecessary, and probably
stems from [file split a/b/] eq {a b} and not {a b {}}

Demonstration:

# A directory:
% file type wikit.vfs
directory
% file exists wikit.vfs
1
% file exists wikit.vfs/
1

# A file:
% file type wikit
file
% file exists wikit
1
% file exists wikit/
1
User Comments: vincentdarley added on 2004-10-07 22:11:24:
Logged In: YES 
user_id=32170

This should now be easier to address, given better splitting
of platform code into platform directories.  We need to
ensure that normalization preserves a single trailing '/'
(on Win & Unix), and then we need to work around the win32
apis which will tell us 'foo/' exists even if foo is not a
directory (so that Win & Unix are consistent).

This means changes to the platform normalization code (on
Win & Unix), and win32 workarounds to TclpObjAccess and
friends.
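
A minimal sketch of the rule described above, assuming a POSIX-style stat() is available. The helper name exists_honoring_trailing_slash is hypothetical and is not part of Tcl; it only illustrates the check a win32 workaround would have to emulate, namely that a path ending in '/' exists only when the object it names is a directory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not a Tcl API): returns 1 if path exists under the
 * proposed rule, 0 otherwise.  A path ending in '/' only exists when the
 * object it names is a directory; stat() follows symlinks, so a link to a
 * directory also counts.
 */
int exists_honoring_trailing_slash(const char *path)
{
    struct stat st;
    size_t len;
    char *bare;
    int had_slash, found;

    assert(path != NULL);
    len = strlen(path);

    /* Remember the trailing separator, then strip it; a lone "/" is kept. */
    had_slash = (len > 1 && path[len - 1] == '/');
    while (len > 1 && path[len - 1] == '/') {
        len--;
    }

    bare = malloc(len + 1);
    if (bare == NULL) {
        return 0;
    }
    memcpy(bare, path, len);
    bare[len] = '\0';

    found = (stat(bare, &st) == 0);
    if (found && had_slash && !S_ISDIR(st.st_mode)) {
        found = 0;  /* "file/" must not exist when "file" is not a directory */
    }
    free(bare);
    return found;
}
```

Stripping the separator before calling stat() makes the answer independent of how the platform itself treats "foo/", which is the consistency between Win & Unix asked for here.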

dgp added on 2004-07-17 23:04:06:
Logged In: YES 
user_id=80530

I'm probably missing something.

Tcl makes use of the normalized path
to help it decide which Tcl_Filesystem
the path belongs to, I agree.

By the time NativeCreateNativePath
is called, it's already been determined
that the path belongs to the "native"
Tcl_Filesystem.

At that point, there's no need to use
a normalized path any more, the
Tcl_Filesystem has been chosen.  All
that remains to do is produce the correct
"nativePath" value in the intrep.  I claim
it is incorrect for the procedure that creates
that value to be dropping trailing slashes.
(At least for the Unix "native" filesystem,
whose underlying system calls do want to
distinguish between the path "file" and the
path "file/")

I still think NativeCreateNativePath needs
re-examination, and perhaps more commenting.
(While we're there, it should also be split into
two TclpCreateNativePath routines, one each
in the "win" and "unix" areas, so that we can
get the #ifdef WIN32 out of a generic file.
Then we can more easily manage any other
platform differences in the "native" paths)

vincentdarley added on 2004-07-17 18:49:00:
Logged In: YES 
user_id=32170

dgp - I think you're missing something there.  Of course the
string rep will still have '/'.  The issue is that pretty
much by definition Tcl operates on normalized paths (reqd by
vfs), so it's not that we're converting the string rep to
native, it's that we're converting string rep to normalized
to native.

dgp added on 2004-07-15 23:26:55:
Logged In: YES 
user_id=80530


Look back at the original report,
it appears the root cause is a
flaw in the NativeCreateNativeRep()
routine.

The access() system call that actually
does the work is found in the TclpObjAccess()
routine in, say, tclUnixFile.c.  At that point
the string rep of the pathPtr value still
contains the trailing '/'.  However, the
routine Tcl_FSGetNativePath() is called
to get the string to pass to access().  One
thing that routine is supposed to do is
the system encoding conversion.  Something
else it *is* doing, and this bug report claims
it should not do, is dropping that trailing '/'.

The logic of NativeCreateNativeRep needs
examination.

vincentdarley added on 2004-07-09 15:58:10:
Logged In: YES 
user_id=32170

Yet another difference between Windows and Unix!  Anyway, it
seems we have a coherent proposal (which might require an
additional check on Windows to ensure it's consistent with
Unix, given that the OS is quite happy to say
"fileName.txt/" exists when it shouldn't).

Now we just need an implementation, tests, docs.

dkf added on 2004-07-08 16:10:54:
Logged In: YES 
user_id=79902

This is *different* from the underlying API on Unix where
foo/ does not exist unless foo is a directory or a symlink
to a directory.

FWIW, here's the C program (I'm not using Tcl to test this
because I do not trust Tcl's behaviour here entirely at the
moment) I used to test this:
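  #include <unistd.h>
  #include <stdio.h>
  /* Report whether access(2) considers the path in argv[1] to exist. */
  int main(int argc, char **argv) {
      if (argc < 2) {
          fprintf(stderr, "usage: %s path\n", argv[0]);
          return 2;
      }
      if (access(argv[1], F_OK) == 0) {
          printf("%s exists\n", argv[1]);
          return 0;
      }
      fprintf(stderr, "%s ", argv[1]);
      perror("non-existent");
      return 1;
  }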

vincentdarley added on 2004-07-08 15:50:23:
Logged In: YES 
user_id=32170

Just to clarify, it appears as if the underlying OS API (on
Windows, ActiveTcl 8.3, at least) is quite happy to say that
'file exists tcl83.dll/' and 'file exists tcl83.dll/.' are
true (as I demonstrated below, and Tcl 8.3 has no
normalization), so while I agree this might be surprising to
the user, your arguments about how the underlying OS API
works are not quite correct.  Of course we can always just
say this is a MSFT bug that we need to work around.  But
perhaps there are windows script programmers who do expect
this behaviour?

Anyway, it appears as if the current proposal is that 'file
norm' should preserve a trailing '/' (internally as well as
externally), and that no other changes are needed beyond
tests and docs.

dkf added on 2004-07-07 15:40:52:
Logged In: YES 
user_id=79902

I said that (can't be bothered to quote) because it is
apparently true (i.e. it is the only way to get the
behaviour that general script programmers would expect.)  I
am *categorically* *not* (now :)) proposing any alteration
to the behaviour of [file norm].

They *expect* [file exists foo/] to fail if foo is not a
directory or a link to a directory as the underlying OS API
works exactly like that.  They don't care about
normalization; they don't believe they explicitly asked for it.

What is wrong (in their world-view and I find it very hard
to argue with them on this) is that [file exists] normalizes
its argument first.

vincentdarley added on 2004-07-07 00:25:28:
Logged In: YES 
user_id=32170

The file normalize documentation says "Returns a unique
normalised path representation for the file-system
object (file, directory, link, etc), whose string value can
be used as a unique identifier for it. ", so I think this
says pretty clearly that $file and [file norm $file] must
refer to the same thing, so it looks like we both agree with
Tcl on that.

>It's just that people very often 
>don't want to feed normalized paths
>to the underlying OS because they are really seeking to gain
>information about how the OS interprets that denormalized form.

Hold on! That's a very strong statement.  Why do you say that?

When I use Tcl I want to operate on files, directories or
whatever, and have no interest at all in gaining any
information about how the OS interprets anything.

If you really believe that's what 'people very often want',
then Tcl 8.4 took a huge swerve in the wrong direction for
those people - pretty much by _definition_ Tcl operates on
normalized paths now.

Anyway, assuming I've got the wrong end of the stick on this
last bit, it suggests we've homed in on the proposal that
'file norm' should preserve a single trailing '/', and
therefore that 'file exists fileName/' will return 0, but
that we have the possibility that there are two distinct
names which might refer to the same filesystem entity, if
that entity is a directory:

[file norm dirName] != [file norm dirName/]

and yet both refer to the same thing, so we'll need to
document this interesting quirk.

I don't know whether such a change requires a TIP or not. It
certainly requires a bunch of new tests and docs.

dkf added on 2004-07-06 21:33:37:
Logged In: YES 
user_id=79902

I agree that $file and [file normalize $file] should
generally refer to the same thing (in some suitable sense,
of course, but I think we're in agreement there).  It's just
that people very often don't want to feed normalized paths
to the underlying OS because they are really seeking to gain
information about how the OS interprets that denormalized form.

vincentdarley added on 2004-07-06 21:27:53:
Logged In: YES 
user_id=32170

Just one more datapoint:

(ActiveTcl8.3) 5 % file exists bin
1
(ActiveTcl8.3) 6 % file exists bin/
1
(ActiveTcl8.3) 7 % file exists bin/.
1
(ActiveTcl8.3) 8 % cd bin
(bin) 9 % file exists tcl83.dll
1
(bin) 10 % file exists tcl83.dll/
1
(bin) 11 % file exists tcl83.dll/.
1
(bin) 12 % 

So, this behaviour (including the "odder bug" below) has
been there since Tcl 8.3 (i.e. pre 'file normalize'), at
least on Windows. This probably means it has been there
since even earlier on Windows (in possible contradiction to
the behaviour on Solaris 9/tclsh 7.4p3).

I'm completely in two minds about which of the two anomalies
below is preferable. I agree that:

file exists someFile/. == 1

is just wrong. The easiest way to ensure the above result
changes is to have 'file normalize someFile/.' be
'someFile/', but that then has the anomaly I first raised. 
On balance I tend to prefer this anomaly to the one where
'file norm $path' refers to a different filesystem object to
$path -- that just seems to contradict the documentation of
'file normalize'.

dkf added on 2004-07-06 19:44:32:
Logged In: YES 
user_id=79902

We're going to end up with an anomaly whatever we do, but
that can't be helped.  Change it and it will "do what the
programmer wants" virtually all the time.

FWIW, tclsh 7.4p3 on Solaris 9 has this behaviour:
  % file exist someFile
  1
  % file type someFile
  file
  % file exist someFile/
  1
  % file exist someFile/.
  0

vincentdarley added on 2004-07-06 18:10:22:
Logged In: YES 
user_id=32170

Dkf's latest suggestion does make sense (and resolves the
issue I had raised), but it now has the following peculiarity:

file exists $path 
file exists [file normalize $path]

may not be equal (when $path is 'foo/' and 'foo' is only a
file).

dkf added on 2004-07-06 15:25:56:
Logged In: YES 
user_id=79902

Hmm. Perhaps the right thing to do (for [file exists] and
its friends) is to detect *before* normalization whether the
filename has a separator at its end, and if so, add that
back onto the filename immediately before firing it into the
code that actually determines if the file exists (an
alternative would be to add '/.' of course). The point is
that the normalized filename is indeed free of separators at
the end, but the presence of the separator at the end is
important for the operation being invoked, as modelling the
behaviour of [file exists] after the shell command 'test -e'
would be strongly preferred by many users.

Does that make sense?
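
The suggestion above could be sketched roughly as follows. Both function names are hypothetical and nothing here is taken from the Tcl sources; the sketch assumes '/' is the only separator and that the caller already has the normalized form of the path in hand.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if path ends with a '/' separator, else 0. */
int ends_with_separator(const char *path)
{
    size_t len;

    assert(path != NULL);
    len = strlen(path);
    return len > 0 && path[len - 1] == '/';
}

/*
 * Copies normalized into out and, if original ended with a separator that
 * normalization removed, appends a single '/' again.  Returns 0 on
 * success, or -1 if out (of size outsize) is too small.
 */
int reattach_separator(const char *original, const char *normalized,
                       char *out, size_t outsize)
{
    size_t len;
    int add;

    assert(original != NULL && normalized != NULL && out != NULL);
    len = strlen(normalized);
    add = ends_with_separator(original) && !ends_with_separator(normalized);
    if (len + (size_t)add + 1 > outsize) {
        return -1;
    }
    memcpy(out, normalized, len);
    if (add) {
        out[len++] = '/';
    }
    out[len] = '\0';
    return 0;
}
```

The resulting string is what would be handed to the existence check, so a caller asking about 'foo/' gets the same answer as 'test -e foo/' would give.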

In fact I see an even odder bug:
  % file exist ChangeLog/.
  1
  % info patch
  8.4.6
  % file type ChangeLog
  file
I find it very hard to believe that that response can be
regarded as correct; I suspect that normalization is not
being our friend here... :^(

coldstore added on 2004-07-06 10:55:25:
Logged In: YES 
user_id=19214

I believe the situation is that / is a path separator, not a
path terminator, so if file is a directory, file is its name
as a file, and file/ is a synonym, as are file/.,
file/../file etc.  file is the shortest name.

vincentdarley added on 2004-07-06 02:07:44:
Logged In: YES 
user_id=32170

Thanks for all the clarifications.  The only thing that
really concerns me now is this:

In Tcl 8.4/8.5 at present 'file normalize $path' is a unique
identifier for that filesystem entity.  If 'file normalize
$path1 != file normalize $path2', then the two filesystem
entities are different.

With the behaviour explained here, this will no longer be true:

file normalize foo//// == foo/
file normalize foo      == foo

but these actually refer to the same filesystem entity,
assuming there is a directory called 'foo'.

At the very least this goes against the spirit of 'file
normalize', and this is what makes me uneasy.

NB: There's a contradiction between the two answers given
for (iii). I'm assuming the trailing '/' is the correct answer?

dkf added on 2004-07-05 20:32:28:
Logged In: YES 
user_id=79902

(i) No '/' added if none was present at end of input path
(ii) Cannot test; no VFS installed on dev machine
(iii) Reduce to 'wikit.vfs/'
(iv) This is a good thing IMHO
(v) Atoms of a directory name may not contain '/' so
question moot, but both 'file isdir foo' and 'file isdir
foo/' return identical answers (which depend on whether the
directory exists or not.)
(vi) Returns 'foo'

coldstore added on 2004-07-05 20:26:46:
Logged In: YES 
user_id=19214

A hopefully clarifying statement: '/' is a path element
separator, not a path element terminator.

It follows, then, that a path x/ is a path which consists of
x followed by nothing.  If x is a directory, then x/ exists.
 If x is a file, then x/ can't exist, nor can x/. or any
other similar path.
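
To make the separator-versus-terminator reading concrete, here is a small sketch of a split that treats '/' strictly as a separator, so that 'x/' really is x followed by an empty element. The function split_on_separator and the MAX_ELEM_LEN limit are made up for this illustration; this is not how [file split] is implemented.

```c
#include <assert.h>
#include <string.h>

#define MAX_ELEM_LEN 64

/*
 * Splits a relative path on '/', treating it strictly as a separator, so a
 * trailing '/' yields a final empty element.  Stores at most max elements
 * in elems and returns the number of elements the path contains.
 * Elements longer than MAX_ELEM_LEN - 1 characters are truncated.
 */
int split_on_separator(const char *path, char elems[][MAX_ELEM_LEN], int max)
{
    int count = 0;
    const char *start;

    assert(path != NULL && elems != NULL);
    start = path;
    for (;;) {
        const char *end = strchr(start, '/');
        size_t len = (end != NULL) ? (size_t)(end - start) : strlen(start);

        if (count < max) {
            if (len >= MAX_ELEM_LEN) {
                len = MAX_ELEM_LEN - 1;
            }
            memcpy(elems[count], start, len);
            elems[count][len] = '\0';
        }
        count++;
        if (end == NULL) {
            break;      /* no further separator: this was the last element */
        }
        start = end + 1;
    }
    return count;
}
```

Under this reading 'a/b/' has three elements, the last one empty, which corresponds to the {a b {}} result mentioned in the original report.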

(i) What does 'file normalize wikit.vfs' do when the file
doesn't exist but the directory does?  Is a '/' added?
Remember that the definition of a normalized path is the
_unique_ representation.

I don't think it would add a trailing '/', the unique
representation is presumably .../wikit.vfs.  wikit.vfs/ is
an alias for the file's name iff it is a directory.

(ii) What does it do when (with a vfs) in some sense both exist?

That's an interesting question.  I suggest that the
difference between a vfs directory which is also a file is
purely that it can be [open]ed and read, written, etc.  Does
the 'file/dir' fred/ exist?  Yes, if there is a directory,
because fred/ fred// fred/../fred fred/. are all synonyms, I
think.

(iii) What does 'file normalize wiki.vfs//////' do with your
patch?

Seems to reduce it to wiki.vfs, which seems right.

(iv) The patch needs to add lots of tests for the new behaviour:

glob -dir foo/ *
glob -path foo/ *
file isdir foo
file isdir foo/
etc.

(v) What does 'file isdir foo' return, when the directory
'foo/' exists?

The directory 'foo/' or the directory 'foo\/' ?

(vi) What does 'file dirname foo/bar' return?  Just 'foo' or
'foo/'?

It should return 'foo', as it does now, IMO.

vincentdarley added on 2004-07-05 20:06:24:
Logged In: YES 
user_id=32170

Here are some questions that need answering:

(i) What does 'file normalize wikit.vfs' do when the file
doesn't exist but the directory does?  Is a '/' added? 
Remember that the definition of a normalized path is the
_unique_ representation.

(ii) What does it do when (with a vfs) in some sense both exist?

(iii) What does 'file normalize wiki.vfs//////' do with your
patch?

(iv) The patch needs to add lots of tests for the new behaviour:

glob -dir foo/ *
glob -path foo/ *
file isdir foo
file isdir foo/
etc.

(v) What does 'file isdir foo' return, when the directory
'foo/' exists?

(vi) What does 'file dirname foo/bar' return?  Just 'foo' or
'foo/'?

...that's it for now.  (Note: (i) above is the most
important, but I think that it has further implications).

dkf added on 2004-07-05 01:22:58:

File Added - 92845: tfn.diff

Logged In: YES 
user_id=79902

Thanks Colin for spotting that!

The attached patch (against the HEAD and really just a minor
variation on what Colin suggests) causes two new failures in
filename.test (and no others), which appear to be (in part)
specifying the old behaviour. But that means we'll end up
detecting the change. :^)

We'll also want to add some tests that check the behaviour
detailed in the bug report, i.e. that [file exists
$someRealFile/] fails.
Reopening for maintainer assessment.

coldstore added on 2004-07-04 11:31:30:
Logged In: YES 
user_id=19214

I *know* it shouldn't be this simple, but:

generic/tclFileName.c line 768 reads:
    if (p[1] != '\0') {
        if (needsSep) {
            *dest++ = '/';
        }
    }

Should read:
    if (needsSep) {
        *dest++ = '/';
    }

fixes the problem under unix and seems to pass cmdAH.test

Could it be as simple as the fact that we explicitly decided
to remove trailing slashes?  What other regression/other
tests need to pass?
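
For readers without the source at hand, the effect of that guard can be illustrated with a standalone sketch. This is not the tclFileName.c code: collapse_separators and its keep_trailing flag are invented for the example, with keep_trailing = 0 playing the role of the existing p[1] != '\0' check and a nonzero value playing the role of the proposed change.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative stand-in, NOT the actual tclFileName.c code.  Copies src
 * into dest, collapsing each run of '/' into one separator.  With
 * keep_trailing == 0 a separator run at the very end of the path is
 * dropped; otherwise a single trailing '/' is kept.  A leading '/' is
 * always kept so that "/" stays "/".  dest needs room for
 * strlen(src) + 1 bytes.
 */
void collapse_separators(const char *src, char *dest, int keep_trailing)
{
    size_t i, n, out = 0;

    assert(src != NULL && dest != NULL);
    n = strlen(src);
    for (i = 0; i < n; i++) {
        if (src[i] != '/') {
            dest[out++] = src[i];
            continue;
        }
        /* Advance to the last '/' of this run. */
        while (src[i + 1] == '/') {
            i++;
        }
        /* i + 1 == n means this run of separators ends the path. */
        if (keep_trailing || i + 1 < n || out == 0) {
            dest[out++] = '/';
        }
    }
    dest[out] = '\0';
}
```

With keep_trailing set, "a//b///" becomes "a/b/" instead of "a/b", so the information that the caller wrote a trailing separator is still available to whatever runs next.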

vincentdarley added on 2004-07-03 23:39:20:
Logged In: YES 
user_id=32170

Fair enough -- it might well be a good idea to change this.
Having said that, from my knowledge of the filesystem code,
it's not a five-minute task.  Assuming the low-level
OS-routines behave nicely wrt a trailing slash (and treat
such a thing as a directory as the bug reporter would like),
then it shouldn't be that difficult. If the OS-routines
don't behave in that way, then this is a big task.

I'm more than happy to test any patches to accomplish this
functionality.

dkf added on 2004-07-01 22:26:11:
Logged In: YES 
user_id=79902

Hmm.  8.0 has the same behaviour as 8.4, but this differs
from external existence testing schemes (e.g. 'test -e file'
in Unix shells).  Even if we've always done things this way,
it's probably not a good idea to keep on doing them this way
because it is minimally "surprising"... :^/

dgp added on 2004-07-01 21:08:53:
Logged In: YES 
user_id=80530

How far back in history does
this behavior go?  What did
Tcl 7 do, for example?

Might be nice to call
this a bug and fix it.

vincentdarley added on 2004-06-30 21:44:42:
Logged In: YES 
user_id=32170

Clarified this behaviour in the filename.n man page.  Any
changes to this behaviour would require a TIP, given the
potential compatibility issues.

vincentdarley added on 2004-06-28 16:14:40:
Logged In: YES 
user_id=32170

I assume this has the same behaviour in Tcl 8.3?

The only thing we can really do here (ensuring backwards
compatibility) is to clarify these details in the 'filename'
man page.  Tcl will ignore any number of trailing '/'s.

coldstore added on 2004-06-13 12:30:25:
Logged In: YES 
user_id=19214

A further note: this probably wouldn't bother anyone in
normal usage but in virtual filesystem work there can be a
world of difference between a file "a", a directory "a" and
a file "a/".

Attachments: