Tcl Source Code

Ticket UUID: 3e25ac512e9eeb8788028264c400238fcf425238
Title: [file] inconsistent handling of VFS/..
Type: Bug Version: >= 8.5
Submitter: andy Created on: 2018-06-14 16:57:08
Subsystem: 37. File System Assigned To: nobody
Priority: 5 Medium Severity: Severe
Status: Pending Last Modified: 2018-06-29 09:47:10
Resolution: Fixed Closed By: nobody
    Closed on:
Description:
The [file] commands that talk to the OS (e.g. [file mtime]) start off being able to handle paths which enter and leave a VFS mount (e.g. /path/mounted-tclkit/..), but this capability seems to break once a relative path is used.

$ ./tclkit
% file mtime [file join [info nameofexecutable] ..]
1528994647
% file mtime .
1528994647
% file mtime [file join [info nameofexecutable] ..]
could not read "/home/andy/tclkit/..": not a directory
User Comments: sebres added on 2018-06-29 09:47:10:
As for the validity of the cached values, I think that could be resolved
fairly simply by introducing an epoch for the current directory
(and comparing it with the epoch stored in the pathObj).
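
A minimal sketch of the epoch idea, under stated assumptions: the types `CwdState` and `CachedPath` and the three functions are hypothetical names invented for illustration and do not correspond to Tcl's actual structures. The cached native form is trusted only while the epoch recorded with it still matches the current-directory epoch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not Tcl's actual structures. */
typedef struct {
    unsigned long epoch;      /* incremented whenever the cwd changes */
} CwdState;

typedef struct {
    const char *nativeRep;    /* cached native form, NULL if none */
    unsigned long cwdEpoch;   /* CwdState.epoch at the time of caching */
} CachedPath;

/* Call whenever the current directory changes. */
void cwd_changed(CwdState *cwd)
{
    assert(cwd != NULL);
    cwd->epoch++;
}

/* Remember a native form together with the epoch it was computed under. */
void cache_store(CachedPath *p, const CwdState *cwd, const char *rep)
{
    assert(p != NULL && cwd != NULL);
    p->nativeRep = rep;
    p->cwdEpoch = cwd->epoch;
}

/* Return the cached form only if the cwd has not changed since it was
 * stored; otherwise drop it and return NULL so the caller recomputes. */
const char *cache_lookup(CachedPath *p, const CwdState *cwd)
{
    assert(p != NULL && cwd != NULL);
    if (p->nativeRep != NULL && p->cwdEpoch == cwd->epoch) {
        return p->nativeRep;
    }
    p->nativeRep = NULL;
    return NULL;
}
```

With this shape, a stale entry costs one integer comparison to detect, and no cached path has to be visited when the directory changes.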

I could fix all of this, but I'm still not sure about the precedence of normalization
(I don't think the current directory should determine when normalization happens at all).

dgp added on 2018-06-28 19:23:16:
Fourth, it sure looks like the validity of cached nativePathPtr values
is not properly checked, so that cached values get used when no longer
valid.

dgp added on 2018-06-28 19:21:20:
There seem to be several problems lurking here, and untangling them
all is going to be a knotty problem.

The first looks like something already found and addressed by sebres:
the routine TclFSCwdIsNative() isn't written to be self-initializing,
so it has bizarre shifts in behavior when run both before and after something
like a [pwd] or anything else that gets Tcl's "current working directory"
machinery up and properly running.
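
A minimal sketch of what "self-initializing" could mean here, assuming a lazily filled state record. `CwdInfo`, `NativeProbe`, `example_probe` and `cwd_is_native` are hypothetical names for illustration only; this is not the code of TclFSCwdIsNative(). The point is that the query fills in its own state on first use, so the first call and every later call give the same answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that decides whether the cwd is native. */
typedef int (*NativeProbe)(void);

/* Hypothetical per-thread state; zero-initialized means "not set up". */
typedef struct {
    int initialized;
    int cwdIsNative;
} CwdInfo;

/* Stand-in probe for the example: counts calls and reports "native". */
int probe_calls = 0;
int example_probe(void)
{
    probe_calls++;
    return 1;
}

/* Self-initializing query: sets up its state on demand instead of
 * relying on some earlier call (such as a [pwd]) having done so. */
int cwd_is_native(CwdInfo *info, NativeProbe probe)
{
    assert(info != NULL && probe != NULL);
    if (!info->initialized) {
        info->cwdIsNative = probe();
        info->initialized = 1;
    }
    return info->cwdIsNative;
}
```

Calling `cwd_is_native` twice on a zeroed `CwdInfo` returns the same value both times and runs the probe only once.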

Second, the unix implementation of TclNativeCreateNativeRep() branches
on the return value of TclFSCwdIsNative(). It looks like this is an attempt
to detect and optimize for the case when a relative path value can be
converted directly into native form. The trouble is the logic is not limited
to relative paths, so you have a situation where absolute paths sometimes
take one branch and sometimes another, with the possibility of inconsistent
results.

Third, when paths are normalized before getting converted to native form,
and they include the "/.." substring, the answer is just wrong as reported
separately in ticket 961646.
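
To illustrate how a normalized path can disagree with the OS, here is a sketch that assumes normalization collapses a trailing "/.." purely lexically. That assumption is for illustration only and is not a description of Tcl's normalizer; `lexical_dotdot` and `lexical_and_os_disagree` are hypothetical helpers, and paths with doubled slashes are not handled.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: collapse one trailing "/.." lexically,
 * e.g. "/a/b/.." -> "/a". Returns 0 on success, -1 if not applicable. */
int lexical_dotdot(const char *path, char *out, size_t outlen)
{
    size_t len;

    assert(path != NULL && out != NULL);
    len = strlen(path);
    if (len < 3 || strcmp(path + len - 3, "/..") != 0) {
        return -1;
    }
    len -= 3;                           /* drop the trailing "/.." */
    while (len > 0 && path[len - 1] != '/') {
        len--;                          /* skip the last segment */
    }
    if (len == 0) {
        return -1;                      /* no parent segment to return */
    }
    len--;                              /* drop the separating '/' */
    if (len == 0) {
        len = 1;                        /* the parent is the root "/" */
    }
    if (len + 1 > outlen) {
        return -1;
    }
    memcpy(out, path, len);
    out[len] = '\0';
    return 0;
}

/* Returns 1 when stat() on the raw path and stat() on the lexically
 * collapsed path give different success/failure results. */
int lexical_and_os_disagree(const char *path)
{
    char buf[4096];
    struct stat st;
    int raw_ok, lex_ok;

    if (lexical_dotdot(path, buf, sizeof buf) != 0) {
        return 0;
    }
    raw_ok = (stat(path, &st) == 0);
    lex_ok = (stat(buf, &st) == 0);
    return raw_ok != lex_ok;
}
```

For a regular file such as the `tclsh` executable in the transcripts, the raw "file/.." path is rejected by the OS while its lexically collapsed parent directory exists, which is the kind of divergence reported in this ticket.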

There's probably more.

dgp added on 2018-06-27 17:12:07:
Divergence is in Tcl_FSGetNativePath(pathPtr) returning different
answers from pathPtr arguments of the same value.

dgp added on 2018-06-27 16:46:35:
% set p [info nameofexecutable]/..
/local/tmp/dgp/fossil/tcl8.5/unix/tclsh/..
% file mtime $p
1529941665
% pwd
/local/tmp/dgp/fossil/tcl8.5/unix
% file mtime $p
1529941665
% file mtime [string range $p 0 end]
could not read "/local/tmp/dgp/fossil/tcl8.5/unix/tclsh/..": not a directory

dgp added on 2018-06-27 16:00:11:
see also: https://core.tcl-lang.org/tcl/tktview?name=961646

sebres added on 2018-06-15 16:13:34:

I'll try to explain a bit about the precedence question:

The fix currently resolves only the discrepancy between a path's first use and its use after a pwd call.

But the issue remains. For example, what happens if we change the current directory to a virtual (non-native) file system?
In that case normalization takes precedence over the file system calls, even for other paths (or segments) that are not affected by the current directory at all.

This is still wrong (at least inconsistent) and should be resolved.


sebres added on 2018-06-15 15:45:57:

Fixed in [188107cdc4846883]. It is a minimal fix for this bug that avoids different values before and after the current directory is accessed (it is now initialized on demand).

But the question of the priority of path normalization remains open, in my opinion.


aspect added on 2018-06-15 15:14:34:
As for what should happen, my opinion is that:

% file exists /bin/sh/..
0

is correct.  This is in line with shell:

$ [ -e /bin/sh/.. ] && echo true || echo false
false

.. or C: access("/bin/sh/..", F_OK) == -1 .. or other scripting languages.
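
A small sketch of the C check mentioned above. `os_path_exists` is a hypothetical wrapper name; `access()` and `F_OK` are standard POSIX. When a path component such as `/bin/sh` is not a directory, the call is expected to fail with `ENOTDIR`.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: returns 1 if the OS can resolve `path`,
 * 0 otherwise. On failure *err receives errno; for "file/.." that
 * is expected to be ENOTDIR. */
int os_path_exists(const char *path, int *err)
{
    assert(path != NULL && err != NULL);
    if (access(path, F_OK) == 0) {
        *err = 0;
        return 1;
    }
    *err = errno;
    return 0;
}
```

Called with `"/bin/sh/.."`, this returns 0 on a system where `/bin/sh` is a regular file (or a link to one), matching the shell test above.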


The tclkit case is special:  if tclkit mounts a vfs at [info nameofexe] then:

% file isdirectory [info nameofexe]
1

hence:

% file exists [info nameofexe]/..
1

aspect added on 2018-06-15 15:08:26:
Chasing the difference between interactive and script mode:

$ echo 'puts [file exists /bin/sh/..]' > dotdot.tcl
$ tclsh dotdot.tcl 
0
$ tclsh < dotdot.tcl
1

the difference comes from TclFSCwdIsNative() returning 0 in the interactive case,
causing an extra round of normalization in TclNativeCreateNativeRep.

In the interactive case, tsdPtr->cwdClientData is null.

In the script case it is initialized by the call chain:

Tcl_MainEx -> Tcl_FSEvalFileEx -> Tcl_FSGetNormalizedPath -> Tcl_FSGetCwd -> FsUpdateCwd.

sebres added on 2018-06-15 14:13:28:

The same applies to the other file commands that use TclpObjStat / TclOSstat (at least from the second call on)...


% file exists [string trim " /etc/passwd/.. "]
1
% file exists [pwd]
1
% file exists [string trim " /etc/passwd/.. "]
0

Strangely, if I extend TclpObjStat with path normalization (which should normally always do the same thing), it works?!


--- "a/./unix/tclUnixFile.c~0"
+++ "b/./unix/tclUnixFile.c"
@@ -842,7 +842,20 @@ TclpObjStat(
     if (path == NULL) {
 	return -1;
     } else {
-	return TclOSstat(path, bufPtr);
+	int rc = TclOSstat(path, bufPtr);
+	if (rc == -1
+    #ifdef ENOTDIR
+	   && errno == ENOTDIR
+    #endif
+	) {
+	    Tcl_Obj *normPtr = Tcl_FSGetNormalizedPath(NULL, pathPtr);
+	    if (normPtr == NULL) {
+		return -1;
+	    }
+	    path = Tcl_GetString(normPtr);
+	    rc = TclOSstat(path, bufPtr);
+	}
+	return rc;
     }
 }

So I assume this affects not only file access (or rather, the selection of the file system API) but also the precedence of path normalization:

  • either it should always resolve the path segments first;
  • or it should first resolve the path segments that the corresponding file system can actually access (and only afterwards the rest);
  • or it remains undefined behavior, as it is now.

Currently we have a mix of these, and a very confusing one.

So first I would like to know the expected precedence between path segments and file access. Or, put as a question: should [file dirname "/some/path"] point to the same path as "/some/path/..", and if not, what deviations are there?


sebres added on 2018-06-15 12:45:14:
Affects *nix only, since 8.5.

aspect added on 2018-06-15 08:47:05:
Not just vfs, and not just relative paths:

$ tclsh
% info patch
8.7a2
% file mtime [file join [info nameofexecutable] ..]
1524799287
% file mtime [pwd]
1524799287
% file mtime [file join [info nameofexecutable] ..]