Tcl Source Code

Ticket UUID: baf43873720f5e92ae1142b1fc8d343b3648b54d
Title: file join trashes valid network paths
Type: Bug
Version: >= 8.5 (*nix only)
Submitter: bll
Created on: 2018-07-27 01:18:54
Subsystem: 16. Commands A-H
Assigned To: dgp
Priority: 5 Medium
Severity: Severe
Status: Open
Last Modified: 2018-07-27 20:55:35
Resolution: None
Closed By: nobody
Closed on: 2018-07-27 15:18:10
Description:
Reference: https://groups.google.com/d/msg/comp.lang.tcl/SqXhSGqGEWc/UdwXhExCBwAJ

Reference (from the wiki): AMG: In Tcl 8.6.8, [file join //a/b] returns //a/b, but in Tcl 8.7a1, [file join //a/b] returns /a/b. This got me in trouble because I was trying to work with Windows UNC paths. In the end I just had to concatenate strings and forgo [file join]. [file nativename] worked right, at least.

[file join] on a valid Windows network path changes the leading double slash to a single slash.

[file join] should not trash a valid path.
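AMG's workaround was to concatenate strings instead of calling [file join]. A minimal sketch of that approach follows. It assumes every result should be a UNC-style path that starts with `//`. `joinUncPath` is a hypothetical helper, not part of Tcl.

```tcl
# Hypothetical helper (not part of Tcl): join segments into a UNC-style
# path by plain string concatenation, so the leading // always survives.
proc joinUncPath {args} {
    set parts {}
    foreach seg $args {
        # Accept both separator styles and drop empty pieces, so that
        # \\server, //server and share/ all contribute clean names.
        foreach piece [split [string map {\\ /} $seg] /] {
            if {$piece ne ""} {
                lappend parts $piece
            }
        }
    }
    return //[join $parts /]
}

puts [joinUncPath //myprinter myqueue myfile]
puts [joinUncPath {\\192.168.1.171} queue doc.ps]
```

Unlike [file join], this sketch never lets a later absolute segment restart the path. Every argument is simply appended.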
User Comments: bll added on 2018-07-27 20:55:35:
Oh, I did understand.  Sorry about the confusion.

I don't think a // on unix should be changed in the path... but I also don't think any unix treats // as a special case.

At this point, I think the 8.6.8 implementation is fine.

AMG's note was specific to 8.7a1. I don't know whether a bug was introduced there, but hopefully the tests will catch that.

I do think the documentation should be updated to explain how [file join] treats network paths on Windows, and perhaps also to explain the differences between Windows and unix.

file join //host location file 
vs.
file join //host/location file

sebres added on 2018-07-27 19:51:59:

> Then again [file join C: /a] on unix also destroys the drive path.

Sure, but you did not understand my example. I meant that on unix c:/... is not absolute, so [file join foo c:/bar/$tcl_platform(platform)] does NOT overwrite foo: c:/bar/... is appended as a relative path. Take another look at my example.

So in this respect unix definitely behaves differently from Windows (just as an argument against UNC handling).

Regarding the link you provided, I had already seen it in the newest edition (Issue 7, 2018 edition):

A pathname consisting of a single <slash> shall resolve to the root directory of the process. 
A null pathname shall not be successfully resolved.
If a pathname begins with two successive <slash> characters, the first component following 
the leading <slash> characters may be interpreted in an implementation-defined manner,
although more than two leading <slash> characters shall be treated as a single <slash> character.
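The quoted rule distinguishes pathnames by their run of leading slashes. A minimal sketch of that classification follows. `classifyLeadingSlashes` is a hypothetical helper that illustrates the POSIX wording only. It is not Tcl's own path logic.

```tcl
# Hypothetical helper (not Tcl's own logic): classify a pathname by its
# leading slashes, following the POSIX pathname-resolution rule quoted above.
proc classifyLeadingSlashes {path} {
    if {$path eq ""} {
        # A null pathname shall not be successfully resolved.
        return invalid
    }
    # Capture the run of leading slashes (possibly empty).
    regexp {^/*} $path lead
    # 0 slashes: relative; exactly 2: implementation-defined;
    # 1 or 3+: absolute (three or more act as a single slash).
    switch -- [string length $lead] {
        0 {return relative}
        2 {return implementation-defined}
        default {return absolute}
    }
}

foreach p {a/b /a/b //a/b ///a/b} {
    puts "$p -> [classifyLeadingSlashes $p]"
}
```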

And in your opinion, is "may be interpreted in an implementation-defined manner" really the same as "the POSIX specification disagrees"?
In my opinion this sentence allows an implementation to do whatever it wants.
Additionally, the passage is about pathname resolution (not about joining), and in any case not about the file subsystem to which the `file` ensemble belongs.

I could be wrong, but I'm sure this will not convince the TCT to let us rewrite the handling to support UNC paths.


bll added on 2018-07-27 18:07:17:
Link is here, last paragraph:

http://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap04.html#tag_04_11

What I don't know is this: since //something/somepath/somefile is valid on unix,
does the specification work like Windows, where //something/somepath is a single
network path, or is //something a valid stand-alone value?

I do not know what would be proper there.

I would apply the same rule as on Windows:
//something/somepath would be a single unit, and the
leading // is kept intact. This would preserve any UNC path
processing on a unix server.
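The proposed rule treats //something/somepath as a single unit. A minimal sketch of that idea follows. `splitUncPrefix` is a hypothetical helper. Requiring exactly two leading separators is an assumption taken from the POSIX text, where three or more collapse to one slash.

```tcl
# Hypothetical helper: split off a leading //host/share unit, keeping its
# leading // intact. Returns {prefix rest}; prefix is empty if none found.
proc splitUncPrefix {path} {
    # Exactly two leading separators (either style), then host and share.
    set re {^[/\\]{2}([^/\\]+)[/\\]+([^/\\]+)(.*)$}
    if {[regexp $re $path -> host share rest]} {
        return [list //$host/$share $rest]
    }
    return [list {} $path]
}

puts [splitUncPrefix //myprinter/myqueue/myfile]
puts [splitUncPrefix //myprinter]
puts [splitUncPrefix ///a/b]
```

A bare //myprinter yields no prefix. This matches the point made later in the thread that a host name alone is not a valid UNC path.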

Then again [file join C: /a]
on unix also destroys the drive path.

sebres added on 2018-07-27 17:15:44:

Regarding your latest comment (Windows also): this kind of join was never possible, not even in previous versions, because neither {\\myprinter} nor {//myprinter} is a valid UNC path (nor really a valid network share name). So:

% file join //myprinter/myqueue myfile
//myprinter/myqueue/myfile

but

% file join //myprinter myqueue myfile
/myprinter/myqueue/myfile

Regarding the "putting together UNC pathnames to return to windows for processing": yes, but why only Windows? By that logic we should accept every file system in the world (including virtual ones). The command is called `file join`, and it joins path segments correctly for the *nix platform.

Back to the issue: actually, I'm not against this. I also work multi-platform, and the artificial case [f34cf83dd0] is of even less interest to me (IMHO it was not really a bug).

But the fact is (if I understood the discussion correctly) that no one currently knows what the right behavior is, and I don't think anyone wants to revert the handling to what it was before 8.6.7.

So if you meant "the POSIX specification disagrees", please provide a link or quote the passage that says so.

And of course I can reopen it, but at the moment I don't see good prospects for success.


bll added on 2018-07-27 15:18:10:
Please re-open.
Please read my latest notes.
The problem exists on windows.
The POSIX specification disagrees.

sebres added on 2018-07-27 15:10:46:

The conclusion is: not a bug but a feature.

Tcl will not support UNC paths on the *nix platform, because:

  • they are not native to this platform's file subsystem
  • in exactly the same way, the other Windows conventions for absolute paths (like c:/) are not supported either

So just compare this for both platforms:

% file join foo c:/bar/$tcl_platform(platform)
c:/bar/windows
foo/c:/bar/unix
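The difference comes from how each platform classifies the arguments. [file pathtype] reports this directly. The snippet below is a small illustration. The comment about unix states the expected result, not output copied from this ticket.

```tcl
# Ask the running Tcl how it classifies each argument of the comparison.
# On unix, c:/bar is expected to be relative, which is why [file join]
# appends it to foo instead of replacing foo.
foreach p {foo c:/bar //a/b} {
    puts "$p: [file pathtype $p]"
}
```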

The affected versions are 8.6.7, 8.6.8, 8.7a1 and above. The documentation should perhaps still get a note about the handling introduced in [2158eea530].


bll added on 2018-07-27 15:02:15:
Also, it would be possible for some file server running on unix to be
putting together UNC pathnames to return to windows for processing.
There should not be a unix/windows split here.

Confirmed on Windows 7 64-bit with Tcl 8.6.8:
% set printer \\\\myprinter
\\myprinter
% set queue myqueue
myqueue
% set file myfile
myfile
% puts [file join $printer $queue $file]
/myprinter/myqueue/myfile
% set printer //myprinter
//myprinter
% puts [file join $printer $queue $file]
/myprinter/myqueue/myfile
% file normalize //myprinter/myqueue/myfile
//myprinter/myqueue/myfile
%

Removing the "*nix only" qualifier.

bll added on 2018-07-27 14:52:28:
I would answer (a) for all three.

The poster on comp.lang.tcl saw this on Windows 7 with Tcl 8.6.3:
 > set printer \\\\192.168.1.171
 > puts [file join $printer queue doc.ps]
/192.168.1.171/queue/doc.ps

So Windows may be affected as well.


From the IEEE Std 1003.1 POSIX Specification:

A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.

( http://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap04.html#tag_04_11 )

sebres added on 2018-07-27 11:03:10:

Well, I found the cause: it belongs to the fix for [f34cf83dd0], first introduced in [2158eea530] (and merged into all major branches).

@Don: are you sure this should be fixed in such an incompatible way with respect to UNC paths?

Where is the TIP? (allow me to do this tiny joke ;)

The simple test case filename-7.19 (added in [f49a421a0d]) is IMHO also questionable:

test filename-7.19 {[Bug f34cf83dd0]} {
    file join foo //bar
} /bar

Either Tcl continues to support UNC paths platform-independently, or //bar is indeed a simple absolute root path (on the *nix platform), in which case the test is correct.

So I would be interested in the answers to these questions about UNC paths and segments:

1. file normalize //a/b
a. //a/b
b. /a/b

2. file normalize //a//b
a. //a/b
b. /a/b

3. file join //a/b //c/d
a. //c/d
b. /c/d
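To compare versions, a small probe like the one below could print the running interpreter's answers to the three questions. `probeSlashHandling` is a hypothetical name. The output is deliberately not predicted here because it differs between versions and platforms.

```tcl
# Hypothetical probe: run the three commands from the questions above and
# collect {command result} pairs for side-by-side comparison.
proc probeSlashHandling {} {
    set results {}
    foreach cmd {
        {file normalize //a/b}
        {file normalize //a//b}
        {file join //a/b //c/d}
    } {
        lappend results $cmd [{*}$cmd]
    }
    return $results
}

puts "Tcl [info patchlevel] on $::tcl_platform(platform)"
dict for {cmd result} [probeSlashHandling] {
    puts "$cmd -> $result"
}
```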


sebres added on 2018-07-27 09:57:24:

Hmm... even weirder: also on previous versions, where an already "normalized" path is not affected, this removes slashes (including the first one) if an additional slash is present in the second path segment (i.e. the path was not normalized).

$ echo 'puts [file normalize //a/b/$tcl_platform(platform)]' | ./tclsh.sh
//a/b/unix
$ echo 'puts [file normalize //a//b/$tcl_platform(platform)]' | ./tclsh.sh
/a/b/unix
$ echo 'puts [file normalize //a/b//$tcl_platform(platform)]' | ./tclsh.sh
/a/b/unix


sebres added on 2018-07-27 09:45:57:

The problem here is not the join but normalize (invoked internally). It is still correct on Windows, so the issue affects unix only.

% file normalize //a/b/$tcl_platform(platform)
//a/b/windows
/a/b/unix

It also affects all current versions (on *nix) since 8.5.

I'm not sure this is really an issue, because UNC paths are not directly accessible on *nix platforms and so are not really valid *nix paths. One uses either something like `mount -t drvfs '\\server\share' /mnt/share` or a different naming convention like `file://server/share`.

But for platform-independent use (e.g. building paths from segments), the current behavior looks wrong to me.

And indeed 8.6.8 still kept the first slash; current 8.6 does not (nor does 8.5).

WiP.