Tcl Source Code

Ticket UUID: 79614fb8b61983ac8ef30ea8752c310465798fc7
Title: [glob], [encoding system] and encoding-free filesystems
Type: Bug Version: 8.6.7
Submitter: pooryorick Created on: 2016-11-07 23:03:09
Subsystem: 16. Commands A-H Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2017-03-23 13:48:31
Resolution: Fixed Closed By: jan.nijtmans
Closed on: 2017-03-23 13:48:31
Description:

In the following example, run on a Linux system where filesystems are encoding-free, the file is opened only when the Tcl system encoding is iso8859-1. Furthermore, when the line

set name $directory/[lindex $names 0]

is moved out of the loop, the file is not opened successfully with either encoding.

set sysencoding_orig [encoding system]

set name0 "funky \xa2 name"

# Set up for the demonstration of the bug by creating a file whose name is
# the string below, encoded in the current system encoding
close [open $name0 wb]

set directory [pwd]
set names [glob -directory $directory -tails *]
foreach encoding [list $sysencoding_orig iso8859-1] {
	encoding system $encoding 
	set name $directory/[lindex $names 0]
	puts [list {globbed name identical?} [expr {$name0 eq $name}]]
	try {
		set chan [open $name]
		encoding system $sysencoding_orig
		puts [list {opened with system encoding} $encoding]
		close $chan
	} on error {cres copts} {
		puts [list {failed to open with system encoding} $encoding]
	}
}
encoding system $sysencoding_orig
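Which bytes end up on disk depends on the system encoding in effect when the file is created. As an illustration (not part of the original report), the sketch below prints the byte sequence that utf-8 and iso8859-1 each produce for name0, using only the standard encoding convertto and binary scan commands.

```tcl
# Show the bytes each encoding produces for the file name used in the report.
set name0 "funky \xa2 name"
foreach enc {utf-8 iso8859-1} {
    set bytes [encoding convertto $enc $name0]
    # Render the byte sequence as hex so the difference is visible.
    binary scan $bytes H* hex
    puts [format {%-10s %2d bytes: %s} $enc [string length $bytes] $hex]
}
# utf-8      13 bytes: 66756e6b7920c2a2206e616d65
# iso8859-1  12 bytes: 66756e6b7920a2206e616d65
```

A file created while the system encoding is utf-8 therefore has the 13-byte name on disk, and converting the same string under iso8859-1 yields a different, 12-byte name.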

User Comments: jan.nijtmans added on 2017-03-23 13:48:31:
Since the VFS cache update is now fixed in all branches, I consider this 'bug' fixed. If there are objections, please feel free to reopen this issue and/or add your further remarks here.

In any case, setting the "system encoding" to anything other than the real system encoding (indeed, filesystems aren't encoding-free at all) is asking for trouble. I fully agree with dkf's remarks.

dkf added on 2016-11-14 10:52:21:

Filesystems aren't encoding-free: file names are byte sequences that represent strings, but the filesystem (usually) doesn't define explicitly which encoding those bytes use. By convention, the system encoding is what should be used for filenames, but it is possible to end up with some pretty weird cases where the encoding on one mounted filesystem differs from that on another; this can really bedevil you if you've got old removable drives, USB sticks, etc. That is why it's sometimes useful to be able to change the system encoding. (In particular, iso8859-1 is vital because it at least lets you work with the real bytes: that encoding maps each byte value to the Unicode character with the same code point, i.e. to the first 256 Unicode characters.)
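The iso8859-1 property mentioned above can be checked directly. The following sketch (an illustration added here, relying only on the standard encoding and binary commands) decodes every byte value 0 to 255 as iso8859-1, confirms that each byte becomes the character with the same code point, and confirms that encoding the result back yields the original bytes.

```tcl
# Build a byte string holding every byte value 0..255 in order.
set values {}
for {set i 0} {$i < 256} {incr i} {
    lappend values $i
}
set bytes [binary format c* $values]

# Decode as iso8859-1: byte N should become the character U+00NN.
set chars [encoding convertfrom iso8859-1 $bytes]
set identity 1
for {set i 0} {$i < 256} {incr i} {
    scan [string index $chars $i] %c cp
    if {$cp != $i} {
        set identity 0
    }
}

# Encoding back must reproduce exactly the original bytes.
set roundtrip [expr {[encoding convertto iso8859-1 $chars] eq $bytes}]
puts [list identity $identity roundtrip $roundtrip]
```

A multi-byte encoding such as utf-8 offers no such guarantee, since many byte sequences are not valid utf-8.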


jan.nijtmans added on 2016-11-11 11:04:05:
Well, at least, not invalidating the cached VFS information when the "system encoding" changes is a bug; it is now fixed in core-8-5-branch, core-8-6-branch and trunk.

Whether the first-mentioned "feature" is a bug, I'm not sure. If the "system encoding" doesn't match the real system encoding, it's not surprising that some files can't be opened. Maybe something additional (UNIX-specific) is going on that causes the system encoding detection not to do what you expect. Feel free to comment on that; I'll keep this issue open for a while.
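To see what Tcl detected on a UNIX system, one can compare [encoding system] with the locale environment of the process. The sketch below is an illustration, not an exact reproduction of Tcl's startup logic: it assumes the usual precedence LC_ALL, then LC_CTYPE, then LANG, whereas Tcl typically asks the C library for the locale's codeset, so the variables are a hint rather than the precise rule. It must run before any script changes the encoding with "encoding system".

```tcl
# Print the encoding Tcl chose at startup next to the locale variables that
# usually determine it on UNIX (assumed precedence: LC_ALL, LC_CTYPE, LANG).
puts [list {Tcl system encoding} [encoding system]]
foreach var {LC_ALL LC_CTYPE LANG} {
    if {[info exists ::env($var)] && $::env($var) ne ""} {
        puts [list $var $::env($var)]
    } else {
        puts [list $var (unset)]
    }
}
```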

Thanks for the report!

jan.nijtmans added on 2016-11-11 09:47:52:
Having a look at this one. At least the second remark (moving the "set name" out of the loop makes a difference) looks like a bug related to the internal caching of filenames. If the system encoding changes, I think all such cached information should be invalidated.