Tcl Source Code

Artifact 5f98e7112d002bbf95324a3cbab5ab54c5134893:

Attachment "[email protected]" to ticket [1972867fff] added by msofer 2008-05-26 20:09:41.
kbk	I don't know if it's bytecompilation... have you tried doing the bcc'ed one first?
miguel	ahh ... if I do the bcc'ed first, both behave the same - as the bcc'ed used to do (ie, the wrong one according to the testsuite)
kbk	I think it has to do possibly with losing the state of the shiftjis decoder.
kbk	Any idea what the 'old one found' comment is all about?
miguel	nope
kbk	And what's with [encoding system identity] twice (lines 10 and 23)?  Seems as if that ought to do precisely nothing, since system encoding is already 'identity'
kbk	Hmmm, is this Windows?
kbk	no, never mind, I see it's unix
kbk	The issue is literal lifetime and whether the shiftjis encoding can be found when [encoding system] is identity.
kbk	3: Stash the current system encoding (to be able to restore it later)
kbk	4: Stash the encoding dirs (to be able to restore later)
kbk	5: Set the system encoding to shiftjis.
kbk	6: Set [encoding dirs] to the path to the wd, converted from whatever the true system encoding is - as if the system encoding were shiftjis. (The path had better be plain ASCII!)
miguel	two notes: (a) I do not grok the encoding stuff at all; (b) I am worried about different behaviour between interpreted and compiled (that should not happen, and I fail to see anything dangerous about it here: why does it?)
kbk	7: Try to convert u+4e4e to shiftjis.  This will load the shiftjis encoding, and may dismiss it again if the 'shiftjis' literal loses its int rep
kbk	Oh, sorry, 7. can't lose the shiftjis encoding 'coz the system encoding is holding a reference to it.
kbk	I still think that the problem is that shiftjis can't be found with [encoding system] and [encoding dirs] set the way they are... that's the true bug.
kbk	But then the bug is masked in one case and not the other by different literal lifetimes, where the encoding is cached in the int rep of the 'shiftjis' literal.
kbk	And this test looks thoroughly bogus - I see no reason to expect it to work with [encoding system] set wrong.
miguel	ahhh ... now I see where you are going
miguel	encoding-11.1 seems to be the same thing
kbk	Might need to replace the 'shiftjis' throughout with something like [join {shift jis} {}] to defeat int rep caching... and then report it as an encoding bug
kbk	(Probably as a bogus test, since using the filesystem with [encoding system] set incorrectly is a horribly, eye-poppingly, face-slappingly, bad thing to do.)
miguel	not sure I'd know what to report (yet) - as I am not sure what is supposed to be happening
kbk	Doing so happens to work a good part of the time because most filesystem encodings have ASCII as a proper subset...
kbk	... but that is by no means guaranteed.
miguel	the filesystem usage is easy enough to bypass, right?
kbk	I don't think so; this test seems to be trying to force it to go to the filesystem (perhaps in order to test that it actually does)
miguel	set dirs [list [pwd]]
	encoding system shiftjis ;# incr ref count
	encoding dirs $dirs
kbk	Uhm, the [encoding dirs] is supposed to get rid of cached encodings, and then [encoding convertto] is supposed to be going to the filesystem to load the encoding table all over again.
kbk	I think.
miguel	what I thought should be happening is: as shiftjis is not to be found in [encoding dirs], the thing should fail. So, set [encoding dirs] to anything where the encoding is not to be found. [pwd] is the obvious cheap choice
kbk	And so it could be that encoding *also* has a bug with its epoch management.
kbk	Ah... I was looking at *your* case, and hadn't pulled up encoding-2.2.
miguel	my case *is* encoding-2.2, copy/pasted
kbk	But I didn't know which of your two results was correct!
miguel	ok
kbk	Hmmm. I don't even know if encoding caches stuff on int reps, without looking.
kbk	But if it does, then it's got a bug with epoch management.
kbk	(And [pwd] misencoded is also a cheap 'not there', so I won't worry too much about all the [encoding system] stuff)
kbk	It appears that [encoding dirs] is changing the encoding path, but the 'shiftjis' encoding, presumably cached on the literal's internal rep, is persisting.
kbk	And it could be that it's simply a matter that this bug has been there for a long time, but that the literal is shared more widely than it was, or doesn't shimmer, or some such.
miguel	11.1 is a bit different; I also get a failure there ... but it might well be a side-effect of the previous one (shiftjis is in the literal for the interp's lifetime)
kbk	Yes, encodings do get cached in an obj's int rep.
kbk	And I don't see anything in tclEncoding.c to spoil the cache when [encoding dirs] changes.
kbk	I think the assumption may be that an encoding is fixed, and a change to [encoding dirs] won't affect any encoding that has been resolved successfully.
kbk	And that's not a terrible assumption, actually...
kbk	... but these tests seem to be trying to make sure that extant encodings are spoilt when the encoding path changes.
miguel	hmmm ... it seems to be relying on Tcl_DecrRefCount to do the spoiling, then? Which is not enough - the thing could be shared (as it is if it is a literal)
kbk	I think that either (a) we decide this is Not A Bug - the 'shiftjis' encoding should be the same everywhere provided that it is found at all, or (b) that we need an epoch number in the cache of encodings (just as we have in so many of our other caches)
kbk	But *that* part is for the maintainer of tclEncoding.c to sort out.
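
Option (b) above, an epoch number in the cache of encodings, can be pictured with a small pure-Tcl toy model. This is only a sketch of the general technique, not code from tclEncoding.c: the `enccache` namespace, its procs, and the idea of representing each search directory as a dict from encoding name to table are all hypothetical and chosen purely for illustration.

```tcl
# Toy model only: none of these names exist in Tcl itself or in tclEncoding.c.
namespace eval enccache {
    variable epoch 0    ;# bumped whenever the search path changes
    variable dirs {}    ;# hypothetical search path: list of dicts (name -> table)
    variable cache      ;# array: name -> {epoch table}
    array set cache {}
}

# Replace the search path and invalidate every cached entry at once.
proc enccache::setdirs {newDirs} {
    variable epoch
    variable dirs
    set dirs $newDirs
    incr epoch
}

# Return the table for $name, trusting the cache only if its epoch is current.
proc enccache::lookup {name} {
    variable epoch
    variable dirs
    variable cache
    if {[info exists cache($name)]} {
        lassign $cache($name) cachedEpoch table
        if {$cachedEpoch == $epoch} {
            return $table
        }
        unset cache($name)   ;# stale: the path changed since this was cached
    }
    foreach dir $dirs {
        if {[dict exists $dir $name]} {
            set table [dict get $dir $name]
            set cache($name) [list $epoch $table]
            return $table
        }
    }
    error "unknown encoding \"$name\""
}

# Usage: the entry is found and cached, then the path changes and it is gone.
enccache::setdirs [list [dict create shiftjis tableA]]
puts [enccache::lookup shiftjis]
enccache::setdirs [list [dict create]]
puts [catch {enccache::lookup shiftjis}]
```

In this model a cached entry that is still referenced elsewhere stops being trusted as soon as the path changes, which is the property that relying on reference counts alone cannot give for a shared literal.
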
miguel	my take then is that I should ignore this test failure (in the sense that it is not a bug in bc'ing uplevel), and file an encoding bug. Right?
kbk	Right.  I suspect it's just that the bcc has gotten more efficient, and we're now persisting a useful int rep that we previously lost.
kbk	Does it succeed if you replace all the 'shiftjis' literals with [join {shift jis} {}] or some such?
miguel	no failure when using [join {shift jis} {}] ... caching hypothesis confirmed
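
The workaround confirmed above can be sketched as follows. The helper name `freshEncodingName` is hypothetical; the point is only that `[join {shift jis} {}]` builds the string "shiftjis" at run time, so the value handed to `encoding convertto` is not the shared 'shiftjis' literal and should not carry an internal representation cached by an earlier lookup. The conversion is guarded because it assumes the shiftjis encoding is available to the running Tcl installation.

```tcl
# Hypothetical helper: build the encoding name at run time instead of
# writing it as a literal in the script.
proc freshEncodingName {} {
    return [join {shift jis} {}]
}

set name [freshEncodingName]
puts [string equal $name "shiftjis"]   ;# prints 1: same string value

# Attempt the conversion only when this installation knows the encoding.
if {"shiftjis" in [encoding names]} {
    set bytes [encoding convertto [freshEncodingName] \u4e4e]
    puts [string length $bytes]
}
```
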