Tcl Source Code: View Ticket

Ticket UUID:	3105247
Title:	-NaN?
Type:	Bug	Version:	obsolete: 8.6b1.1
Submitter:	fridolin	Created on:	2010-11-08 12:56:50
Subsystem:	48. Number Handling	Assigned To:	kennykb
Priority:	5 Medium	Severity:
Status:	Open	Last Modified:	2010-12-22 23:13:41
Resolution:	Invalid	Closed By:	sf-robot
		Closed on:	2010-12-15 17:20:04
Description:	An external function returns a Tcl_NewDouble-object with NaN. In Tcl this values is printed as -NaN. I think the minus '-' should be omitted.
User Comments:	kennykb added on 2010-12-22 23:13:41: There are platforms - I no longer have my notes about which, I suspect HP PA-RISC - where even copying a signalling NaN either raises a signal (if the exception is unmasked) or silently converts it to a quiet NaN (if the exception is masked). Even passing the NaN into Tcl_NewDoubleObj as a parameter is enough to trigger the behaviour. The patch as attached does nothing about that. It simply cannot. What would really be needed for sNaN is a way to handle this right at the point of origin. That would involve either having Tcl_NewDoubleObj and friends accept a POINTER to the double (so it never needs to enter a register) and an interpreter pointer (for error message reporting). That would allow sNaN to be reported as an error before it ever enters a Tcl_Obj. (Of course, Tcl_GetDoubleFromObj would return an error for qNaN, but leave the NaN in place, just as it does now,) But - why bother? Tcl's defined behavior should be that all NaNs are signalling when presented to [expr] or to Tcl_GetDoubleFromObj. The hardware-level sNaN is not a value in Tcl's value calculus. sNaN presented to Tcl_{New,Set}DoubleObj is quieted if the hardware settings are such that it can arrive there without a crash - recalling that in the arithmetic contexts of Tcl, it will continue to raise a Tcl-level exception. I don't see a convincing reason for attempting to implement full-up IEEE exception control in Tcl proper, particularly in light of the fact that few C compilers can do anything sensible with FPE. All that I wanted when I did this code originally was for C code to be able to call Tcl_NewDoubleObj without needing to jump through isnan() hoops to avoid crashing the process. I really don't see any use case for bit-for-bit preservation of sNaN, and I do foresee tremendous maintenance headaches if we start to allow it. sNaN crashes the process as soon as it leaks into the FPU, and maintenance changes on Tcl's floating-point pipeline are likely to introduce such leaks from time to time. nijtmans added on 2010-12-22 22:06:04: Great! After some more experimenting, here is a patch (snan_2.patch) which moves the check to Tcl_(Set\|New)DoubleObj, as you suggest. I think that's a nice solution: - It makes it impossible to get an snan in a floating point register: it is just handled as a string and will generate an exception as soon as an attempt is made to handle it as a float. - roundrtip is impossible for snan's, a nice error message is generated. Exception masking is out-of-scope here, we woulsn't make it for Tcl 8.6 anyway Please, evaluate. various test cases are added. Regards, Jan Nijtmans nijtmans added on 2010-12-22 22:02:08: File Added - 396736: snan_2.patch kennykb added on 2010-12-17 23:37:38: Putting signalling-NaN detection into [binary scan] is starting to get awfully far afield of the original problem. Note that the standard that you quote mentions NaN (with an optional preceding sign) - so the original complaint about -NaN was complaining about standard-compliant behavior. In practice, what happens in the implementation is that the 'invalid' exception is signalled and masked. (You need to read the description of exception masking as well.) Masking some of the IEEE-754 exceptions is routine - in fact, I can hardly imagine what good could ever come of unmasking 'inexact'. It's pretty routine to mask exponent underflow and total-loss-of-significance, and not uncommon to mask the others. In any case, the case that I'm most worried about is the original reporter's use case, where the offending NaN came from Tcl_NewDoubleObj or Tcl_SetDoubleObj. Since those contexts don't have an interpreter in which to signal, we have to do something with them, and the choices appear to be to mutate the value, to crash the process, or to make Tcl's entire floating-point pipeline sNaN-tolerant. None of these is ideal. I've quoted Gene Golub in the past on his observation that these events are called 'exceptions' because, no matter what action is chosen, someone will take exception with it. nijtmans added on 2010-12-17 22:06:42: B.T.W., test binary-64.2 is the only test case which uses (unintentionally) a signalling nan. This is the only test case affected by this patch. nijtmans added on 2010-12-17 21:59:44: File Added - 396354: snan.patch nijtmans added on 2010-12-17 21:59:06: Well, I was thinking more in the lines of adapting the "binary scan" command to signal an exception as soon as an sNaN is used. Like "snan.patch", as attached here. ;-) This way it really becomes impossible to create an snan in Tcl. "binary scan" converts the binary representation to string (since EIAS in Tcl). According to IEEE 754, we should signal an exception then: ============================================= Conversion of a signaling NaN in internal format to an external character sequence should produce a language-defined one of "snan" or "nan" or a sequence that is equivalent except for case, with an optional preceding sign. If the conversion of a signaling NaN produces "nan" or a sequence that is equivalent except for case, with an optional preceding sign, then the invalid exception should be signaled. ========================================== TODO: what should the error message look like? TODO: how about __hppa? kennykb added on 2010-12-16 00:35:54: That gets really awkward really fast. Even loading a signalling NaN into a floating point register to pass it as a parameter or result value of a function causes a process crash on some platforms, so adopting a 'signalling NaN-safe' pipeline would be a lot of work (make sure all doubles that could be signalling NaNs are exchanged by pointers to the memory that stores them and operated on only by integer operations, at least until the possibility of a signal has been eliminated). Masking the exception on x86 causes the NaN silently to be converted to a quiet one when it's loaded, so passing a signalling NaN to Tcl_PrintDouble will get one of two results: a crash if the exception is unmasked, or a quiet NaN if it is masked. It seems to me that sNaN handling would yield comparatively little benefit for considerable development and maintenance cost. sf-robot added on 2010-12-16 00:20:04: This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker). nijtmans added on 2010-12-09 16:02:21: So, Tcl is broken already? I repeat my last question: Should - in stead - Tcl start to represent signalling NaN's as "sNaN" (as allowed, but not required by IEEE 754) ??? I don't think this threatens the EIAS principle, as there is only one NaN (or ... two: signalling and quiet), as clearly indicated by IEEE 754. -NaN and NaN can be considered two different string representations of the same thing, just as 0 and 0x0 are, just as +NaN is the same as NaN. ferrieux added on 2010-12-09 15:54:58: I admit I've not followed this closely, but IMHO the answer to "Is the round-trip with shimmering important?" is: Yes !!! Any breakage of the EIAS principle, however tiny, threatens the whole castle. nijtmans added on 2010-12-09 15:13:16: File Added - 395655: 3105247_2.patch nijtmans added on 2010-12-09 15:08:43: dkf wrote: >we wouldn't care except we want to keep the general interconvertibility of >doubles with strings so that we can pass them back out again even in the >face of shimmering (e.g., if some nitwit uses [llength] on them, it >shouldn't break the round trip rules). Just found out that that currently is not the case: Tcl does not have signaling NaN's, they are silently converted to quiet NaN's. This is even mentioned in the TIP. Here is a test case showing that: test binary-63.5 {NaN roundtrip} ieeeFloatingPoint { binary scan [binary format w 0x7ff0000000000001] q d binary scan [binary format q $d] w w llength $d; #force shimmering binary scan [binary format q $d] w r list $d [format 0x%016lx $w] [format 0x%016lx $r] } {NaN(1) 0x7ff0000000000001 0x7ff8000000000001} So, in a normal roundtrip, the given bit pattern is converted to NaN(1), and back with the same result. When shimmering, bit 51 is magically set, changing the signaling NaN to a quiet NaN. This opens up (in my opion) to handle the sign bit the same way as the signaling bit: Provide we decide that the sign bit has no meaning, just as the signaling bit. I don't think this is in any way in conflict with TIP #249, nor with IEEE 754, but I would like to hear other opinions about this. Is the round-trip with shimmering important? Should - in stead - Tcl start to represent signalling NaN's as "sNaN" (as allowed, but not required by IEEE 754) ??? Here is my last patch (I hope) still open for discussion. nijtmans added on 2010-12-02 18:02:03: >Setting status to "pending - invalid" Well, I already spent too much time on this, and forced the same to some other people. Sorry about that. Here my conclusion, for the interested. For more information, See: <http://www.validlab.com/754R/drafts/archive/2006-10-04.pdf> After reading this (and the wiki about it), I come to the same conclusion: >A NaN may also carry a payload, intended for diagnostic information indicating >the source of the NaN. The sign of a NaN has no meaning, but it may >be predictable in some circumstances. ... >Conversion of a quiet NaN in internal format to an external character sequence >shall produce a language defined one of “nan” or a sequence that is equivalent >except for case (e.g., “NaN”), with an optional preceding sign. >Languages should provide an optional conversion of NaNs in internal format to >external character sequences that appends to the basic NaN character sequences >a suffix that can represent the NaN payload (see 8.2). The form and interpretation >of the payload suffix is language defined. The language should require that any such >optional output sequences be recognized as input in conversion of external character >sequences to internal formats. So, there is only one NaN, the extra bits (payload) and the sign are optional. External software should not generate a negative sign, since it has no meaning it should have used the payload for that. But the parser is required to accept an optional payload as valid NaN! This means, the Tcl numeric parser is conforming to IEEE754. Changing that, would interpret the sign as part of the payload, which is not a good idea. Regards, Jan Nijtmans kennykb added on 2010-12-01 23:48:31: Setting status to "pending - invalid" - if nobody comments before the bug is closed automatically, I'll remove the 'controversial' condition from the proposed test cases. nijtmans added on 2010-11-30 22:18:23: File Deleted - 394819: nijtmans added on 2010-11-30 20:06:17: File Added - 394819: 3105247_parser.patch hobbs added on 2010-11-30 00:26:39: It's not clear the value in changing the existing coded behavior, as it clearly works to separate 2 binary reps. What value would there be in changing what that rep would be to the OP? Go from -NaN to NaN(-ff..)? dkf added on 2010-11-29 23:37:35: IMO, there's two cases to consider: * externally-sourced NaNs * internally-sourced NaNs In the former case, they should survive a trip in (via [binary scan]) and out again (via [binary format]) without any change to their bit pattern, and while in Tcl should be understood to work like a NaN by being not numerically-equal to anything else; like that, if we get one from outside because of a blunder elsewhere, we can just pass it on correctly. In the latter case, we should endeavor to ensure that they are trapped as errors at as early an opportunity as possible; this is indeed the case with NaN values that pass through [expr]. Outside expressions, they're just strings. Overall, that means neither Tcl_NewDoubleObj nor Tcl_GetDoubleFromObj has to not choke on them but Tcl_ExprObj must not like them, which is current behavior. The actual string format of NaN-values isn't nearly so important; we wouldn't care except we want to keep the general interconvertibility of doubles with strings so that we can pass them back out again even in the face of shimmering (e.g., if some nitwit uses [llength] on them, it shouldn't break the round trip rules). Given all the above, I don't care too much about changing things in this area and I trust kbk to get things right. I particularly trust him to deal with corrections to the TIP, since I believe that considerations w.r.t. the sign of NaN were not foremost in our minds when the decision was taken! kennykb added on 2010-11-29 20:50:42: Oh, also: "matching against it is easier"... I usually use if {$x != $x} { ... } as the test for NaN. Simpler than any string match and it doesn't shimmer. kennykb added on 2010-11-29 20:47:53: Why do we have to change it? The existing behaviour is not new - it's been in the code (and test suite - in binary.test) since before 8.5.0. The tests in util.test, it turns out, are a duplication of the ones in binary.test (They should be in both places - the one is checking that the result is formatted correctly, the other that [binary] doesn't error out returning it.) The format used by the code for non-canonical (I won't say "non-conforming": IEEE754 provides for them) NaN's exactly follows that used by David Gay's 'dtoa.c' cited in the TIP as the source of the algorithms. (The code was rewritten because it was not up to Tcl's usual coding standard.) It isn't something that the Tcl team invented. I don't understand the motivation for changing it now, years after release, to something that is a local invention, simply because the TIP was inadvertently silent on the treatment of the sign bit of a non-canonical NaN. (Which was an editorial error, I assure you: I wrote the original draft of the TIP and simply forgot to mention the sign bit.) nijtmans added on 2010-11-29 17:02:49: OK, thanks! >it's probably unfixable until 9.0 Maybe, maybe not. Anything can be fixed when the TCT agrees about it. The way I understand it is that it is important for you (== kevin), that different internal representations lead to different string representations. How about changing the string representation of "-NaN" to "NaN(-ffffffffffffffff)". Then it is immediately clear that we are dealing with a non-conforming NaN here. Advantage: the confusion that lead to this issue is gone, and matching against it is easier (just "NaN" suffices). It would be a potential incompatibility for people using negative NaN's, and the parser needs to be extended to handle an additinal sign character. But it's a relatively small change. This could be an intermediate solution towards 9.0, if no-one complains about it, we know whether it is safe to remove the '-'-sign altogether. I'll see if I can come up with a patch, as a base for further discussion. Again, Thanks! kennykb added on 2010-11-29 06:11:20: My opinion is that the omission of the minus sign from the discussion in TIP 247 was an editorial error in the TIP. And these tests are actually changing nothing: in fact, the tests in binary.test (binary-63. and binary-64.*) were already also testing handling of NaN, and at least one -NaN appears in those tests as well. Nevertheless, since there's considerable controversy about this, I think it's probably unfixable until 9.0, because of course any editorial mistake in a TIP has to have a corresponding bug in the software if the mistake is not caught until general release. And this now can't be changed without introducing a script-level incompatibility. nijtmans added on 2010-11-10 18:10:44: File Added - 392784: 3105247.patch nijtmans added on 2010-11-10 18:10:02: File Deleted - 392676: nijtmans added on 2010-11-10 18:09:19: B.T.W. I am NOT suggesting to change the parser mentioned in TIP #249 (realizing that someone reading my previous comment might deduce that). TIP #132 mentiones: 10. The number parser detailed in TIP #249 will be adopted into the Tcl internals. See TIP #249 for details on the implications. So we all voted for that and it's accepted. My prevous patch modified the tostring output, adding a "-" there eventually, but that doesn't match the TIP #249 description, so it is clearly wrong. Here is a new patch, open for discussion! There are only 3 tests which make an assumption about the NaN string representation. nijtmans added on 2010-11-10 16:46:15: OK, I'll explain why I think that Christoph is right. The relevant TIP here is TIP #249: Unification of Tcl's Parsing of Numbers The implentation came from the kennykb-numerics-branch, which came with Tcl 8.5: 2005-05-10 Kevin Kenny <[email protected]> Merged all changes on kennykb-numerics-branch back into the HEAD. TIP's 132 and 232 are now Final. This TIP #249 states: 8. The constants, "Inf", and "Infinity" (perhaps with a leading signum) are interpreted as infinities. Infinity is represented as tclDoubleType. 9. The constant "NaN" is the IEEE "Not a Number" value. It is specifically permitted in the parser so that binary format q NaN and similar calls can produce NaN on an external medium. The presence of NaN in expressions, or in Tcl_GetDoubleFromObj, signals an error. NaN is represented as tclDoubleType. 10. IEEE floating point does not have a single unique NaN value, so a NaN may be augmented by a parenthesized string of hexadecimal digits, which will be stored in its least significant bits. It shall not be possible to construct signalling NaN by this route; only quiet NaN will be supported. NaN is represented as tclDoubleType. Nothing wrong here, except that nowhere is mentioned that NaN's may be negative as well as Inf. The only place it can be derived that NaN's may be negative is the state diagram, but I would argue that in this respect, the state diagram does not match the text. How does libc it? Some C-code: double d; memcpy(&d, "\000\000\000\000\000\000\370\177", 8); printf("%g\n", d); memcpy(&d, "\000\000\000\000\000\000\370\377", 8); printf("%g\n", d); memcpy(&d, "\001\000\000\000\000\000\370\177", 8); printf("%g\n", d); memcpy(&d, "\001\000\000\000\000\000\370\377", 8); printf("%g\n", d); On Windows XP (mingw), this prints: 1.#QNAN -1.#QNAN 1.#QNAN -1.#QNAN On Linux (Ubuntu 10.4, AMD64) nan -nan nan -nan In Java Double d = Double.longBitsToDouble(0x7ff8000000000000L); System.out.println(d); d = Double.longBitsToDouble(0xfff8000000000000L); System.out.println(d); d = Double.longBitsToDouble(0x7ff8000000000001L); System.out.println(d); d = Double.longBitsToDouble(0xfff8000000000001L); System.out.println(d); This prints NaN NaN NaN NaN So, what should we follow? The C way, which is non-portable (as in Tcl 8.4). Or the Java way, which doesn't have a -NaN? Currently we use a mixture, which is not well described in TIP #249 More opinions? nijtmans added on 2010-11-09 21:14:41: >JAN PLEASE DO NOT GO FIXING BUGS IN OTHER PEOPLE'S AREAS WITHOUT >COMMUNICATING!!! That's why I only added an implicit test-case, and using this issue for comminucation. Looks like you are the right candidate for making a decision about this. kennykb added on 2010-11-09 21:11:37: NOT SO! NaN and -NaN are intentionally printed separately. The former is 7ffffffffffffff, the latter is ffffffffffffffff. We also print NaN(hexstring) for 7ff0000000000001 - 7ffffffffffffffe and -NaN(hexstring) for fff0000000000001 - fffffffffffffffe I don't believe this is a bug. JAN PLEASE DO NOT GO FIXING BUGS IN OTHER PEOPLE'S AREAS WITHOUT COMMUNICATING!!! nijtmans added on 2010-11-09 20:50:42: After some experimenting, I added a testcase (binary-40.5) which reproduces this. I think that Christoph is right. Here is my proposed fix (moving the minus sign between brackets after the "Nan", and only displaying it for 'invalid' NaN's) nijtmans added on 2010-11-09 20:47:50: File Added - 392676: 3105247.patch dgp added on 2010-11-08 22:01:08: What bit pattern is at issue? (There are many 32-bit patterns within the "NaN" collection of values). What are the platform details?

Attachments:

snan_2.patch [download] added by nijtmans on 2010-12-22 22:02:08. [details]
snan.patch [download] added by nijtmans on 2010-12-17 21:59:44. [details]
3105247_2.patch [download] added by nijtmans on 2010-12-09 15:13:15. [details]
3105247.patch [download] added by nijtmans on 2010-11-10 18:10:44. [details]