Ticket UUID: | 402781 | |||
Title: | Fix for Bug 121072 | |||
Type: | Patch | Version: | None | |
Submitter: | dgp | Created on: | 2000-12-11 16:26:32 | |
Subsystem: | 69. Other | Assigned To: | dgp | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Closed | Last Modified: | 2016-08-22 12:35:31 | |
Resolution: | None | Closed By: | dkf | |
Closed on: | 2016-08-22 12:35:31 | |||
User Comments: |
dgp added on 2000-12-15 05:24:58:
Integrated to CVS HEAD. john_ousterhout added on 2000-12-15 01:25:16: In response to Jeff's comment, as far as I know Tcl has *never* run (well) on systems with 16-bit ints. In designing Tcl I always assumed that ints would have at least 32 bits; I bet many things would break if you tried to run with smaller ints. dgp added on 2000-12-14 04:13:38: Here's a script to illustrate the incompatibility: # FILE: randtest.tcl puts [expr srand([lindex $argv 0])] puts [expr rand()] puts [expr rand()] The shell tclsh8.3 is Tcl 8.3.2; the shell tclsh8.4 is the CVS HEAD revision of Tcl on Dec. 12, 2000, plus Patch 102781. On 32-bit SunOS 5.7: $ tclsh8.3 randtest.tcl 10000 0.0782636925943 0.377881431662 0.0532219503323 $ tclsh8.3 randtest.tcl 137438953472 integer value too large to represent while executing "expr srand([lindex $argv 0])" invoked from within "puts [expr srand([lindex $argv 0])]" (file "randtest.tcl" line 1) $ tclsh8.4 randtest.tcl 10000 0.0782636925943 0.377881431662 0.0532219503323 $ tclsh8.4 randtest.tcl 137438953472 integer value too large to represent while executing "expr srand([lindex $argv 0])" invoked from within "puts [expr srand([lindex $argv 0])]" (file "randtest.tcl" line 1) So that's compatible. Next, on Alpha w/ Gnu C compiler: $ tclsh8.3 randtest.tcl 10000 0.0782636925943 0.377881431662 0.0532219503323 $ tclsh8.3 randtest.tcl 137438953472 1.00050088856 0.418434093901 0.621816196768 $ tclsh8.4 randtest.tcl 10000 0.0782636925943 0.377881431662 0.0532219503323 $ tclsh8.4 randtest.tcl 137438953472 0.242578298898 0.0134695745136 0.383138850044 Incompatible, but it must be incompatible to fix the bug. Finally on Alpha with the Compaq C Compiler: $ tclsh8.3 randtest.tcl 10000 0.0782636925943 0.377881431662 0.0532219503323 $ tclsh8.3 randtest.tcl 137438953472 0.000500887632603 0.418418441163 0.358740620482 $ tclsh8.4 randtest.tcl 10000 0.0782636925943 0.377881431662 0.0532219503323 $ tclsh8.4 randtest.tcl 137438953472 0.242578298898 0.0134695745136 0.383138850044 Incompatible, even though there was no apparent bug in Tcl 8.3. The incompatibility bites only if someone has a script like this one where they input seeds >= 2^31, and expect the same seed to generate the same sequence in Tcl 8.4 that it did in Tcl 8.3 (maybe they're doing a long series of simulations, and in the middle their sysadmin upgrades the system's installed Tcl from Tcl 8.3 to Tcl 8.4). I think this should be noted in ChangeLog and changes, but isn't enough of a problem to prevent this fix going into Tcl 8.4, or even Tcl 8.3.3. I'm willing to patch the docs, too, if you think it is important enough. hobbs added on 2000-12-14 02:29:31: Regarding the minor incompatability... I believe you're saying that any seed fed into 8.4 will still generate the same random number sequence, be it on a 64bit or 32bit int system. However, the same seed fed into 8.4 on a 64bit system will not generate the same sequence for seeds > 2^31 as in 8.3. I think this is acceptable. We might patch the docs to mention it as an anecdote, just in case. hobbs added on 2000-12-14 02:26:37: Regarding the minor incompatability... I believe you're saying that any seed fed into 8.4 will still generate the same random number sequence, be it on a 64bit or 32bit int system. However, the same seed fed into 8.4 on a 64bit system will not generate the same sequence for seeds > 2^31 as in 8.3. I think this is acceptable. We might patch the docs to mention it as an anecdote, just in case. dgp added on 2000-12-13 08:53:17: Even without a convincing case that some compilers are broken, it is evident that they vary, so we need an implementation that works with them all. A 64-bit world would be a great reason, some day, to replace Tcl's "minimally acceptable" [*] rand() with a better one. For now, I'm satisfied with fixing this one. Besides, anyone doing serious statistics is, I hope, wise enough to know that Tcl's rand() is not up to the task. They should be producing and using extensions specialized for that purpose. There is a very minor (IMHO) incompatibility introduced by this patch. [expr srand($seed)] is used to initialize the random number generator so by using the same seed again and again, the programmer can generate the same pseudo-random sequence again and again. After this patch is applied, on those 64-bit platforms where Bug 121072 did not exhibit, and for values of $seed >= 2^31, the sequence generated after [expr srand($seed)] in Tcl 8.4 will not be the same sequence as was generated by the same $seed in Tcl 8.3. For other platforms, or values of $seed less than 2^31 on all platforms, the $seed -> sequence map should be unchanged. Without objection, I'll integrate on Dec. 14, 2000. [*] Press & Teukolsky hobbs added on 2000-12-13 07:49:15: Given Don's excellent analysis, I favor going with the larger patch. However, in reading K&R 2nd edition A6.5 "Arithmetic Conversions", I'm not seeing that you can decidedly say whether the gcc or Compaq camps are doing the "right thing". Note that gcc on Solaris works, whereas gcc on Alpha didn't. In any case, a more robust solution is always welcome. The only concern I have is that, in a world where all ints are 64 bits, do we want to limit the randseed to 31? dgp added on 2000-12-13 06:25:23: Following up on Jeff's claim that this bug is not present on all 64-bit systems after all, I built Tcl on an Alpha using both the Gnu C compiler, and the Compaq C Compiler http://www5.compaq.com/products/software/compilers/candcxx.html Sure enough, the Compaq C Compiler produces a Tcl library that does not exhibit Bug 121072. The Tcl library compiled by the GNU C Compiler does suffer from Bug 121072. The difference comes down to the compilers' interpretation of line 4099 of revision 1.17 of generic/tclExecute.c. That line includes the expression iPtr->randSeed - tmp*RAND_IQ where iPtr->randSeed is a long, tmp is an int, and RAND_IQ is the literal 127773. The Compaq compiler promotes all three quantities to longs before computing the expression. The GNU compiler, however, sees the product of tmp (an int) and 127773 (an int constant), and computes the multiply using ints, which can overflow (and does overflow in the Bug 121072 example). This difference causes the two compilers to produce code that compute different results. The strange thing is that by my reading of K&R's ANSI C book, it is the GNU Compiler that is behaving correctly -- the product of two ints should be stored in an int -- and Tcl's code is depending on the broken behavior of the Compaq compiler (and others?). A colleague reports that SGI's C compiler with the -64 option also works correctly, so a Tcl library built on SGI with the -64 option also exhibits Bug 121072. Isn't it fun working around compiler bugs! :( [**] The tiniest fix for Bug 121072 is to declare tmp to be of type long, rather than type int, and leave everything else alone. That will fix the problem. So, if you prefer a minimum-code-changes approach, we could just do that. However, Patch 102781 provides a more substantial rewrite. It avoids the overflow problem entirely by keeping Tcl's implementation within the constraints of the algorithm described in the Press & Teukolsky reference. After all, the P&T algortihm was especially written to avoid overflows of 32-bit registers. Because Patch 102781 is a more faithful implementation of P&T's algortihm, it is also able to completely dispense with the bizarre [*] incantation currently in place to perform bit-masking across many compilers and platforms. IMHO, the additional comments also do an improved job of explaining what is going on. Another reason Patch 102781 is better is that it is future-proof against a world where ints are 64 bits, and longs are even longer. The P&T algorithm makes the specific assumption of a 32 bit register. Patch 102781 properly masks seed values to 31 bits before the recurrence begins, so this assumption is enforced. If we assume that Tcl will never be built on a platform with ints smaller than 32 bits, then the promotion of the variable tmp to type long in Patch 102781 is unnecessary. It was added simply because I was already modifying the code, and it seemed a good opportunity to better conform to ANSI standards. It is not essential to proper working of Patch 102781. But it doesn't hurt, and it might even help someone someday trying to port Tcl to an exotic platform, so I'd prefer to keep it in. [*] Not my description, that's straight from a comment already in the code. [**] Try this test program to see if your compiler is broken. It should print the same value twice: #include <stdio.h> #define Q 127773 int main(int argc, char **argv) { long l = 1L << 37; int tmp = l / Q; int i = tmp*Q; fprintf(stderr, "d = %ld\n", l - tmp*Q); fprintf(stderr, "l-i = %ld\n", l - i); } hobbs added on 2000-12-12 11:04:12: Are there really any systems that Tcl still runs on that have 16 bit ints? I believe much of the Tcl code wouldn't work correctly if we couldn't make that assumption. Also, I'm wondering whether we aren't interpreting this wrong. The code works fine on Solaris/64, as well as Tru64. I didn't try the patch though, and if it is truly correct for all platforms, so much the better. dgp added on 2000-12-11 23:27:57: Please review. |
Attachments:
- None [download] added by dgp on 2000-12-11 23:26:32. [details]