Tcl Source Code

View Ticket
Login
Ticket UUID: 402781
Title: Fix for Bug 121072
Type: Patch Version: None
Submitter: dgp Created on: 2000-12-11 16:26:32
Subsystem: 69. Other Assigned To: dgp
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2016-08-22 12:35:31
Resolution: None Closed By: dkf
    Closed on: 2016-08-22 12:35:31
User Comments: dgp added on 2000-12-15 05:24:58:
Integrated to CVS HEAD.

john_ousterhout added on 2000-12-15 01:25:16:
In response to Jeff's comment, as far as I know Tcl has *never* run (well) on systems with 16-bit ints.  In designing Tcl I always assumed that ints would have at least 32 bits; I bet many things would break if you tried to run with smaller ints.

dgp added on 2000-12-14 04:13:38:
Here's a script to illustrate the incompatibility:

# FILE: randtest.tcl
puts [expr srand([lindex $argv 0])]
puts [expr rand()]
puts [expr rand()]

The shell tclsh8.3 is Tcl 8.3.2; the shell tclsh8.4
is the CVS HEAD revision of Tcl on Dec. 12, 2000, plus
Patch 102781.

On 32-bit SunOS 5.7:
$ tclsh8.3 randtest.tcl 10000
0.0782636925943
0.377881431662
0.0532219503323

$ tclsh8.3 randtest.tcl 137438953472
integer value too large to represent
    while executing
"expr srand([lindex $argv 0])"
    invoked from within
"puts [expr srand([lindex $argv 0])]"
    (file "randtest.tcl" line 1)

$ tclsh8.4 randtest.tcl 10000
0.0782636925943
0.377881431662
0.0532219503323

$ tclsh8.4 randtest.tcl 137438953472
integer value too large to represent
    while executing
"expr srand([lindex $argv 0])"
    invoked from within
"puts [expr srand([lindex $argv 0])]"
    (file "randtest.tcl" line 1)

So that's compatible.  Next, on Alpha w/ Gnu C compiler:

$ tclsh8.3 randtest.tcl 10000
0.0782636925943
0.377881431662
0.0532219503323

$ tclsh8.3 randtest.tcl 137438953472
1.00050088856
0.418434093901
0.621816196768

$ tclsh8.4 randtest.tcl 10000
0.0782636925943
0.377881431662
0.0532219503323

$ tclsh8.4 randtest.tcl 137438953472
0.242578298898
0.0134695745136
0.383138850044

Incompatible, but it must be incompatible to fix
the bug.

Finally on Alpha with the Compaq C Compiler:
$ tclsh8.3 randtest.tcl 10000
0.0782636925943
0.377881431662
0.0532219503323

$ tclsh8.3 randtest.tcl 137438953472
0.000500887632603
0.418418441163
0.358740620482

$ tclsh8.4 randtest.tcl 10000
0.0782636925943
0.377881431662
0.0532219503323

$ tclsh8.4 randtest.tcl 137438953472
0.242578298898
0.0134695745136
0.383138850044

Incompatible, even though there was no apparent bug
in Tcl 8.3.

The incompatibility bites only if someone has a script
like this one where they input seeds >= 2^31, and
expect the same seed to generate the same sequence
in Tcl 8.4 that it did in Tcl 8.3 (maybe they're doing
a long series of simulations, and in the middle their
sysadmin upgrades the system's installed Tcl from Tcl
8.3 to Tcl 8.4).

I think this should be noted in ChangeLog and changes,
but isn't enough of a problem to prevent this fix going
into Tcl 8.4, or even Tcl 8.3.3.  I'm willing to
patch the docs, too, if you think it is important enough.

hobbs added on 2000-12-14 02:29:31:
Regarding the minor incompatability...  I believe you're saying that any seed fed into 8.4 will still generate the same random number sequence, be it on a 64bit or 32bit int system.  However, the same seed fed into 8.4 on a 64bit system will not generate the same sequence for seeds > 2^31 as in 8.3.  I think this is acceptable.  We might patch the docs to mention it as an anecdote, just in case.

hobbs added on 2000-12-14 02:26:37:
Regarding the minor incompatability...  I believe you're saying that any seed fed into 8.4 will still generate the same random number sequence, be it on a 64bit or 32bit int system.  However, the same seed fed into 8.4 on a 64bit system will not generate the same sequence for seeds > 2^31 as in 8.3.  I think this is acceptable.  We might patch the docs to mention it as an anecdote, just in case.

dgp added on 2000-12-13 08:53:17:
Even without a convincing case that some compilers
are broken, it is evident that they vary, so we need
an implementation that works with them all.

A 64-bit world would be a great reason, some day, to
replace Tcl's "minimally acceptable" [*] rand() with a
better one.  For now, I'm satisfied with fixing this one.
Besides, anyone doing serious statistics is, I hope, wise
enough to know that Tcl's rand() is not up to the task.
They should be producing and using extensions specialized
for that purpose.

There is a very minor (IMHO) incompatibility introduced
by this patch.  [expr srand($seed)] is used to initialize
the random number generator so by using the same seed
again and again, the programmer can generate the same
pseudo-random sequence again and again.  After this patch
is applied, on those 64-bit platforms where Bug 121072 did
not exhibit, and for values of $seed >= 2^31, the sequence
generated after [expr srand($seed)] in Tcl 8.4 will not be
the same sequence as was generated by the same $seed in
Tcl 8.3.  For other platforms, or values of $seed less than
2^31 on all platforms, the $seed -> sequence map should be
unchanged.

Without objection, I'll integrate on Dec. 14, 2000.

[*] Press & Teukolsky

hobbs added on 2000-12-13 07:49:15:
Given Don's excellent analysis, I favor going with the larger patch.  However, in reading K&R 2nd edition A6.5 "Arithmetic Conversions", I'm not seeing that you can decidedly say whether the gcc or Compaq camps are doing the "right thing".  Note that gcc on Solaris works, whereas gcc on Alpha didn't.

In any case, a more robust solution is always welcome.  The only concern I have is that, in a world where all ints are 64 bits, do we want to limit the randseed to 31?

dgp added on 2000-12-13 06:25:23:
Following up on Jeff's claim that this bug is not
present on all 64-bit systems after all, I built
Tcl on an Alpha using both the Gnu C compiler, and
the Compaq C Compiler

http://www5.compaq.com/products/software/compilers/candcxx.html

Sure enough, the Compaq C Compiler produces a Tcl
library that does not exhibit Bug 121072.  The Tcl
library compiled by the GNU C Compiler does suffer
from Bug 121072.

The difference comes down to the compilers' interpretation
of line 4099 of revision 1.17 of generic/tclExecute.c.
That line includes the expression

iPtr->randSeed - tmp*RAND_IQ

where iPtr->randSeed is a long, tmp is an int, and 
RAND_IQ is the literal 127773.  The Compaq compiler
promotes all three quantities to longs before computing
the expression.  The GNU compiler, however, sees the
product of tmp (an int) and 127773 (an int constant),
and computes the multiply using ints, which can overflow
(and does overflow in the Bug 121072 example).  This
difference causes the two compilers to produce code
that compute different results.

The strange thing is that by my reading of K&R's ANSI
C book, it is the GNU Compiler that is behaving correctly -- the product of two ints should be stored in an int -- and Tcl's code is depending on the broken behavior of the Compaq compiler (and others?).  A colleague reports that SGI's
C compiler with the -64 option also works correctly, so a Tcl library built on SGI with the -64 option also exhibits Bug 121072.

Isn't it fun working around compiler bugs!  :( [**]

The tiniest fix for Bug 121072 is to declare tmp to be
of type long, rather than type int, and leave everything
else alone.  That will fix the problem.  So, if you prefer
a minimum-code-changes approach, we could just do that.

However, Patch 102781 provides a more substantial rewrite.
It avoids the overflow problem entirely by keeping Tcl's
implementation within the constraints of the algorithm
described in the Press & Teukolsky reference.  After all,
the P&T algortihm was especially written to avoid
overflows of 32-bit registers.  Because Patch 102781 is a more faithful implementation of P&T's algortihm, it is also able to completely dispense with the bizarre [*] incantation currently in place to perform bit-masking across many compilers and platforms.  IMHO, the  additional comments also do an improved job of explaining what is going on.

Another reason Patch 102781 is better is that it is
future-proof against a world where ints are 64 bits, and longs are even longer.  The P&T algorithm makes the specific assumption of a 32 bit register.  Patch 102781 properly masks seed values to 31 bits before the recurrence begins, so this assumption is enforced.

If we assume that Tcl will never be built on a platform
with ints smaller than 32 bits, then the promotion of the variable tmp to type long in Patch 102781 is unnecessary.  It was added simply because I was already modifying the code, and it seemed a good opportunity to better conform to ANSI standards.  It is not essential to proper working of
Patch 102781.  But it doesn't hurt, and it might even help someone someday trying to port Tcl to an exotic platform, so I'd prefer to keep it in.

[*] Not my description, that's straight from a comment
already in the code.

[**] Try this test program to see if your compiler is
broken.  It should print the same value twice:

#include <stdio.h>

#define Q 127773

int main(int argc, char **argv)
{
long l = 1L << 37;
int tmp = l / Q;
int i = tmp*Q;

fprintf(stderr, "d   = %ld\n", l - tmp*Q);
fprintf(stderr, "l-i = %ld\n", l - i);
}

hobbs added on 2000-12-12 11:04:12:
Are there really any systems that Tcl still runs on that have 16 bit ints?  I believe much of the Tcl code wouldn't work correctly if we couldn't make that assumption.

Also, I'm wondering whether we aren't interpreting this wrong.  The code works fine on Solaris/64, as well as Tru64.  I didn't try the patch though, and if it is truly correct for all platforms, so much the better.

dgp added on 2000-12-11 23:27:57:
Please review.

Attachments: