Tcl Source Code

View Ticket
Login
Ticket UUID: b87ad7e9146832d505f9a430d779c5313c440256
Title: NativeGetTime time-drifts bug (fixed)
Type: Bug Version: all
Submitter: sebres Created on: 2016-12-29 01:03:16
Subsystem: 06. Time Measurement Assigned To: nobody
Priority: 5 Medium Severity: Critical
Status: Closed Last Modified: 2017-01-09 17:55:11
Resolution: Fixed Closed By: sebres
    Closed on: 2017-01-09 17:55:11
Description:
By testing of performance of my newer tcl-clock module I've realized that we have a bug in time routines (at least grave on Windows platform).

If some thread too often calls Tcl_GetTime, that blurs current performance counters normally actualized in calibration thread in UpdateTimeEachSecond.

This entails that sometimes sporadically time-drifts resp. jump-esque time-shifts occurred, what for example produces very confusing results during time measurement.

An explanation example with "time" command:

% time {puts [time {clock scan "2451545 102030" -format "%J%H%M%S"} 566000]} 100

0.8814169611307421 microseconds per iteration
0.8766148409893993 microseconds per iteration
0.8825282685512368 microseconds per iteration
0.8852031802120142 microseconds per iteration
0.8810035335689046 microseconds per iteration
---------------------------------------------
2.7021978798586574 microseconds per iteration
---------------------------------------------
0.8808533568904594 microseconds per iteration
0.8875512367491166 microseconds per iteration
0.8777968197879858 microseconds per iteration
0.8813586572438162 microseconds per iteration
0.8832685512367491 microseconds per iteration
0.8851378091872791 microseconds per iteration

Or another example with my "timerate" command (with debug output of now.sec/now.usec from Tcl_GetTime, and start, planed stop in microseconds):

% time {puts [timerate {clock scan "2451545 102030" -format "%J%H%M%S"} 500]} 100

*********1482963877, 264920 = 1482963877264920 - 1482963877764920
*********1482963877, 764920 = 1482963877764920
0.882205 µs/#, 566762 #, 1133524 #/sec
*********1482963877, 765445 = 1482963877765445 - 1482963878265445
*********1482963878, 265445 = 1482963878265445
0.879648 µs/#, 568409 #, 1136818 #/sec
*********1482963878, 265821 = 1482963878265821 - 1482963878765821
*********1482963878, 765821 = 1482963878765821
0.880355 µs/#, 567953 #, 1135906 #/sec
*********1482963878, 766199 = 1482963878766199 - 1482963879266199
*********1482963879, 266199 = 1482963879266199
0.881784 µs/#, 567032 #, 1134064 #/sec
*********1482963879, 266795 = 1482963879266795 - 1482963879766795
*********1482963879, 766795 = 1482963879766795
0.877449 µs/#, 569834 #, 1139668 #/sec
*********1482963879, 767440 = 1482963879767440 - 1482963880267440
*********1482963880, 267440 = 1482963880267440
0.883411 µs/#, 565988 #, 1131976 #/sec
*********1482963880, 267811 = 1482963880267811 - 1482963880767811
*********1482963880, 767811 = 1482963880767811
0.881778 µs/#, 567036 #, 1134072 #/sec
-----------------------------------------------------------------
*********1482963880, 768160 = 1482963880768160 - 1482963881268160
*********1482963881, 913628 = 1482963881913628
8.606911 µs/#, 133087 #, 116185 #/sec
-----------------------------------------------------------------
*********1482963881, 914035 = 1482963881914035 - 1482963882414035
*********1482963882, 414035 = 1482963882414035
0.882879 µs/#, 566329 #, 1132658 #/sec
*********1482963882, 414404 = 1482963882414404 - 1482963882914404
*********1482963882, 914404 = 1482963882914404
0.877547 µs/#, 569770 #, 1139540 #/sec
*********1482963882, 914911 = 1482963882914911 - 1482963883414911
*********1482963883, 414911 = 1482963883414911
0.878137 µs/#, 569387 #, 1138774 #/sec
*********1482963883, 415353 = 1482963883415353 - 1482963883915353
*********1482963883, 915353 = 1482963883915353
0.880037 µs/#, 568158 #, 1136316 #/sec


More detailed example with "time" command and microseconds with difference between start/stop.
Here you see that instead 500ms in makes a time jump to the next second (1500ms) despite the real execution of 500ms.

% set p [clock microseconds]; time { puts "[time {clock scan "2451545 102030" -format "%J%H%M%S"} 566000]\t**** [set n [clock microseconds]] -- [expr {$n - $p}]"; set p $n } 100

0.8926342756183746 microseconds per iteration   **** 1482965362339022 -- 505488
0.8897190812720848 microseconds per iteration   **** 1482965362842886 -- 503864
0.8906837455830389 microseconds per iteration   **** 1482965363347259 -- 504373
0.8877826855123675 microseconds per iteration   **** 1482965363849937 -- 502678
0.8867455830388693 microseconds per iteration   **** 1482965364352100 -- 502163
0.8848586572438163 microseconds per iteration   **** 1482965364853183 -- 501083
0.8814593639575972 microseconds per iteration   **** 1482965365352326 -- 499143
0.8891395759717314 microseconds per iteration   **** 1482965365855784 -- 503458
0.8802950530035336 microseconds per iteration   **** 1482965366354316 -- 498532
--------------------------------------------------------------------------------
2.708512367491166 microseconds per iteration    **** 1482965367887622 -- 1533306
--------------------------------------------------------------------------------
0.8872332155477032 microseconds per iteration   **** 1482965368390069 -- 502447
0.8919593639575971 microseconds per iteration   **** 1482965368895198 -- 505129
0.8893409893992933 microseconds per iteration   **** 1482965369398836 -- 503638

After fixing of this issue, the result looks like this.
All the times are become longer, but no jumps resp. time-shifts any more:

% set p [clock microseconds]; time { puts "[time {clock scan "2451545 102030" -format "%J%H%M%S"} 566000]\t**** [set n [clock microseconds]] -- [expr {$n - $p}]"; set p $n } 100

0.9901731448763251 microseconds per iteration   **** 1482967567052169 -- 560698
0.9650424028268552 microseconds per iteration   **** 1482967567598682 -- 546513
0.9574081272084806 microseconds per iteration   **** 1482967568140856 -- 542174
0.9472137809187279 microseconds per iteration   **** 1482967568677202 -- 536346
0.9592455830388693 microseconds per iteration   **** 1482967569220385 -- 543183
0.9634293286219081 microseconds per iteration   **** 1482967569765893 -- 545508
0.9630742049469965 microseconds per iteration   **** 1482967570311220 -- 545327
0.9530530035335689 microseconds per iteration   **** 1482967570850904 -- 539684
0.9532349823321554 microseconds per iteration   **** 1482967571390720 -- 539816
0.9471431095406361 microseconds per iteration   **** 1482967571927080 -- 536360
0.9596484098939929 microseconds per iteration   **** 1482967572470496 -- 543416
0.9597349823321555 microseconds per iteration   **** 1482967573013965 -- 543469
0.9652067137809187 microseconds per iteration   **** 1482967573560460 -- 546495
0.9649399293286219 microseconds per iteration   **** 1482967574106875 -- 546415
0.956809187279152 microseconds per iteration    **** 1482967574648661 -- 541786
0.9588038869257951 microseconds per iteration   **** 1482967575191604 -- 542943
0.9484293286219081 microseconds per iteration   **** 1482967575728694 -- 537090
0.96363074204947 microseconds per iteration     **** 1482967576274333 -- 545639
0.9441678445229682 microseconds per iteration   **** 1482967576808984 -- 534651

--

Currently no fossil handy (new computer and fossil-repo (or my network?) is too slow by cloning, so permanent timeouts). I've committed it in my github-repo. See https://github.com/sebres/tcl/commit/975126bb18342949a317fe55fb68e89b0665f563

[win] bug fix in NativeGetTime: each call of it blurs current performance counters actualized in calibration thread in UpdateTimeEachSecond;
[unix] wrong cast fixed in TclpGetWideClicks: multiplication with 1000000 in long int may cause overflow
User Comments: sebres added on 2017-01-09 17:55:11:
Merged in all 3 major branches

jan.nijtmans added on 2017-01-09 12:56:22:
The only while(0)'s I see are in macro definitions, like:
#define INSIST(c, e) do { if (!(c)) ERR(e); } while (0)	/* error if c false */

The purpose of this is to make INIST() behave like a single command. I don't see anywhere in the Tcl or Tk code a instance where this construct is used for variable scoping purposes.

sebres added on 2017-01-09 11:35:07:
> What about compilers that don't optimize the "do {} while(0)" away?

Nope. There are none that do this...
And secondly, we should then rewrite whole source code in tcl-core. Just search for "while (0)", you'll see the source code is full with that.

jan.nijtmans added on 2017-01-09 09:24:15:
Sorry, I don't buy your 'optimization' argument: What about compilers that don't optimize the "do {} while(0)" away? Bare braces would indeed be
better, but unfortunately not all compilers support that. Don't worry,
variable scope analysis is common in compiler optimizations, I don't
beleave it actually makes any difference in generated code.

Branch "bug_b87ad7e914" is rebased to core-8-5-branch now, so go ahead.

If you want to start a discussion on the Tcl Core mailing list about this, you are welcome, but I don't give you much chance ... I don't think it's really worth it.

sebres added on 2017-01-06 17:01:16:
> My only remark, on the "do {...} while(0)": Please don't do that, just leave the variable "struct _timeb t" at the top of the function, where it originally was.

The goal is to declare <b>normally unused</b> variable "struct _timeb t" as close to local use as possible...

This new scope allows:
    - better to read if variable declared pair rows above where it'll be used only (and not at begin of possibly large procedure).
    - more chances the compiler optimized such code as fast as possible (e. g. by -O2 or -O3, -Ofast it will use stack resp. registers already unused elsewhere above).

It is most important (by nearby real-time purposes) and sometimes often used function, so I'll appreciate to do that, if no another objections.

But if you want, I can replace the do...while(0) scope with the bare braces scope... (I mean I saw some obsolete compilers, that don't like bare braces scopes).

sebres added on 2017-01-06 16:45:49:
All examples above expected my faster clock engine (each iteration should call clock command and runs ca. 0.9µs, so the big iteration running ca. 500ms)...

Therefore an example with the stock tcl-clock, that produces it in my case also.

If you've faster/slower mashine you should correct speedFactor variable to reach above mentioned values (0.9µs / 0.5 sec).

Otherwise it may be hard (or you should wait longer) to reproduce this bug.

```
% set speedFactor 62; set p [clock microseconds]; time { puts "[time {string repeat "t" $speedFactor; clock microseconds} 566000]\t**** [set n [clock microseconds]] -- [expr {$n - $p}]"; set p $n } 100

0.8695459363957597 microseconds per iteration   **** 1483720212343459 -- 492475
0.8817173144876325 microseconds per iteration   **** 1483720212842870 -- 499411
0.8644752650176678 microseconds per iteration   **** 1483720213332515 -- 489645
0.8931360424028268 microseconds per iteration   **** 1483720213838395 -- 505880
0.860083038869258 microseconds per iteration    **** 1483720214325544 -- 487149
0.8739275618374558 microseconds per iteration   **** 1483720214820499 -- 494955
0.872791519434629 microseconds per iteration    **** 1483720215314796 -- 494297
0.8781908127208481 microseconds per iteration   **** 1483720215812055 -- 497259
0.8626166077738516 microseconds per iteration   **** 1483720216300627 -- 488572
0.8843833922261484 microseconds per iteration   **** 1483720216801552 -- 500925
0.8563657243816254 microseconds per iteration   **** 1483720217286400 -- 484848
0.8794487632508834 microseconds per iteration   **** 1483720217784535 -- 498135
0.866613074204947 microseconds per iteration    **** 1483720218275245 -- 490710
--------------------------------------------------------------------------------
2.663030035335689 microseconds per iteration    **** 1483720219782771 -- 1507526
--------------------------------------------------------------------------------
0.8599045936395759 microseconds per iteration   **** 1483720220269726 -- 486955
0.8882826855123674 microseconds per iteration   **** 1483720220772700 -- 502974
0.8582491166077738 microseconds per iteration   **** 1483720221258731 -- 486031
0.888178445229682 microseconds per iteration    **** 1483720221761694 -- 502963
0.8821819787985866 microseconds per iteration   **** 1483720222261384 -- 499690
0.8775848056537102 microseconds per iteration   **** 1483720222758350 -- 496966
0.8601201413427562 microseconds per iteration   **** 1483720223245408 -- 487058
0.8903939929328621 microseconds per iteration   **** 1483720223749622 -- 504214
```

jan.nijtmans added on 2017-01-06 16:16:38:
No objection from me. Change looks good, for core-8-5-branch and up.

My only remark, on the "do {...} while(0)": Please don't do that, just leave the variable "struct _timeb t" at the top of the function, where it originally was.

That's the only remark I can think of.

Thanks!

sebres added on 2017-01-06 15:31:10:
Oh, I see I've already rebased it from git (too many projects at once :) 

So then merge?

sebres added on 2017-01-06 15:27:16:
If no objections, I'll rebase it from git to fossil and merge it across core-8-5-branch, core-8-6-branch and trunk...

sebres added on 2017-01-02 14:37:43:

fixed in [b6fc234ef3dac6df0e895e5a7b30b3785eb7f480] (see branch bug_b87ad7e914)