Tcl Source Code

View Ticket
Login
Ticket UUID: b7b707a310ea42e9f1b29954ee8ca13ae91ccabe
Title: [win32] NRT-only - NativeGetTime backwards time-drifts bug (fixed)
Type: Bug Version: all (fixed in my tcl-8.5-branch)
Submitter: sebres Created on: 2017-03-31 19:00:03
Subsystem: 03. Timer Events Assigned To: nobody
Priority: 7 High Severity: Severe
Status: Open Last Modified: 2019-12-16 14:08:56
Resolution: Fixed Closed By: nobody
    Closed on: 2019-03-05 22:01:39
Description:
This can be seen as the amendment for [b87ad7e9146832d505f9a430d779c5313c440256]

The short description:

By NRT (near real-time) development under windows, I get several issues with drifting of the [clock microseconds] etc. back, that looks like:

  clock microseconds -> 1490985783769193
  clock microseconds -> 1490985783769191

Additionally I've own clock extension with "nanoseconds", where this is more critical and very annoying:
  clock nanoseconds -> 1490985425627129800
  clock nanoseconds -> 1490985425627128400

This goes to the calibration cycle, where the new combination of fileTimeLastCall, curCounterFreq and perfCounterLastCall breaks the forward running of the native virtual clock.

I've fixed it for tcl8.5 branch (merge-able to 8.6 and trunk), solution coming soon...

Just as reminder for me (excerpt of log-message of the fix):
```
SHA-1: a746f0501229a4a76e3c9a3825c55dbe9c8c3cdf

* optimized calibration cycle (makes Tcl for windows "RTS" resp. NRT-capable):
  - the clock ticks never backwards (avoid it by negative drifts using comparison of times before and after calibration);
  - more precise, smooth/soft drifting (avoids too large drifts, already after 10 iterations the drift gets fewer as 0.1 microseconds);
  - because of more accurate drifting (aspire to the smallest difference), we can prolong calibration interval (up to 10 seconds by small tdiff-value);
```
User Comments: sebres added on 2019-12-16 14:08:56:

Newest findings here - test cmdMZ-6.6 could sporadically fail with:

---- Result was:
{1 0 1 0 1 0 1 1 1} {998.039 µs/# 51 # 1002.0 #/sec 50.900 net-ms} {999.255 µs/# 51 # 1000.7 #/sec 50.962 net-ms}
---- Result should have been (glob matching):
{1 1 1 1 1 1 1 1 1} *
In normal case the first measurement value (m1) is in about {11.8732 µs/# 4212 # 84223.2 #/sec 50.010 net-ms}.
So a busy wait 0.01ms (_nrt_sleep 0.01) is abrupt 100 times slower (or even more due to repetition).

Probably I should rebase parts of my event-perf-branch fixing that.


sebres added on 2019-03-05 22:01:39:

Just as PoC (or for the case someone needs it for tests resp. to check for some other platforms)...

Here is an example with 30 seconds run-time:

proc test {tm} {
  set l [clock microseconds];
  timerate {
    set i [clock microseconds];
    if {$i < $l} {error "$i < $l"};
    set l $i
  } $tm
}; 
test 30000

Similar tests have been succeeded (no backward-drifts after run-time of ca. half an hour) on physical host and virtual machines of me (single- and multi-threaded).


sebres added on 2019-03-05 20:23:02:

fixed in a minimalist way in [f6637d3dd8] (merged now), the extensions are provided in event-perf-branches (thus close now).


sebres added on 2017-05-23 22:01:26:

[f6637d3dd85af2a35101e2461c5769ebd1b27cbe] closes this


sebres added on 2017-03-31 20:53:49:
As example below small excerpt with DEBUG-output of the calibration thread:

```
*****[30F4]***** 1490992535186529 ** tdiff **-1965, 14909925351865281, 14909925351867246
*****[30F4]***** before 14909925351870010, 14909925351870010
*****[30F4]***** after  14909925351870009, 14909925351870010, diff: -1 / 0
*****[30F4]***** 1490992535187944, [10] ** cfreq == 3117204, nfreq == 3117441, perfc == 2031595984296
*****[30F4]***** 1490992545326460 ** tdiff **-2006, 14909925453264590, 14909925453266596
*****[30F4]***** before 14909925469160412, 14909925469160412
*****[30F4]***** after  14909925469160386, 14909925469160412, diff: -26 / 0
*****[30F4]***** 1490992546917626, [10] ** cfreq == 3117209, nfreq == 3117441, perfc == 2031627592529
*****[30F4]***** 1490992556043572 ** tdiff **-2206, 14909925560435703, 14909925560437909
*****[30F4]***** before 14909925560441189, 14909925560441189
*****[30F4]***** after  14909925560441188, 14909925560441189, diff: -1 / 0
*****[30F4]***** 1490992556045267, [10] ** cfreq == 3117203, nfreq == 3117441, perfc == 2031660999997
*****[30F4]***** 1490992566183518 ** tdiff **-2089, 14909925661835170, 14909925661837259
*****[30F4]***** 1490992566184086, [10] ** cfreq == 3117205, nfreq == 3117441, perfc == 2031692608269
*****[30F4]***** 1490992576323403 ** tdiff **-2587, 14909925763234022, 14909925763236609
*****[30F4]***** before 14909925763238690, 14909925763238690
*****[30F4]***** after  14909925763238689, 14909925763238690, diff: -1 / 0
*****[30F4]***** 1490992576324574, [10] ** cfreq == 3117182, nfreq == 3117441, perfc == 2031724216370
*****[30F4]***** 1490992586463519 ** tdiff **-786, 14909925864635173, 14909925864635959
*****[30F4]***** 1490992586464045, [10] ** cfreq == 3117256, nfreq == 3117441, perfc == 2031755824954
```

Legend:
  tdiff  : current deviation from virtual time (vt0) to the file-time (in 100-ns);
  before : nt0, nt0 -  virtual time before calibration;
  after  : nt1, nt1c - virtual time after calibration and correction 
           (nt1c = vt0 + (nt0-nt1));
  diff   : -26 / 0 - difference without and with correction;
  [10]   : current calibration interval (10 seconds)
  cfreq  : calculated frequency;
  nfreq  : nominal frequency;
  perfc  : last performance counter;