Tcl Source Code

View Ticket
Login
Ticket UUID: ddc948cff9781daac7ad95a3077b3c1b4f07cf93
Title: Tcl-clock speed-up / flightaware Tcl-bounties#4
Type: RFE Version: tcl8.6, tcl8.7
Submitter: sebres Created on: 2017-01-10 23:40:41
Subsystem: 16. Commands A-H Assigned To: nobody
Priority: 5 Medium Severity: Important
Status: Pending Last Modified: 2018-05-29 18:21:07
Resolution: Fixed Closed By: nobody
    Closed on: 2017-05-29 21:19:17
Description:

This is an artificial PR (RFE), corresponds with FlightAware's Tcl-bounties program, issue
flightaware/Tcl-bounties#4 (previous PR on github).

Back-ported to 8.6 now, see branch "sebres-8-6-clock-speedup".
Corresponds to branch "sebres-trunk-clock-speedup" currently based on "sebres-trunk-timerate" (used for performance measurement).

Current performance increase:

  • clock format - 10-15 times faster;
  • clock scan -format - 30-60 times (some previously extremely slow scans up to 100 times faster);
  • clock scan (freescan) - 15-20 times;
  • clock add - 30-90 times;

My current result of performance test-cases attached bellow, (running threaded, warmed up, 4 x threads working, 4 core i5-2400 @ 3.10 GHz).

Here is an excerpt from diff of total-blocks (normal system-load):

 % # Format
  
 ********************************************************************************
 Total 32 cases in 16.00 sec.:
-371.620594 µs/# 1736699 # 109263.467 #/sec 15894.599 nett-ms
+32.574084 µs/# 19007011 # 1277997.901 #/sec 14872.490 nett-ms
 Average:
-11.613144 µs/# 54271 # 109262 #/sec 496.706 nett-ms
+1.017940 µs/# 593969 # 1277999 #/sec 464.765 nett-ms
 Min:
-5.707173 µs/# 86686 # 175218 #/sec 494.732 nett-ms
+0.574267 µs/# 789155 # 1741349 #/sec 453.186 nett-ms
 Max:
-47.6873 µs/# 10472 # 20970.0 #/sec 499.381 nett-ms
+4.671148 µs/# 105698 # 214080 #/sec 493.731 nett-ms
 ********************************************************************************
  
 % # Scan

 ********************************************************************************
 Total 35 cases in 17.50 sec.:
-1831.621900 µs/# 413807 # 23678.910 #/sec 17475.762 nett-ms
+31.400022 µs/# 19301079 # 1180129.840 #/sec 16355.047 nett-ms
 Average:
-52.332054 µs/# 11823 # 23679 #/sec 499.307 nett-ms
+0.897143 µs/# 551459 # 1180129 #/sec 467.287 nett-ms
 Min:
-22.5439 µs/# 22120 # 44358.0 #/sec 498.670 nett-ms
+0.658980 µs/# 696086 # 1517495 #/sec 458.707 nett-ms
 Max:
-125.380 µs/# 3986 # 7975.7 #/sec 499.765 nett-ms
+1.916346 µs/# 253079 # 521826 #/sec 484.987 nett-ms
 ********************************************************************************

 % # FreeScan
 
 ********************************************************************************
 Total 17 cases in 8.50 sec.:
-684.758500 µs/# 321446 # 37903.163 #/sec 8480.717 nett-ms
+32.940470 µs/# 5976102 # 733668.512 #/sec 8145.507 nett-ms
 Average:
-40.279912 µs/# 18908 # 37902 #/sec 498.866 nett-ms
+1.937675 µs/# 351535 # 733668 #/sec 479.147 nett-ms
 Min:
-16.4479 µs/# 30288 # 60798.2 #/sec 498.173 nett-ms
+0.791828 µs/# 587441 # 1262901 #/sec 465.152 nett-ms
 Max:
-99.6852 µs/# 5013 # 10031.6 #/sec 499.722 nett-ms
+8.778852 µs/# 56573 # 113910 #/sec 496.646 nett-ms
 ********************************************************************************

 % # Add 

 ********************************************************************************
 Total 14 cases in 7.00 sec.:
-546.773300 µs/# 241810 # 34603.012 #/sec 6988.120 nett-ms
+9.650766 µs/# 9665868 # 1504157.966 #/sec 6426.099 nett-ms
 Average:
-39.055236 µs/# 17272 # 34603 #/sec 499.151 nett-ms
+0.689340 µs/# 690419 # 1504158 #/sec 459.007 nett-ms
 Min:
-15.3289 µs/# 32490 # 65236.4 #/sec 498.035 nett-ms
+0.554461 µs/# 814551 # 1803552 #/sec 451.637 nett-ms
 Max:
-76.5505 µs/# 6527 # 13063.3 #/sec 499.645 nett-ms
+0.899414 µs/# 521492 # 1111835 #/sec 469.037 nett-ms
 ********************************************************************************

Additionally it is few CPU and memory hungry.
For example after executing of following script tcl consumes in delta 4MB fewer memory per execution (because of memory fragmentation in original trunk, that goes to tcl internal memory manager):

time {set v([incr i]) [clock format $i -format "%Y-%m-%dT%H:%M:%S %z" -gmt 1]} 100000

Because many rules are system-wide (across interpreter), it spares resources also, e. g. don't wash out the cpu-cache.

Known small incompatibilities:

  • confusing precedence of some clock-formats by clock scan -format, see http://core.tcl.tk/tcl/tktview?name=e7a722cd35;
  • because of bug-fix, the free scanning by relative date with ordinal month and relative date with ordinal month and relative weekday deliver another results now (previously was not test-covered):
 % # FreeScan : relative date with ordinal month (I said January)
 % clock scan "5 years 18 months 385 days next 1 January" -base 0 -gmt 1
-Fri Jul 21 02:00:00 CEST 1978
+Sat Jan 21 01:00:00 CET 1978
 % # FreeScan : relative date with ordinal month and relative weekday (I said Fri in January)
 % clock scan "5 years 18 months 385 days next January Fri" -base 0 -gmt 1
-Sat Jul 22 02:00:00 CEST 1978
+Fri Jan 27 01:00:00 CET 1978
  • I've introduced an optional tokens, for example zone (%z resp. %Z) is optional now (can be switched off);
  • several bugs in clock format, as well as in clock scan was fixed, thus some new test-cases created, that may fail in original branch (see 629251fbb3dcecfa..71bd2e9833782df9);
  • additionally scan/format token %Es introduced to parse or format local seconds (in opposition to %s for posix seconds);
  • value -now will be accepted as clock value for format or add functions, e. g. clock format -now -f %u;
  • own extensions made in ParseClockFormatFormat, ParseClockScanFormat or DateParseActions are no more effective (because the scanning and formatting is pure C-implementation now). I plan to provide a functionality for extension with own scanning/parsing rules.

Side effects:

  • because new engine does not used regexp (and all lexer rules work in utf-8) the scanning object will be not converted in string (unicode), so leaves its current internal representation untouched.

User Comments: sebres added on 2018-05-29 18:21:07:

I've rebased (back-ported) my current state of tclclockmod back to tcl.core in branch "sebres-8-6-clock-speedup-cr2".

In comparision to sebres-8-6-clock-speedup-cr1 (was at some point removed from core-8-6-branch):

  • several bugs fixed (+ minor optimizations);
  • Flightaware DST fix applied, see [40ff672959] (this bug is still present in core branches);
  • Recognize positive time zone offset like "31 Jan 14 23:59:59 +0100", closes [5019748c73];
  • validation rules introduced, closes [3475995] (option "-valid 1|0" with possibility of setting the default configuration, more in PR github/sebres/tclclockmod#10);


sebres added on 2017-05-29 21:19:17:
merged now in 8.6/trunk

sebres added on 2017-05-29 11:48:39:
So the date 1495810800 is past.

If no objections, I would like to merge the subject into 8.6/trunk on 1496088000 / Mon May 29 20:00:00 GMT 2017.

Regards,
sebres.

sebres added on 2017-05-15 19:07:22:
Branch "sebres-clock-speedup" (trunk version) moved to "sebres-trunk-clock-speedup", because of the troubles to access on fresh-cloned fossil repository.

Branch "sebres-8-6-clock-speedup" was not affected from this problem.

sebres added on 2017-05-11 12:57:40:
As regards the last commit [dea486a4618db6b9] (guaranties that the clock-stubs are loaded on demand only), following comparison shows why this is necessary:

% timerate {exec tclsh86t << "clock seconds; exit"}; #don't load stubs
32281.4 µs/# 31 # 30.978 #/sec 1000.722 nett-ms

% timerate {exec tclsh86t << "clock format -now; exit"}; #loading stubs
73673.4 µs/# 14 # 13.573 #/sec 1031.428 nett-ms

BTW this new possibility of the namespace-based auto-loading (auto_index_ns) was cherry-picked from my another feature-branch, where I use it already long enough. 
For example I rewrote (to auto_index_ns) all the tclIndex-files on very-very large project of me, so the auto-loading is significant faster within its environment (additionally I've across-threaded implementation of auto_index (tsv), if someone needs please contact me).

sebres added on 2017-05-10 12:43:24:
Back-ported to 8.6 now, see branch "sebres-8-6-clock-speedup".

BTW In thin version I've merged Kenny's compiled solution for clock ensemble with part of my feature-performance branch, that optimized basically all ensembles (comming soon), but here "clock" only (all sub-commands are compiled now).

WiP: documentation, code review (+ still one known bug in integration with msgcat-dictionary).

If I get it ready (I hope still today) I will create 2 back-merge points for trunk (re-merge back into branches "sebres-trunk-timerate" and "sebres-clock-speedup").

Because all sub-commands compiled now (no ensemble overhead anymore for "clock ..."), the performance increase compared to original clock grows now up to 50 - 100 times.