TIP 527: New measurement facilities in TCL: New command timerate.

Login
Author:         Sergey G. Brester <sebres@users.sourceforge.net>
State:          Draft
Type:           Project
Vote:           Pending
Created:        26-Nov-2018
Tcl-Version:    8.5
Tcl-Branch:     sebres-8-6-timerate
Discussions-To: news:comp.lang.tcl
Post-History: 

Abstract

This TIP proposes a new command timerate as well as a measurement toolchain for TCL.

In order for it to be possibile to compare results among older Tcl versions, the version 8.5 is suggested as a target version for this TIP. (To forestall the objection that this TIP extends old versions, the command timerate shall be placed into ::tcl::unsupported namespace in versions prior to 8.7.)

Additionally, this TIP proposes-optionally-a small framework to test performance. This makes possible to create test scripts for possible performance regressions and allows a diff-based comparison of the results to find performance degradation between revisions.

Rationale

Although the existing command time can be used for performance measurement, it has several disadvantages:

  1. The execution is limited by fixed repetition count, so the real execution time is undefined (if the evaluation time of single iteration is unknown or may vary in time in different versions). The time neede by a test case could grow to be very great.
  2. time does not have a calibration ability.
  3. time uses the default script invocation function (Tcl_EvalObjEx), therefore the speed of the script is different from that of compiled byte-code (e.g. in a compiled procedure),
  4. In addition measurement is very imprecise on extremely fast scripts, because of certain unadvoidable overhead costs (the execution of time and the interpreter overhead up to the TclNRRunCallbacks call that invokes TEBCResume for a compiled script).
  5. time introduces additional external costs on the script being measured, such as washout of the CPU-cache and branch misprediction.

I. Proposed command: timerate

timerate - Time-related execution resp. performance measurement of a script

Synopsis

timerate script ?time? ?count?

timerate ?-direct? ?-overhead double? script ?time ?count??

timerate ?-calibrate? ?-direct? script ?time ?count??

Description

The first and second form will evaluate script until the interval time given in milliseconds elapses, or for 1000 milliseconds (1 second) if time is not specified. If parameter count is additionally specified (not yet introduced in mentioned tcl-branch, coming soon) it means max count times to evaluate the script, so the evalution will stop after this count of iterations reached.

It will then return a canonical tcl-list of the form (similar to command time but more expressive):

.095977 µs/# 52095836 # 10419167 #/sec 5000.000 nett-ms

which indicates:

   - the average amount of time required per iteration, in microseconds (`lindex $result 0`)

   - the count how many times it was executed (`lindex $result 2`)

   - the estimated rate per second (`lindex $result 4`)

   - the estimated real execution time without measurement overhead (`lindex $result 6`)

Time is measured in elapsed time using the finest timer resolution as possible, not CPU time. This command may be used to provide information as to how well the script or a tcl-command is performing and can help determine bottlenecks and fine-tune application performance.

As opposed to the time commmand, which runs the tested script for a fixed number of iterations, the timerate command runs it for a fixed time. Additionally, the compiled variant of the script will be used during the entire measurement, as if the script were part of a compiled procedure, if the -direct option is not specified. The fixed time period and possibility of compilation allow for more precise results and prevent very long execution times by slow scripts, making it practical for measuring scripts with highly uncertain execution times.

To measure very fast scripts as precisely as posible the calibration process may be required.

Options

-calibrate

The -calibrate option is used to calibrate timerate, calculating the estimated overhead of the given script as the default overhead for future invocations of the timerate command. If the time parameter is not specified, the -calibrate procedure runs for up to 10 seconds.

-overhead double

The -overhead parameter supplies an estimate (in microseconds) of the measurement overhead of each iteration of the tested script. This quantity will be subtracted from the measured time prior to reporting results.

-direct

The -direct option causes direct execution of the supplied script, without compilation, in a manner similar to the time command. It can be used to measure the cost of Tcl_EvalObjEx, of the invocation of canonical lists, and of the uncompiled versions of bytecoded commands.

Examples

Estimate how fast it takes for a simple Tcl for loop (including operations on variable i) to count to ten:

# calibrate:
timerate -calibrate {}
# measure:
timerate { for {set i 0} {$i<10} {incr i} {} } 5000

Estimate how fast it takes for a simple Tcl for loop, ignoring the overhead for to perform ten iterations, ignoring the overhead of the management of the variable that controls the loop:

# calibrate for overhead of variable operations:
set i 0; timerate -calibrate {expr {$i<10}; incr i} 1000 
# measure:
timerate { for {set i 0} {$i<10} {incr i} {} } 5000

Estimate the speed of calculating the hour of the day using clock format only, ignoring overhead of the portion of the script that prepares the time for it to calculate:

# calibrate:
timerate -calibrate {}
# estimate overhead:
set tm 0
set ovh [lindex [timerate { incr tm [expr {24*60*60}] }] 0]
# measure using esimated overhead:
set tm 0
timerate -overhead $ovh {
  clock format $tm -format %H
  incr tm [expr {24*60*60}]; # overhead for this is ignored
} 5000

Impact

As the current implementation in sebres-8-6-timerate (or, for Tcl 8.5, sebres-8-5-timerate) shows, no public API's are affected by introducing this, A few new functions are provided for Tcl's internal API (see tclInt.h).

Don Porter (dgp) has made a review of the version of this change that targets Tcl 8.7, which can be found in branch dgp-sebres-timerate-review. The branch already contains the performance testing framework as well as a new script, tests-perf/clock.perf.tcl that instruments the clock command in preparation for integrating changes that improve its performance..


II. Proposed performance-testing framework: ::tclTestPerf

The small test suite in the ::tclTestPerf namespace allows for batch-based measurement and diff-based results to compare performamnce among revisions and detect performance regressions.

Synopsis

::tclTestPerf::_test_run time scripts

The _test_run command executes a batch of scripts provided in the syntax of a Tcl list, with the exception that rows may be commented using Tcl's # syntax. It produces an output, that can be used for automated comparison of perfomance measurements. When uncommented elements of the script are executed, an output is produced that is similar to the transcript of a tclsh session. The output for each element consists of: * a line consisting of the string, '% ', followed by the element being evaluated. * a line containing the result returned by evaluating the given element. * a line containing the result of applying timerate to the given element.

The special tokens, setup and cleanup, can be used to specify actions needed to initialize and finalize the test sequence before and after conducting the measurements; for example:

::tclTestPerf::_test_run 500 {
  setup {set i 0}
  {clock format [incr i] -format "%Y-%m-%dT%H:%M:%S" \
       -locale en -timezone :CET}
  cleanup {unset i}
  ...
}

It is recommended that the programmer divide multiple scripts into several blocks, because each block produces a separate summary output at end of execution, which contains the total, average, min and max time of execution of the script that the block contains, making it easier to identify which specific element has suffered from a performance regression.

Usage as test-script

## common test performance framework:
if {![namespace exists ::tclTestPerf]} {
  source [file join [file dirname [info script]] test-performance.tcl]
}

## test namespace:
namespace eval ::tclTestPerf-Example {

namespace path {::tclTestPerf}

## set testing defaults:
set ::env(TCL_TZ) :CET

## warm-up test-related features (load clock.tcl, system zones, locales, etc.):
clock scan "" -gmt 1
clock scan ""
clock scan "" -timezone :CET
clock scan "" -format "" -locale en
clock scan "" -format "" -locale de

proc test-format {{reptime 1000}} {
  _test_run $reptime {
    # Format : short, week only (in gmt)
    {clock format 1482525936 -format "%u" -gmt 1}
    # Format : short, week only (system zone)
    {clock format 1482525936 -format "%u"}
    # Format : short, week only (CEST)
    {clock format 1482525936 -format "%u" -timezone :CET}
  }
}

proc test-scan {{reptime 1000}} {
  _test_run $reptime {
    # Scan : date (in gmt)
    {clock scan "25.11.2015" -format "%d.%m.%Y" -base 0 -gmt 1}
    # Scan : date (system time zone, with base)
    {clock scan "25.11.2015" -format "%d.%m.%Y" -base 0}
    # Scan : date (system time zone, without base)
    {clock scan "25.11.2015" -format "%d.%m.%Y"}
  }
}

proc test {{reptime 1000}} {
  puts ""
  test-format $reptime
  test-scan $reptime
  puts \n**OK**
}

}; #/::tclTestPerf-Example

if {[info exists ::argv0] && [file tail $::argv0] eq [file tail [info script]]} {
  array set in {-time 500}
  array set in $::argv
  ::tclTestPerf-Example::test $in(-time)
}

Result of execution:

Calibration ... done: 0.0438427054988057 µs/#-overhead

% # Format : short, week only (in gmt)
% clock format 1482525936 -format "%u" -gmt 1
5
0.217501 µs/# 1913191 # 4597679 #/sec 416.121 nett-ms

% # Format : short, week only (system zone)
% clock format 1482525936 -format "%u"
5
0.249312 µs/# 1705587 # 4011041 #/sec 425.223 nett-ms

% # Format : short, week only (CEST)
% clock format 1482525936 -format "%u" -timezone :CET
5
0.201958 µs/# 2034167 # 4951516 #/sec 410.817 nett-ms

********************************************************************************
Total 3 cases in 1.52 sec. (1.25 nett-sec.):
0.668771 µs/# 5652945 # 4514551.244 #/sec 1252.161 nett-ms
Average:
0.222924 µs/# 1884315 # 4514551 #/sec 417.387 nett-ms
Min:
0.201958 µs/# 2034167 # 4951516 #/sec 410.817 nett-ms
Max:
0.249312 µs/# 1705587 # 4011041 #/sec 425.223 nett-ms
********************************************************************************

% # Scan : date (in gmt)
% clock scan "25.11.2015" -format "%d.%m.%Y" -base 0 -gmt 1
1448409600
0.359373 µs/# 1240031 # 2782622 #/sec 445.634 nett-ms

% # Scan : date (system time zone, with base)
% clock scan "25.11.2015" -format "%d.%m.%Y" -base 0
1448406000
0.406066 µs/# 1111337 # 2462654 #/sec 451.276 nett-ms

% # Scan : date (system time zone, without base)
% clock scan "25.11.2015" -format "%d.%m.%Y"
1448406000
0.448517 µs/# 1015517 # 2229568 #/sec 455.477 nett-ms

********************************************************************************
Total 3 cases in 1.51 sec. (1.35 nett-sec.):
1.213956 µs/# 3366885 # 2489586.930 #/sec 1352.387 nett-ms
Average:
0.404652 µs/# 1122295 # 2489585 #/sec 450.796 nett-ms
Min:
0.359373 µs/# 1240031 # 2782622 #/sec 445.634 nett-ms
Max:
0.448517 µs/# 1015517 # 2229568 #/sec 455.477 nett-ms
********************************************************************************

Examples

For the usage examples of this test-framework, see clock-speedup-branch - tests-perf/clock.perf.tcl or event-perf-branch - tests-perf/test-performance.tcl.

For the result example (produced with clock.perf.tcl) see diff in sebres/tcl#2.