Tcl Source Code

Ticket UUID: 1353858
Title: windows: timing granularity is poor on many systems
Type: Patch
Version: None
Submitter: matt-newman
Created on: 2005-11-11 11:32:42
Subsystem: 06. Time Measurement
Assigned To: kennykb
Priority: 5 Medium
Severity:
Status: Open
Last Modified: 2005-11-11 23:36:10
Resolution: None
Closed By:
Closed on:
Description:
Since the switch to using perf counters, timing below 10 ms has been 
seriously impacted: many of the machines we use fail the tests, 
causing a fallback to the old method, which was always 10 ms or 
worse. The problem is compounded for explicit sleeps in 
Tcl_Sleep. Before the switch these worked correctly, but now Tcl_Sleep 
loops until the time is right, which, if you are limited to 10 ms 
granularity, means a Tcl_Sleep(1) might take anywhere from 11 to 20 or 
so ms to complete.

After researching this it is clear (to me at least) that the PerfCounter 
approach, whilst seemingly attractive, is quite flawed, esp. on MP 
machines.

The attached diffs change the approach to use the multimedia 
timers, which are not subject to OEM vagaries and, on all the modern 
systems I have access to (which is quite a few!), seem to work well.
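
For readers unfamiliar with the multimedia timer API, the following is a minimal sketch of how it is typically used. It is not the attached patch: the helper names elapsed_ms and mm_sleep are hypothetical, and the Windows-specific part compiles only under _WIN32 (linked against winmm). timeGetTime returns a 32-bit millisecond count that wraps around, so the elapsed time is computed with unsigned subtraction.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>   /* timeBeginPeriod, timeGetTime, timeEndPeriod */
#endif

/* Milliseconds elapsed between two 32-bit millisecond readings.
 * Unsigned subtraction stays correct when the counter wraps to zero. */
uint32_t elapsed_ms(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}

#ifdef _WIN32
/* Hypothetical helper: wait at least `ms` milliseconds, using the
 * multimedia timer as the clock with 1 ms resolution requested.
 * Returns 0 on success, -1 if 1 ms resolution is not available. */
int mm_sleep(uint32_t ms)
{
    if (timeBeginPeriod(1) != TIMERR_NOERROR) {
        return -1;
    }
    uint32_t start = timeGetTime();
    while (elapsed_ms(start, timeGetTime()) < ms) {
        Sleep(1);               /* yield; wakes on the next 1 ms tick */
    }
    timeEndPeriod(1);           /* must pair with timeBeginPeriod(1) */
    return 0;
}
#endif
```

The wraparound helper is the only portable part; whether a given machine actually honours the 1 ms request is something the sketch reports through its return value rather than assumes.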

We have been using this patch in production for over three months 
with only good results :-)

Matt
User Comments:

matt-newman added on 2005-11-11 23:36:10:

As long as I can get 1ms timing if the OS is capable, then I am happy.

However I would warn you that I did some tests on our DL380 MP boxes 
(HP/Compaq) and perf-counter values were different across the cpus.

Matt

kennykb added on 2005-11-11 23:21:04:

Let me make sure that we're reading from the same page here,
because I think this is going at cross purposes.  I *do*
understand that the tests in the _WIN64 block fail. They are
too conservative.

I quite agree that an unpredictable 1-10 ms delay in [after
1] is unacceptable in any case, and I'm looking into using
the multimedia timer to mitigate that - bringing it down to
1 ms resolution.  That can be done regardless of whether we
use the perf counter.  So yes, your immediate problem *will*
get fixed, in something close to the way you request.
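
The 1-10 ms spread mentioned above can be reproduced with a deliberately simplified model: a wait loop that polls a clock which only advances in steps of a fixed quantum. This is an illustrative simulation, not Tcl's code; coarse_clock and simulated_wait are hypothetical names, and the model ignores any overshoot in the operating system's own Sleep call, which may explain why larger figures are reported on real machines.

```c
#include <assert.h>

/* Reading of a clock that only advances in steps of `quantum` ms. */
long coarse_clock(long true_ms, long quantum)
{
    assert(true_ms >= 0 && quantum > 0);
    return (true_ms / quantum) * quantum;
}

/* Simulate a sleep that loops until the coarse clock reaches the
 * deadline. Returns the true elapsed time in milliseconds. */
long simulated_wait(long start_ms, long request_ms, long quantum)
{
    long deadline = coarse_clock(start_ms, quantum) + request_ms;
    long now = start_ms;

    while (coarse_clock(now, quantum) < deadline) {
        now++;                  /* true time advances 1 ms per step */
    }
    return now - start_ms;
}
```

In this model a 1 ms request against a 10 ms quantum takes between 1 and 10 ms depending on where in the tick it starts (simulated_wait(9, 1, 10) gives 1, simulated_wait(0, 1, 10) gives 10), while the same request against a 1 ms quantum always takes 1 ms.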

One possibility is to go to a two-loop PLL - phaselock the
MM timer derived reference to the system clock (this may be
done for us, I'm checking up on that), and then phaselock
the perf-counter-derived reference to the MM timer.  That
gives single-processor (or even multiple-core-per-chip)
machines the best of both worlds.
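
One of the two loops described here could look roughly like the sketch below: a fast counter is disciplined against a slower reference by adjusting the assumed counter frequency at each calibration, so that the derived time slews toward the reference instead of jumping. This is a minimal illustration under stated assumptions, not the algorithm in tclWinTime.c; the ClockLock type, the function names, and the choice to close the whole error over one calibration interval (capped at half the interval) are all hypothetical.

```c
#include <assert.h>

typedef struct {
    double time_base;  /* derived time (s) at the last calibration   */
    double ref_base;   /* reference time (s) at the last calibration */
    double ctr_base;   /* counter reading at the last calibration    */
    double freq;       /* working estimate of counter ticks per s    */
} ClockLock;

void lock_init(ClockLock *k, double counter, double ref_now, double nominal_freq)
{
    assert(nominal_freq > 0.0);
    k->time_base = ref_now;
    k->ref_base = ref_now;
    k->ctr_base = counter;
    k->freq = nominal_freq;
}

/* Time derived from the fast counter; continuous across calibrations. */
double lock_time(const ClockLock *k, double counter)
{
    return k->time_base + (counter - k->ctr_base) / k->freq;
}

/* Compare derived time with the reference and pick a frequency that
 * closes the error over the next interval of the same length. */
void lock_calibrate(ClockLock *k, double counter, double ref_now)
{
    double interval = ref_now - k->ref_base;
    double ticks = counter - k->ctr_base;
    if (interval <= 0.0 || ticks <= 0.0) {
        return;                         /* nothing to learn from */
    }
    double derived = lock_time(k, counter);
    double error = ref_now - derived;   /* > 0: derived clock is behind */
    double limit = 0.5 * interval;      /* cap the slew rate */
    if (error > limit) error = limit;
    if (error < -limit) error = -limit;

    double measured = ticks / interval; /* observed ticks per second */
    k->freq = measured * interval / (interval + error);
    k->time_base = derived;             /* no step in derived time */
    k->ref_base = ref_now;
    k->ctr_base = counter;
}
```

A second instance of the same structure, fed with the system clock as reference and the MM timer as counter, would give the outer loop of the two-loop arrangement.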

You have more experience with modern MT servers than I do -
but the limited work that I've done on Dell servers suggests
that the board-level integrators actually did better than
MS's documentation indicates.  My understanding is that the
multiple CPUs actually derive their clocks from the same
reference, and get their reset pulse at the same time, so
that even though the counters are separate registers, they
actually increment in lockstep.  That's why I described the
tests as "overconservative."  This was not always true -
Compaq got it spectacularly wrong in the 486 era - but I
have contacts that report good results with patching out the
test on modern systems.  That's why I'm trying to identify
how to make the tests more permissive.

And, well, the perf counter is just too useful on today's
typical desktop machines for me to give it up entirely.  I
still think we can get the best of both worlds - no
unpredictable 10 ms delays *and* the high-resolution counter
in places where it works.

matt-newman added on 2005-11-11 22:03:00:

I don't think you quite follow - on *all* the MP hyperthreaded systems I 
have tested from Dell and HP the criteria fail and it falls back to 10ms 
granularity.

Also, PerfCounters is plain flawed on MP boxes anyway, due to the fact 
that unless your process is locked onto one CPU, the numbers returned by 
PerfCounters depend on which CPU your code is executing on at that 
moment!
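
To make the hazard concrete, here is a toy model, assuming each CPU's counter differs from an ideal shared counter by a fixed skew. The skew values and the names read_counter and measured_interval are invented for illustration; no real hardware was measured.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CPUS 2

/* Hypothetical per-CPU skew, in ticks, relative to an ideal counter. */
static const int64_t cpu_skew[NUM_CPUS] = { 0, -500 };

/* What a thread would read if it ran on `cpu` at ideal time `ideal_ticks`. */
int64_t read_counter(int cpu, int64_t ideal_ticks)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    return ideal_ticks + cpu_skew[cpu];
}

/* Interval as seen by code that reads the counter twice and may have
 * been migrated to another CPU between the two reads. */
int64_t measured_interval(int cpu_at_start, int64_t ideal_start,
                          int cpu_at_end, int64_t ideal_end)
{
    return read_counter(cpu_at_end, ideal_end)
         - read_counter(cpu_at_start, ideal_start);
}
```

With these made-up skews, a true interval of 200 ticks measures as 200 when the thread stays on CPU 0, as -300 when it migrates from CPU 0 to CPU 1, and as 700 when it migrates the other way.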

If you want sub-1ms timing for profiling purposes that is entirely a 
different issue from the primary timing sub-system. 

When we profile, we have an extension that works like the tcl [time] 
command, but does two things to make the results predictable: it 
locks the process onto a cpu (non-HT), and it uses perf-counters for the 
extra resolution - this yields excellent results for profiling...
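
The extension itself is not shown in this ticket, but the technique can be sketched as follows, assuming thread affinity is used for the pinning and that CPU 0 is an acceptable choice (both are assumptions, as are the names ticks_to_usec and time_pinned). The Windows-specific function compiles only under _WIN32; the tick conversion is portable.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a counter delta to microseconds. Splitting off the whole
 * seconds first avoids overflowing 64 bits on long intervals. */
int64_t ticks_to_usec(int64_t ticks, int64_t freq)
{
    assert(freq > 0);
    return (ticks / freq) * 1000000 + (ticks % freq) * 1000000 / freq;
}

#ifdef _WIN32
/* Hypothetical helper: run fn() with the calling thread pinned to
 * CPU 0 and return the elapsed microseconds, or -1 on failure. */
int64_t time_pinned(void (*fn)(void))
{
    LARGE_INTEGER freq, start, end;
    HANDLE self = GetCurrentThread();
    DWORD_PTR old_mask = SetThreadAffinityMask(self, (DWORD_PTR)1);

    if (old_mask == 0 || !QueryPerformanceFrequency(&freq)) {
        return -1;
    }
    QueryPerformanceCounter(&start);
    fn();
    QueryPerformanceCounter(&end);
    SetThreadAffinityMask(self, old_mask);  /* restore original mask */
    return ticks_to_usec(end.QuadPart - start.QuadPart, freq.QuadPart);
}
#endif
```

Because both counter reads happen on the same CPU, any skew between per-CPU counters cancels out of the difference.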

Also, another thing to consider: if you, like me, ship commercial server 
processes that heavily use [after], it is not acceptable to have the timing 
of an [after 1] be 10ms on some systems and not others - it undermines 
the entire design of the application.

So in considering this patch, I ask you to separate your (valid) concerns 
about good profiling from the more general issues of a highly 
time-dependent event loop.

Matt

kennykb added on 2005-11-11 21:38:57:

Matt,

I'm not quite willing yet to give up on the 
performance counter - simply because it's 
the only timing reference with sub-millisecond 
precision that we have.  As I read your patch, 
it rolls back to a state where 
[clock clicks -microseconds] returns something 
precise only to the millisecond - and moreover, 
on most machines, accurate only to the video 
frame.  For profiling, that's a horrible price to pay.

I'm most definitely willing - indeed, eager - to
add the multimedia timer approach on systems
where the performance counter does not pass
the tests.  Doing that would at least give us a
consistent timing reference on those machines.

But I'm more concerned about the MP machines
that "fail" the tests. My understanding of the
situation is that machines on which the performance
counter is unreliable are actually rare, and the tests
are quite over-conservative-- excluding all MP
machines except for GenuineIntel hyperthread.
I suspect that a few extra tests for machines on which
the perf counter is "safe" may well cover your
production machines.  I've actually encountered
only a couple of machines on which the vendor got
it wrong, and they are rather antiques now.

Have you tried simply patching out the checks for
perf counter frequency (the block of code conditioned
on "#if !defined(_WIN64)")?  If so, what were the results?
My suspicion is that on modern machines, that may well
Just Work - and in that case, we'd win simply by making
the checks within that block more permissive.

matt-newman added on 2005-11-11 18:32:45:

File Added - 155863: tclWinTime.c.diff

Attachments: