Tcl Source Code: View Ticket

Ticket UUID:	1353858
Title:	windows: timing granularity is poor on many systems
Type:	Patch	Version:	None
Submitter:	matt-newman	Created on:	2005-11-11 11:32:42
Subsystem:	06. Time Measurement	Assigned To:	kennykb
Priority:	5 Medium	Severity:
Status:	Open	Last Modified:	2005-11-11 23:36:10
Resolution:	None	Closed By:
		Closed on:
Description:	Since the switch to using perf counters, timing below 10 ms has been seriously impacted: many of the machines we use fail the tests, causing a fallback to the old method, which was always 10 ms or worse. The problem is compounded for explicit sleeps in Tcl_Sleep. Previously these worked correctly, but now the function loops until the time is right, which, if you are limited to 10 ms granularity, means a Tcl_Sleep(1) might take anywhere from 11 to 20 or so ms to complete. After researching this it is clear (to me at least) that the perf counter approach, whilst seemingly attractive, is quite flawed, especially on MP machines. The attached diffs change the approach to use the multimedia timers, which are not subject to OEM vagaries and seem to work well on all the modern systems I have access to (which is quite a few!). We have been using this patch in production for over three months with only good results :-) Matt
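The effect of tick granularity on a short sleep can be illustrated with a toy model. This is not Tcl's code: it assumes wake-ups happen only on tick boundaries, and the function names (next_tick_at_or_after, modeled_sleep_us) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only (an assumption of this sketch, not Tcl's implementation):
 * the timer can wake a sleeper only on multiples of tick_us.
 * All times are in microseconds and non-negative. */
static int64_t next_tick_at_or_after(int64_t t_us, int64_t tick_us)
{
    assert(tick_us > 0 && t_us >= 0);
    /* Round t_us up to the next multiple of tick_us. */
    return ((t_us + tick_us - 1) / tick_us) * tick_us;
}

/* Wall-clock duration of a sleep of request_us that starts at start_us
 * when wake-ups are quantized to tick_us. */
static int64_t modeled_sleep_us(int64_t start_us, int64_t request_us,
                                int64_t tick_us)
{
    int64_t wake_us = next_tick_at_or_after(start_us + request_us, tick_us);
    return wake_us - start_us;
}
```

In this model a 1 ms request that starts 9.1 ms into a 10 ms tick takes 10.9 ms, while with a 1 ms tick the same request always completes in under 2 ms.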
User Comments:	matt-newman added on 2005-11-11 23:36:10:
Logged In: YES user_id=1333796
As long as I can get 1 ms timing if the OS is capable, then I am happy. However, I would warn you that I did some tests on our DL380 MP boxes (HP/Compaq) and perf-counter values were different across the CPUs. Matt

kennykb added on 2005-11-11 23:21:04:
Logged In: YES user_id=99768
Let me make sure that we're on the same page here, because I think we are talking at cross purposes. I do understand that the tests in the _WIN64 block fail. They are too conservative. I quite agree that an unpredictable 1-10 ms delay in [after 1] is unacceptable in any case, and I'm looking into using the multimedia timer to mitigate that - bringing it down to 1 ms resolution. That can be done regardless of whether we use the perf counter. So yes, your immediate problem will get fixed, in something close to the way you request. One possibility is to go to a two-loop PLL - phaselock the MM-timer-derived reference to the system clock (this may be done for us, I'm checking up on that), and then phaselock the perf-counter-derived reference to the MM timer. That gives single-processor (or even multiple-core-per-chip) machines the best of both worlds. You have more experience with modern MT servers than I do - but the limited work that I've done on Dell servers suggests that the board-level integrators actually did better than MS's documentation indicates. My understanding is that the multiple CPUs actually derive their clocks from the same reference, and get their reset pulse at the same time, so that even though the counters are separate registers, they actually increment in lockstep. That's why I described the tests as "overconservative." This was not always true - Compaq got it spectacularly wrong in the 486 era - but I have contacts that report good results with patching out the test on modern systems. That's why I'm trying to identify how to make the tests more permissive.
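One stage of the phase-lock idea described above might look like the following sketch. It is an illustration under stated assumptions, not the code in tclWinTime.c: the DisciplinedClock type and the dc_* functions are hypothetical names, the reference time is assumed to arrive as seconds in a double, and the loop simply re-measures the counter frequency and then slews it so that a fraction (gain) of the observed phase error is removed over the next interval. A two-loop arrangement would chain two such stages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for a counter-derived clock steered toward a reference. */
typedef struct {
    double   freq_hz;      /* current estimate of counter ticks per second */
    uint64_t base_count;   /* counter value at the last update */
    double   base_time_s;  /* disciplined time at base_count */
    double   last_ref_s;   /* reference time at base_count */
} DisciplinedClock;

static void dc_init(DisciplinedClock *c, double nominal_hz,
                    uint64_t count, double ref_s)
{
    assert(nominal_hz > 0.0);
    c->freq_hz = nominal_hz;
    c->base_count = count;
    c->base_time_s = ref_s;
    c->last_ref_s = ref_s;
}

/* Disciplined time for a given counter reading. */
static double dc_now(const DisciplinedClock *c, uint64_t count)
{
    return c->base_time_s + (double)(count - c->base_count) / c->freq_hz;
}

/* One loop iteration. Returns the phase error (reference minus disciplined
 * time) observed before the correction was applied. */
static double dc_update(DisciplinedClock *c, uint64_t count,
                        double ref_s, double gain)
{
    double now  = dc_now(c, count);
    double err  = ref_s - now;
    double dref = ref_s - c->last_ref_s;
    assert(dref > 0.0);

    /* Frequency actually observed against the reference over this interval. */
    double measured_hz = (double)(count - c->base_count) / dref;

    /* Rebase at the current disciplined time so the clock never steps. */
    c->base_time_s = now;
    c->base_count  = count;
    c->last_ref_s  = ref_s;

    /* Slew: pick a frequency that absorbs gain*err over the next interval,
     * clamped so a wild sample cannot drive the frequency negative. */
    double corr = gain * err;
    if (corr >  0.5 * dref) corr =  0.5 * dref;
    if (corr < -0.5 * dref) corr = -0.5 * dref;
    c->freq_hz = measured_hz * dref / (dref + corr);
    return err;
}
```

Because the clock is slewed rather than stepped, dc_now stays monotonic as long as the counter itself does; a gain below 1.0 trades slower convergence for less sensitivity to a noisy reference such as a 1 ms timer.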
And, well, the perf counter is just too useful on today's typical desktop machines for me to give it up entirely. I still think we can get the best of both worlds - no unpredictable 10 ms delays and the high-resolution counter in places where it works.

matt-newman added on 2005-11-11 22:03:00:
Logged In: YES user_id=1333796
I don't think you quite follow - on all the MP hyperthreaded systems I have tested from Dell and HP the tests fail and it falls back to 10 ms granularity. Also, perf counters are plain flawed on MP boxes anyway, because unless your process is locked onto one CPU, the numbers they return depend on which CPU your code is executing on at that moment! If you want sub-1 ms timing for profiling purposes, that is an entirely different issue from the primary timing subsystem. When we profile, we use an extension that works like the Tcl [time] command but does two things to make the results predictable: it locks the process onto a CPU (non-HT) and uses perf counters for the extra resolution. This yields excellent results for profiling. Another thing to consider: if you, like me, ship commercial server processes that heavily use [after], it is not acceptable to have the timing of an [after 1] be 10 ms on some systems and not others - it undermines the entire design of the application. So in considering this patch, I ask you to separate your (valid) concerns about good profiling from the more general issues of a highly time-dependent event loop. Matt

kennykb added on 2005-11-11 21:38:57:
Logged In: YES user_id=99768
Matt, I'm not quite willing yet to give up on the performance counter - simply because it's the only timing reference with sub-millisecond precision that we have. As I read your patch, it rolls back to a state where [clock clicks -microseconds] returns something precise only to the millisecond - and moreover, on most machines, accurate only to the video frame. For profiling, that's a horrible price to pay.
I'm most definitely willing - indeed, eager - to add the multimedia timer approach on systems where the performance counter does not pass the tests. Doing that would at least give us a consistent timing reference on those machines. But I'm more concerned about the MP machines that "fail" the tests. My understanding of the situation is that machines on which the performance counter is unreliable are actually rare, and the tests are quite over-conservative, excluding all MP machines except for GenuineIntel hyperthread. I suspect that a few extra tests for machines on which the perf counter is "safe" may well cover your production machines. I've actually encountered only a couple of machines on which the vendor got it wrong, and they are rather antiques now. Have you tried simply patching out the checks for perf counter frequency (the block of code conditioned on "#if !defined(_WIN64)")? If so, what were the results? My suspicion is that on modern machines, that may well Just Work - and in that case, we'd win simply by making the checks within that block more permissive.

matt-newman added on 2005-11-11 18:32:45:
File Added - 155863: tclWinTime.c.diff
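The compromise discussed in this thread (keep the perf counter where it looks safe, fall back to the multimedia timer elsewhere) could be expressed as a small selection policy. The sketch below is hypothetical and none of it is taken from tclWinTime.c or the attached diff. The two constants are the well-known frequencies of the legacy 8254 interval timer and the ACPI power-management timer; treating a counter that runs at one of them as board-level, and therefore shared by all CPUs, is an assumption of this sketch, as are all the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIT_HZ     1193182ULL  /* legacy 8254 programmable interval timer */
#define ACPI_PM_HZ 3579545ULL  /* ACPI power-management timer */

typedef enum { TIMER_PERF_COUNTER, TIMER_MULTIMEDIA } TimerSource;

/* Assumption of this sketch: a counter running at one of these frequencies
 * is driven by a board-level timer rather than a per-CPU clock. */
static bool perf_counter_looks_board_level(uint64_t freq_hz)
{
    return freq_hz == PIT_HZ || freq_hz == ACPI_PM_HZ;
}

/* Hypothetical policy: prefer the perf counter, but on multiprocessor
 * machines use it only when it appears to be shared by all CPUs;
 * otherwise fall back to the 1 ms multimedia timer. */
static TimerSource choose_timer_source(bool counter_available,
                                       uint64_t freq_hz, unsigned cpu_count)
{
    assert(cpu_count >= 1);
    if (!counter_available || freq_hz == 0) {
        return TIMER_MULTIMEDIA;
    }
    if (cpu_count == 1) {
        return TIMER_PERF_COUNTER;
    }
    return perf_counter_looks_board_level(freq_hz)
        ? TIMER_PERF_COUNTER
        : TIMER_MULTIMEDIA;
}
```

Either branch leaves the event loop with a 1 ms or better reference, which is the property asked for in the original report; only the sub-millisecond precision differs.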

Attachments:

tclWinTime.c.diff added by matt-newman on 2005-11-11 18:32:44.