Ticket UUID: 219148
Title: IO_PERFORMANCE on NT 80 times slower then on unix
Type: Bug
Version: obsolete: 8.0.3
Submitter: nobody
Created on: 2000-10-26 05:02:19
Subsystem: 27. Channel Types
Assigned To: andreas_kupries
Priority: 6
Severity:
Status: Closed
Last Modified: 2001-09-08 00:11:51
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2001-09-07 17:11:51
Description:
OriginalBugID: 948
Bug Version: 8.0.3
SubmitDate: '1998-12-14'
LastModified: '2000-06-22'
Severity: MED
Status: Assigned
Submitter: pat
ChangedBy: hobbs
OS: Windows NT
Machine: X86
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'
Name: Uwe Traum

ReproducibleScript:

    proc dotest {{filename test.bin}} {
        set fid [open $filename w]
        fconfigure $fid -translation binary
        for { set i 0 } { $i < 2000 } { incr i } {
            set ind [expr {128*int(rand()*30000)}]
            #seek $fid $ind start
            puts -nonewline $fid "123456789012345678901234567890"
        }
        close $fid
    }
    time dotest 3

[rewritten by hobbs as proc]

ObservedBehavior:
Output:
NT4; local disk; PentiumPro 200: 155172000 microseconds per iteration
Solaris2.5; local disk; sparc20: 1844860 microseconds per iteration

On Unix it's 80 times faster than on NT!!!

DesiredBehavior: same speed

In FileOutputProc (tcl8.0.3/win/tclWinChan.c, line 560) there is ALWAYS a call to FlushFileBuffers. So every I/O is written directly to disk. That's why the disk LED is permanently blinking. What's the reason for this call? Can it be removed?

thanks

--
This is verified in 8.4a1. The disk LED does stay permanently on under NT. Using the Performance Monitor, it does seem that excessive flushing may be occurring.
-- 06/22/2000 hobbs
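The cost described in the report can be illustrated without any Windows API. The following is only a toy simulation under stated assumptions, not Tcl's driver code: `SimFile`, `sim_write`, `sim_flush`, `sim_close`, and `run_dotest` are hypothetical names, and the `commits` counter stands in for physical disk flushes of the kind a FlushFileBuffers call would force. It replays the 2000 writes of 30 bytes from the reproducible script, once with a flush after every write and once with a single flush on close.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a file behind an OS write cache (hypothetical, not Tcl code). */
typedef struct {
    size_t pending;    /* bytes handed to the OS but not yet on disk */
    size_t on_disk;    /* bytes committed to disk */
    unsigned commits;  /* number of simulated disk flushes */
    int force_flush;   /* nonzero: flush after every write, as in the report */
} SimFile;

/* Commit pending bytes; counts as a flush only if something was pending. */
void sim_flush(SimFile *f) {
    if (f->pending > 0) {
        f->on_disk += f->pending;
        f->pending = 0;
        f->commits++;
    }
}

/* Hand nbytes to the simulated OS, flushing at once if force_flush is set. */
void sim_write(SimFile *f, size_t nbytes) {
    assert(f != NULL);
    f->pending += nbytes;
    if (f->force_flush) {
        sim_flush(f);
    }
}

/* Closing always commits whatever is still pending. */
void sim_close(SimFile *f) {
    sim_flush(f);
}

/* Replay the write loop of the script; return the number of flushes. */
unsigned run_dotest(int force_flush, size_t *bytes_on_disk) {
    SimFile f = {0, 0, 0, force_flush};
    for (int i = 0; i < 2000; i++) {
        sim_write(&f, 30);  /* the script writes a 30-character string */
    }
    sim_close(&f);
    *bytes_on_disk = f.on_disk;
    return f.commits;
}
```

Under these assumptions `run_dotest(1, &n)` performs 2000 simulated flushes and `run_dotest(0, &n)` performs one, while both leave the same 60000 bytes on disk. The model says nothing about real timings; it only shows where the extra disk traffic would come from.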
User Comments:
andreas_kupries added on 2001-09-08 00:11:51:
Logged In: YES user_id=75003
Committed to both head and core-8-3-1-branch.

hobbs added on 2001-09-07 07:01:38:
Logged In: YES user_id=72656
Looks great.

andreas_kupries added on 2001-09-07 06:46:25:
File Added - 10463: 219148.patch
Logged In: YES user_id=75003
Added a patch solving the problem. Used the idea of a boolean flag and flushing only the channels which were written to, and only when requesting size information.

andreas_kupries added on 2001-09-05 07:20:47:
Logged In: YES user_id=75003
Ideas from David Graveraux:

The only thing I know is that if there are uncommitted buffers the OS is holding, a request for file size won't cause the OS to commit the buffers first. A look at using I/O completion ports for writing to disk from within Tcl >might< be a good work-around for tracking what the OS hasn't committed yet. I can't say for sure. The amount of code for tracking could get very large. Adding an explicit flush to the channel driver might be the best alternative, but explicit at the script level to the user instead of the implicit one as is now. That's all I know.

>Hm. We have a flushproc in the driver, it is just not used yet. This
>could contain the OS-Flush on windows and be called by [flush] after
>it has committed the tcl buffers to the OS. This does not help with the
>tests which check file sizes to check the correctness of the 'implicit'
>flushes. And the moment we add the OS-flush to them we are back to the
>current situation.

half way there... add a [flush] to the tests, that will do FlushFileBuffers() or whatever was the API func...

It's not the same. Make [flush] not only flush the channel but commit the OS buffers, too. Normal mode flushing of the channel buffer doesn't have to also mean flushing the OS buffers, too.

andreas_kupries added on 2001-08-24 07:30:06:
Logged In: YES user_id=75003
More ideas (coming from Jeff).
________________________________________
What happens on Windows if another process opens the file?
Does that process also get the bogus file size?
________________________________________
Are there Win* APIs we could use to peek into the buffering done by Windows? We could use this instead of the counters. Or we could use this in [file size] to report a better size.

andreas_kupries added on 2001-08-24 07:27:30:
Logged In: YES user_id=75003
Ideas to solve this problem collected so far.
________________________________________
Just remove the forced OS flush for Windows. Make the tests 'unixOnly'.

Anticipated Effects:
-Speedup for Windows I/O compared to current solution.
-No change for the other platforms.
-The coverage of code paths by the testsuite decreases. In other words, the testsuite becomes worse.
________________________________________
Add counters in the channel structures (on the driver side) to count how many bytes were read and written to the OS. Add testchannel subcommands to access this information instead of using [file size]. The tests will have to be rewritten.

Anticipated Effects:
-General slowdown in the I/O system for all platforms (Counter management). Should be negligible though.
-Speedup for Windows I/O compared to current solution.
-The testsuite stays in shape.
________________________________________
Handle the proposed counters only for Win*. Write separate tests for Unix and Win*.

Anticipated Effects:
-Speedup for Windows.
-No change for the other platforms.
-The testsuite stays in shape.
________________________________________
Add a boolean flag to the Win* structures (driver side). Indicates if a true flush was done on the file channel. Whenever a [file size] is requested the system goes through the list of file channels and does an OS flush on all with the flag not set. The flag is set by this action. Any write on the channel resets the flag for that channel. When closing a file channel do a true flush in the driver. The testsuite needs no change.

Anticipated Effects:
-Slowdown of [file size] operation for Win*.
-Speedup of Win* I/O in general.
-No change for the other platforms.
-Essentially emulates Unix behaviour on Windows for Tcl.
-Adds interaction between the filesystem and the I/O (channel) code.
-The testsuite stays in shape.

andreas_kupries added on 2001-08-24 07:17:07:
Logged In: YES user_id=75003
Just for the record here are the results of running tclbench for a tclsh with forced flushing (1) and without (2) for my machine (Win NT 5, 128 MB). Used fcopy to exercise the I/O system.

$ ./tcl/win/win-dll/tclsh84.exe tclbench/runbench.tcl \
      -match 'FCOPY*' -notk \
      -paths "./tcl/win/win-dll/ ./tcl.nf/win/win-dll/"
000 VERSIONS:                1:8.4a4   2:8.4a4
001 FCOPY binary: 164K       2320137     19575
002 FCOPY encoding: 164K     1583793     39857
003 FCOPY std: 164K          2435353     18588
003 BENCHMARKS               1:8.4a4   2:8.4a4

nobody added on 2001-08-24 06:48:40:
Logged In: NO
I agree 100% with your summary.

andreas_kupries added on 2001-08-24 06:26:34:
Logged In: YES user_id=75003
Ok, I now understand the problem much better. It is partially an OS issue and partially an issue of how the affected tests were written.

When Tcl 'flushes' a channel it actually only writes its internal buffers to the OS and then forgets about the data. The OS is free to delay the actual write to disk.

The affected tests try to check that the flushing behaviour of Tcl is correct. To do so they perform some writes and then check the size of the resulting file. But this means that they actually check both the flushing behaviour of Tcl itself and how the OS deals with pending data when it comes to reporting the size of a file.

Both Unix and Win* platforms delay writing data to disk until they have idle time, or group nearby blocks together, etc. But obviously Win* is lazier than Unix when it comes to reporting the size of a file with pending writes. Win* reports the size actually on disk, no matter how much data is pending. Unix goes to the trouble and calculates the size of the file as if the pending data had been written to the disk.
The current solution of this problem is to force Win* to actually write all the data written to it by Tcl to the disk too, without delay. This gets us the reliable file sizes the tests need to perform correctly, at the expense of general I/O performance.

andreas_kupries added on 2001-08-24 05:55:32:
Logged In: YES user_id=75003
This is the list of tests which fail if flushing is disabled in the windows file driver:

io-27.2 FlushChannel, some output buffered
io-27.4 FlushChannel, implicit flush when buffer fills
io-27.5 FlushChannel, implicit flush when buffer fills and on close
io-29.4 Tcl_WriteChars, buffering in full buffering mode
io-29.5 Tcl_WriteChars, buffering in line buffering mode
io-29.6 Tcl_WriteChars, buffering in no buffering mode
io-29.7 Tcl_Flush, full buffering
io-29.8 Tcl_Flush, full buffering
io-29.17 Tcl_WriteChars buffers, then Tcl_Flush flushes
io-29.18 Tcl_WriteChars and Tcl_Flush intermixed
io-29.19 Explicit and implicit flushes
io-29.20 Implicit flush when buffer is full
io-29.28 Tcl_WriteChars, lf mode
io-39.6 Tcl_SetChannelOption, multiple options
io-39.7 Tcl_SetChannelOption, buffering, translation
io-39.8 Tcl_SetChannelOption, different buffering options
io-52.7 TclCopyChannel

andreas_kupries added on 2001-08-24 04:40:22:
Logged In: YES user_id=75003
The actual id is #219300 after SF did its renumbering dance.

dkf added on 2001-02-01 03:45:41:
See also Bug #119300 - we've so many unclosed bugs that it is impractical to link related ones... <sigh>

davygrvy added on 2001-01-13 07:35:16:
File channel driver on Win* forces a flush. It really doesn't need to, but some file tests depend on it doing a true write to disk. So therefore, it's slower.
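The boolean-flag idea from the 2001-08-24 07:27:30 comment, which the 2001-09-07 comment says the patch is based on, can be sketched as a small simulation. This is not the contents of 219148.patch: `Chan`, `ChanList`, `chan_open`, `chan_write`, `chan_os_flush`, and `chan_file_size` are hypothetical names, and the `pending`/`on_disk` fields only model the observation above that Win* reports the size actually on disk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 8

/* Simulated Win* file channel (hypothetical model, not Tcl's real structure). */
typedef struct {
    size_t pending;       /* written by Tcl, still held in the OS cache */
    size_t on_disk;       /* what a size query reports in this model */
    int flushed;          /* 1 after a true OS flush, reset to 0 by any write */
    unsigned os_flushes;  /* how many true OS flushes were performed */
} Chan;

/* The list of open file channels that a size query walks through. */
typedef struct {
    Chan *open[MAX_CHANNELS];
    size_t count;
} ChanList;

void chan_open(ChanList *list, Chan *c) {
    assert(list->count < MAX_CHANNELS);
    c->pending = 0;
    c->on_disk = 0;
    c->flushed = 1;       /* nothing written yet, so nothing to commit */
    c->os_flushes = 0;
    list->open[list->count++] = c;
}

/* A write no longer forces an OS flush; it only resets the flag. */
void chan_write(Chan *c, size_t nbytes) {
    c->pending += nbytes;
    c->flushed = 0;
}

/* Stand-in for a true OS flush such as FlushFileBuffers. */
void chan_os_flush(Chan *c) {
    c->on_disk += c->pending;
    c->pending = 0;
    c->flushed = 1;
    c->os_flushes++;
}

/* Size query: flush every channel whose flag is not set, then report. */
size_t chan_file_size(ChanList *list, const Chan *target) {
    for (size_t i = 0; i < list->count; i++) {
        if (!list->open[i]->flushed) {
            chan_os_flush(list->open[i]);
        }
    }
    return target->on_disk;
}
```

In this model 2000 writes cause no OS flush at all; the first size query pays for exactly one flush on the written channel, and a second query with no writes in between pays for none. That matches the anticipated effects listed for this idea: slower [file size], faster I/O in general. Closing a channel, which per the comment should also do a true flush, is left out of the sketch.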
Attachments:
- 219148.patch added by andreas_kupries on 2001-09-07 06:46:25.