Tcl Source Code

Ticket UUID: 219148
Title: IO_PERFORMANCE on NT 80 times slower than on unix
Type: Bug Version: obsolete: 8.0.3
Submitter: nobody Created on: 2000-10-26 05:02:19
Subsystem: 27. Channel Types Assigned To: andreas_kupries
Priority: 6 Severity:
Status: Closed Last Modified: 2001-09-08 00:11:51
Resolution: Fixed Closed By: andreas_kupries
    Closed on: 2001-09-07 17:11:51
Description:
OriginalBugID: 948 Bug
Version: 8.0.3
SubmitDate: '1998-12-14'
LastModified: '2000-06-22'
Severity: MED
Status: Assigned
Submitter: pat
ChangedBy: hobbs
OS: Windows NT
Machine: X86
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'


Name: Uwe Traum

ReproducibleScript:

proc dotest {{filename test.bin}} {
    set fid [open $filename w]
    fconfigure $fid -translation binary

    for { set i 0 } { $i < 2000 } { incr i } {
        set ind [expr {128*int(rand()*30000)}]
        #seek $fid $ind start
        puts -nonewline $fid "123456789012345678901234567890"
    }
    close $fid
}
time dotest 3

[rewritten by hobbs as proc]

ObservedBehavior:
Output:

NT4;local disk;PentiumPro 200: 155172000 microseconds per iteration
Solaris2.5;local disk;sparc20: 1844860 microseconds per iteration

on unix it's 80 times faster than on NT!!!

DesiredBehavior:
same speed


In FileOutputProc (tcl8.0.3/win/tclWinChan.c,line 560) there
is ALWAYS a call to FlushFileBuffers. 
So every I/O is written directly to disk. 
That's why the Disk-LED is permanently blinking.

What's the reason for this call ?
Can it be removed ?

thanks 
--
This is verified in 8.4a1.  The disk LED does stay permanently
on under NT.  Using the Performance Monitor, it does seem that
excessive flushing may be occurring.
 
-- 06/22/2000 hobbs
User Comments: andreas_kupries added on 2001-09-08 00:11:51:
Logged In: YES 
user_id=75003

Committed to both head and core-8-3-1-branch.

hobbs added on 2001-09-07 07:01:38:
Logged In: YES 
user_id=72656

Looks great.

andreas_kupries added on 2001-09-07 06:46:25:

File Added - 10463: 219148.patch

Logged In: YES 
user_id=75003

Added a patch solving the problem. Used the idea of a 
boolean flag and flushing only the channels which were 
written to, and only when requesting size information.

andreas_kupries added on 2001-09-05 07:20:47:
Logged In: YES 
user_id=75003

Ideas from David Gravereaux: 

The only thing I know is that if the OS is holding uncommitted
buffers, a request for file size won't cause the OS to commit
the buffers first.

A look at using I/O completion ports for writing to disk from
within Tcl >might< be a good work-around for tracking what the
OS hasn't committed yet.  I can't say for sure.  The amount of
code for tracking could get very large.  Adding an explicit
flush to the channel driver might be the best alternative, but
explicit at the script level to the user instead of the
implicit one as is now.

That's all I know.

>Hm. We have a flushproc in the driver, it is just not used yet.
>This could contain the OS-Flush on windows and be called by
>[flush] after it has committed the tcl buffers to the OS. This
>does not help with the tests which check file sizes to check the
>correctness of the 'implicit' flushes. And the moment we add the
>OS-flush to them we are back to the current situation.

half way there...  add a [flush] to the tests, that will do
FlushFileBuffers() or whatever was the API func...

It's not the same.  Make [flush] not only flush the channel but
commit the OS buffers, too.  Normal mode flushing of the channel
buffer doesn't have to also mean flushing the OS buffers, too.

andreas_kupries added on 2001-08-24 07:30:06:
Logged In: YES 
user_id=75003

More ideas (coming from Jeff).
________________________________________
What happens on Windows if another process
opens the file ? Does that process also
get the bogus file size ?
________________________________________
Are there Win* APIs we could use to peek
into the buffering done by Windows ?
We could use this instead of the counters.

Or we could use this in [file size] to
report a better size.

andreas_kupries added on 2001-08-24 07:27:30:
Logged In: YES 
user_id=75003

Ideas to solve this problem collected so far.
________________________________________
Just remove the forced OS flush for
Windows. Make the tests 'unixOnly'.

Anticipated Effects:

-Speedup for Windows I/O compared to
current solution.

-No change for the other platforms.

-The coverage of code paths by the
testsuite decreases. In other words,
the testsuite becomes worse.
________________________________________
Add counters in the channel structures
(on the driver side) to count how many
bytes were read and written to the OS.

Add testchannel subcommands to access this
information instead of using [file size].

The tests will have to be rewritten.

Anticipated Effects:

-General slowdown in the I/O system
for all platforms (Counter management).
Should be negligible though.

-Speedup for Windows I/O compared to
current solution.

-The testsuite stays in shape.
________________________________________
Handle the proposed counters only for Win*.
Write separate tests for Unix and Win*

Anticipated Effects:

-Speedup for Windows.

-No change for the other platforms.

-The testsuite stays in shape.
________________________________________
Add a boolean flag to the Win* structures
(driver side). Indicates if a true flush
was done on the file channel.

Whenever a [file size] is requested the
system goes through the list of file
channels and does an OS flush on all with
the flag not set. The flag is set by this
action. Any write on the channel resets
the flag for that channel. When closing a
file channel do a true flush in the driver.

The testsuite needs no change.

Anticipated Effects:

-Slowdown of [file size] operation
for Win*.

-Speedup of Win* I/O in general.

-No change for the other platforms.

-Essentially emulates Unix behaviour
on Windows for Tcl.

-Adds interaction between the
filesystem and the I/O (channel)
code.

-The testsuite stays in shape.
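
The flag scheme described above can be modelled in a few lines of Tcl. This is only a toy sketch of the bookkeeping, not the attached patch: the real change lives in the C driver, and every name below (the model namespace, chanOpen, chanWrite, osFlush, fileSize, chanClose) is hypothetical. The osFlush proc merely counts calls where the driver would perform a true OS flush.

```tcl
# Toy model of the "flushed" flag idea; all names are hypothetical.
namespace eval model {
    variable flushed          ;# array: channel -> 1 if a true flush was done since the last write
    variable osFlushes 0      ;# number of simulated OS-level flushes
    array set flushed {}

    proc chanOpen {chan} {
        variable flushed
        set flushed($chan) 1  ;# nothing written yet, so nothing is pending
    }
    proc chanWrite {chan} {
        variable flushed
        set flushed($chan) 0  ;# any write resets the flag
    }
    proc osFlush {chan} {
        variable flushed
        variable osFlushes
        incr osFlushes        ;# stands in for the real OS flush call
        set flushed($chan) 1
    }
    proc fileSize {} {
        # Before reporting a size, flush every channel whose flag is not set.
        variable flushed
        foreach chan [array names flushed] {
            if {!$flushed($chan)} { osFlush $chan }
        }
    }
    proc chanClose {chan} {
        variable flushed
        osFlush $chan         ;# closing always does a true flush
        unset flushed($chan)
    }
}

model::chanOpen a
model::chanOpen b
for {set i 0} {$i < 2000} {incr i} { model::chanWrite a }
model::fileSize               ;# flushes only channel a
model::fileSize               ;# nothing pending, no flush
model::chanClose a
model::chanClose b
puts $model::osFlushes
```

In this model 2000 writes cost three simulated OS flushes (one for the size query, two for the closes) instead of one per write, which is the trade the proposal makes: writes get cheap and [file size] pays for the flush.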

andreas_kupries added on 2001-08-24 07:17:07:
Logged In: YES 
user_id=75003

Just for the record here are the results of running tclbench
for a tclsh with forced flushing (1) and without (2) for my 
machine (Win NT 5, 128 MB). Used fcopy to exercise the I/O 
system.

$ ./tcl/win/win-dll/tclsh84.exe tclbench/runbench.tcl \
     -match 'FCOPY*' -notk \
     -paths "./tcl/win/win-dll/ ./tcl.nf/win/win-dll/"

000 VERSIONS:            1:8.4a4 2:8.4a4
001 FCOPY binary: 164K   2320137   19575
002 FCOPY encoding: 164K 1583793   39857
003 FCOPY std: 164K      2435353   18588
003 BENCHMARKS           1:8.4a4 2:8.4a4
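
For a quick read of the table, the ratio of column 1 (forced flushing) to column 2 (no forced flushing) can be computed with a short script. The numbers are copied from the output above; the variable names are made up for this sketch.

```tcl
# Ratio of forced-flush timing to no-forced-flush timing for each benchmark.
set results {
    "FCOPY binary: 164K"   2320137 19575
    "FCOPY encoding: 164K" 1583793 39857
    "FCOPY std: 164K"      2435353 18588
}
set ratios {}
foreach {name withFlush withoutFlush} $results {
    set ratio [format %.1f [expr {double($withFlush) / $withoutFlush}]]
    lappend ratios $ratio
    puts [format "%-22s %sx" $name $ratio]
}
```

This gives roughly 118.5x, 39.7x and 131.0x for the three FCOPY cases.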

nobody added on 2001-08-24 06:48:40:
Logged In: NO 

I agree 100% with your summary.

andreas_kupries added on 2001-08-24 06:26:34:
Logged In: YES 
user_id=75003

Ok, I now understand the problem much better. It is 
partially an OS issue and partially an issue of how the 
affected tests were written.

When Tcl 'flushes' a channel it actually only writes its 
internal buffers to the OS and then forgets about the data. 
The OS is free to delay the actual write to disk.

The affected tests try to check that the flushing behaviour 
of tcl is correct. To do so they perform some writes and 
then check the size of the resulting file.  But this 
means that they actually check two things: the flushing 
behaviour of Tcl itself, and how the OS deals with pending 
data when it comes to reporting the size of a file.

Both Unix and Win* platforms delay writing data to disk 
until they have idle time, or group nearby blocks 
together, etc. But obviously Win* is more lazy than Unix 
when it comes to reporting the size of a file with pending 
writes. Win* reports the size actually on disk, no matter 
how much data is pending. Unix goes to the trouble and 
calculates the size of the file as if the pending data had 
been written to the disk.

The current solution of this problem is to force Win* to 
actually write all the data written to it by Tcl to the 
disk too, without delay. This gets us the reliable file 
sizes the tests need to perform correctly, at the expense 
of general I/O performance.
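
The pattern the affected tests rely on looks roughly like the script below. It is a minimal sketch, not one of the io-* tests themselves, and the scratch file name is an assumption. The size seen after [flush] is the value that depends on how the OS reports a file with pending writes, so no expected value is given for it.

```tcl
# Write, then ask for the file size at three points in time.
set path [file join [pwd] flushdemo.tmp]   ;# hypothetical scratch file
set fid [open $path w]
fconfigure $fid -translation binary -buffering full -buffersize 4096

puts -nonewline $fid "0123456789"
# The 10 bytes still sit in Tcl's own channel buffer; the OS has seen nothing.
set sizeBuffered [file size $path]

flush $fid
# Tcl has now handed the bytes to the OS. Whether [file size] already
# reflects them is up to the OS (and to any flush the driver forces).
set sizeFlushed [file size $path]

close $fid
set sizeClosed [file size $path]

puts "buffered=$sizeBuffered flushed=$sizeFlushed closed=$sizeClosed"
file delete $path
```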

andreas_kupries added on 2001-08-24 05:55:32:
Logged In: YES 
user_id=75003

This is the list of tests which fail if flushing is 
disabled in the windows file driver:

io-27.2 FlushChannel, some output buffered
io-27.4 FlushChannel, implicit flush when buffer fills
io-27.5 FlushChannel, implicit flush when buffer fills and 
on close
io-29.4 Tcl_WriteChars, buffering in full buffering mode
io-29.5 Tcl_WriteChars, buffering in line buffering mode
io-29.6 Tcl_WriteChars, buffering in no buffering mode
io-29.7 Tcl_Flush, full buffering
io-29.8 Tcl_Flush, full buffering
io-29.17 Tcl_WriteChars buffers, then Tcl_Flush flushes
io-29.18 Tcl_WriteChars and Tcl_Flush intermixed
io-29.19 Explicit and implicit flushes
io-29.20 Implicit flush when buffer is full
io-29.28 Tcl_WriteChars, lf mode
io-39.6 Tcl_SetChannelOption, multiple options
io-39.7 Tcl_SetChannelOption, buffering, translation
io-39.8 Tcl_SetChannelOption, different buffering options
io-52.7 TclCopyChannel

andreas_kupries added on 2001-08-24 04:40:22:
Logged In: YES 
user_id=75003

The actual id is #219300 after SF did its renumbering dance.

dkf added on 2001-02-01 03:45:41:
See also Bug #119300 - we've so many unclosed bugs that it is impractical to link related ones... <sigh>

davygrvy added on 2001-01-13 07:35:16:
File channel driver on Win* forces a flush.  It really doesn't need to, but some file tests depend on it doing a true write to disk.  So therefore, it's slower.

Attachments: