Tcl Source Code

View Ticket
Login
Ticket UUID: 2901998
Title: Inconsistent buffered I/O
Type: Bug Version: None
Submitter: ferrieux Created on: 2009-11-22 10:49:10
Subsystem: 25. Channel System Assigned To: andreas_kupries
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2013-05-02 04:17:00
Resolution: Fixed Closed By: andreas_kupries
    Closed on: 2009-12-09 23:04:22
Description:
On a seekable r+ channel (like a regular file), read and write operations have inconsistent interactions depending on the kind of buffering:

Suppose file "foo" contains "ABC":

 set f [open foo r+]
 fconfigure $f -buffering none 
 puts -nonewline $f Z
 puts [read $f] ----> BC

Now reset the file to ABC and try it in buffered mode:

 set f [open foo r+]
 fconfigure $f -buffering full 
 puts -nonewline $f Z
 puts [read $f] ----> ABC

This is inconsistent. Buffering should only have effects on performance, not on semantics.

Being consistent would imply always keeping the notion of a single read/write "cursor" at script level, regardless of the underlying OS-level file position pointer.
Note that maintaining this simple coupling means simpler code in the core, the only new thing being that a read operation could now cause a flush. I have no problem with this side-effect, since buffering/flushing always happens behind the scripter's back anyway ([seek] already does it sometimes).

This bug is related to, but much simpler and spectacular than, bug 2176669.
The fix proposed here would solve both simultaneously BTW.
User Comments: ferrieux added on 2013-05-02 04:17:00:

allow_comments - 0

Backported as e7d7fe409a .
The only nontrivial thing was to bring along a couple of convenience wrappers (ChanRead,ChanSeek) from the more extensive set of 8.6.
Note the commit still does not contain a test case :(

andreas_kupries added on 2013-05-02 00:56:54:
I do not remember any.

dgp added on 2013-05-01 20:05:33:
This bug has been noticed again on comp.lang.tcl.

Was there a reason not to backport the fix to Tcl 8.5?

andreas_kupries added on 2009-12-10 06:04:22:

allow_comments - 1

Ok, applied to head, builds ok, tests ok. Committed.

andreas_kupries added on 2009-12-10 03:18:51:
Alexandre, are you waiting for my go ahead to commit, or me to commit ? I mean, we are relatively sure that this is ok, specially with Tom's testsuite and results as posted to tcl-core by him.

andreas_kupries added on 2009-12-02 07:05:24:
[16:01]akuDoes the patch cover Kevin's [chan truncate] case ? Have to handle non-blocking channel as well, for flush
[16:02]ferrieuxfwiw, it handles the truncate case. Have to check about the non-blocking cases.
[16:02]akuok

ferrieux added on 2009-12-02 07:01:04:
Just verified that it also fixes 2176669 , as previously claimed ;-)

ferrieux added on 2009-12-02 06:54:07:

File Added - 353491: io2.patch

ferrieux added on 2009-12-02 06:53:03:
OK thanks to your help on the chat, I have now a working patch.
Key was acknowledging the lack of symmetry read/write i.e call WillRead on lowest layer (ChanRead) while calling WillWrite on highest one (WriteBytes/Chars), to be on the proper side of buffering.
Another key was doing a ChanSeek(-queuedInput) when discarding input buffers.
Please review, including the seek replacement in truncate.

andreas_kupries added on 2009-12-02 03:03:59:
Note the code in rev 1.168 of tclIO.c, lines 6782 to 6847. This is inside of Tcl_Seek  and how it flushes pending output. Taking care of async and such as well. Is your channel possibly set to -blocking 0 ?

I propose that we re-use this code, putting as much as possible into one or two functions which can then be called from the read code as well. (2 functions because seek needs this around the seek, and for read/gets we will likely need both called before reading)

Looking further ... I see the replacement of a Tcl_Seek(0,CURR) by WillRead/Write ... Let us try to get this working without the replacement, and then we can try if the replacement is a proper equivalence.

ferrieux added on 2009-12-01 08:14:52:

File Added - 353266: io.patch

ferrieux added on 2009-12-01 08:14:02:
Attaching a patch doing this... actually _trying_ to :}
Apparently something's wrong with the write cursor when the flush is done in the WillRead function. Can you explain ?

andreas_kupries added on 2009-12-01 01:12:23:
Quoting Kevin Kenny from his mail on clt, this seems to be our best writeup so far of what we have to do:

[...] the proposed solution entirely.  It's more or less:

  - Prior to a [read] (equivalently, a [gets]), check if there
    are buffered data to write. If there are, write them before
    reading the file. (Obviously, any data in read buffers may
    be retained.)

  - Prior to a [seek], check if there are buffered data to
    write. If there are, write them before seeking.

    There's an obvious performance tweak to bypass this flushing
    behaviour if the [seek] is not actually moving the cursor,
    and further performance might be obtainable if the library
    can determine that the new cursor is inside the current
    buffer. For now, I propose to ignore those optimizations,
    which represent an unusual case that requires a fair bit
    of bookkeeping to maintain consistency.

    As far as input buffers go, I'd propose flushing them on
    [seek] as well, but it wouldn't be a tremendous amount of
    bookkeeping to detect the case where the new cursor lies
    within the buffer and adjust accordingly. This issue might
    be worth doing, because there's a plausible case where a
    program will read data from a file, use the data just read
    to infer the file's encoding, seek back to the beginning,
    configure a new encoding and read again.

  - Prior to a [puts], spoil any buffered input data, which will
    be incoherent after the write.  If the current cursor lies
    within an input buffer, it might be possible to retain the
    buffer by overwriting the data therein with the newly-written
    data. Again, I propose simply to ignore the case.

  - Prior to a [truncate], flush write buffers and discard read
    buffers.

ferrieux added on 2009-11-26 14:52:37:
To clarify the proposed fix: the idea is to

(1) have a single read+write cursor, and
(2) handle properly the case of an overlap between the input and output buffers.

The (2) has two variants:

(2a) flush the output buffer before a read
(2b) patch the input buffer with the overlapping part of the output buffer

Obviously (2a) has the advantage of simplicity, but generates more flushes, some of them during [read]s.
While (2b) preserves the "filesystem cache" model of a single I/O RAM buffer, without changing the internal two-buffer structure.

andreas_kupries added on 2009-11-24 02:54:23:
Stuart Cassoff tested with Tcl 7.6 and gets the ABC ...
IOW, this problem is decade-old (7.6 was released 1996, 13 years ago).

Attachments: