Tcl Source Code

View Ticket
Login
Ticket UUID: 6141c1518631072728c55b18a6ae4ebbf1a0a3a2
Title: Transchan loses input data when written to
Type: Bug Version: 8.6.4
Submitter: sbron Created on: 2015-12-27 11:03:50
Subsystem: 26. Channel Transforms Assigned To: aku
Priority: 7 High Severity: Important
Status: Open Last Modified: 2016-01-28 20:53:18
Resolution: None Closed By: nobody
    Closed on:
Description:
It seems that any input data buffered on a transchan is lost when the channel is written to. See the attached script for a demonstration. I expect to see 5 times "hello", but only get 1. If either the [chan push $fd trans] or the [puts $fd Hi!] is commented out, the code works as expected.

Tested on OpenSUSE linux x86_64 13.2 (kernel 3.16.7).
User Comments: ferrieux added on 2016-01-28 20:53:18:
Andreas, for completeness: "additional candy" was truly for IOGT rather than IORTrans. Thanks for reassuring me that no tclsh script can call into IOGT without an extension's hand. I understand that tcltest could still trigger the bug in theory, but I don't really care :)

aku added on 2016-01-28 17:41:21:
"I raise the Q because tclIOGT.c is also in Tcl 8.5, so if a fix is coming for it, it could influence 8.5.19 release."

I don't think so. The tclIOGT.c looks to be purely for "testchannel transform", i.e. Tcl's testsuite. Any fix would only affect the testsuite, but not the regular tclsh.

Tcl 8.6 is different because it has tclIORtrans.c which exposes the machinery at script level. Tcl 8.5 does not have transforms at script-level, only base channels.


Fix into channel types vs core ... I went with the easy route, quickly to understand. That drops the pressure and then I can think about if that is fixable in general for all channel types by changing the IO core around.

dgp added on 2016-01-28 16:53:05:
I raise the Q because tclIOGT.c is also in Tcl 8.5, so if a fix
is coming for it, it could influence 8.5.19 release.

Seems the fix is not going into the Tcl_StackChannel() core
operations, but in each Tcl_ChannelType ?  That seems to reduce
the value in worrying about it for 8.5.

aku added on 2016-01-28 16:25:23:
Quick check ... Looks like the answer is yes ... Was wondering where tclIOGT.c is actually used ... Seems this is the 'testchannel transform' command ... Do we still need this with the official transform through tclIORTrans.c ? Maybe we could drop this.

dgp added on 2016-01-28 14:18:16:
Does tclIOGT.c suffer from the same issue?

aku added on 2016-01-28 06:49:50:

The commit [128ad6929bd03c0b] contains a fix. It does not have a new test for the problem yet. It does solve the issue for the transchan script, and passes the existing testsuite.

Please have a look.

I forewent the optimization of removing the seekProc checks in the ReflectSeek(Wide) functions for the moment. With the checking done at creation time the check for seekProc can be removed. But not for wideSeekProc, as that is still can be either for the base/parent chan.

Please trial this.


aku added on 2016-01-28 06:08:10:
Alexandre, can you tell me more about the "Additional candy: transformChannelType->seekProc calls parentSeekProc without even checking for nullity (it does for wideSeek, but not for seek). No idea how to expose this at script level or if it is even possible, but a segfault might be reachable there." ?

I just looked into the tclIORTrans.c on the trunk head and what I see is this:

(1) ReflectSeek
   - redirects to ReflectSeekWide (line 1425)
(2) ReflectSeekWide
    - checks seekProc and bails out if missing. (line 1341)
    - Later checks also wideSeekProc (line 1390), and
      falls back to seekProc (line 1398) if wide is missing or not supported per driver version.

Further notes: The driver structure is used statically. This is in contrast to tclIORChan for base-drivers, which allocates and modifies it to suit the known methods of the script-level implementation.

As such I do not see it able to deref a NULL pointer.

At the moment it looks like I "only" have to modify the transform creation to allocate the driver structure as needed for modifications (no seeking), and destruction to release such allocated structure.

aku added on 2016-01-22 05:27:18:
From the description it is likely possible to get seg.fault from C-level API. Unsure if reachable from script.

ferrieux added on 2016-01-21 22:38:07:
Happily reassigning to you, since I trust your hand 100x more than mine on stacked channels.

For my curiosity: re "additional candy" below, is my fear justified ?

aku added on 2016-01-21 03:34:12:
Oopses.

Currently trying to remember of the transchan driver struct is allocated dynamically or not.

If the driver struct is static then it cannot do anything but set its own seekproc to cover the bases. If the struct is dynamic then we should be able to set the seekProc based on the information in either the underlying transform, or in the base channel (**). In that case however we have to also find a place where we can safely release this structure again, to prevent leakage.

(Ad **) I semi-remember/believe that you can follow pointers to the top- and bottom-channel structures. The bottom would be the base. If so, then another possibility is to always check the seekProc of the bottom of the stack.

I have put this on my TODO list now, with hope of getting to it coming weekend.
Currently focused on various red tape in RL to crunch through.

Feel free to re-assign to me.

ferrieux added on 2016-01-20 23:20:54:
More inspection shows that this truly happens in tclIORTrans.c with type tclRTransformType  rather than tclIOGT and transformChannelType, but the effect is the same: those generic types transform a static criterion (seekProc==NULL) into a dynamic one (seeking fails), in blatant mismatch with assumptions in the rest of the codebase.

Additional candy: transformChannelType->seekProc calls parentSeekProc without even checking for nullity (it does for wideSeek, but not for seek). No idea how to expose this at script level or if it is even possible, but a segfault might be reachable there.

ferrieux added on 2016-01-20 22:51:04:
Though I'm not familiar with transchan (Andreas' baby), it looks like the transformChannelType defines a seekProc in a generic fashion, regardless (of course) of the underlying channel's status. So the test (seekProc!=NULL) becomes unreliable if applied to the stack head (the transform).

Note that this gotcha may also apply to other channel methods, where only pointer nullity is checked throughout the code. Sigh.

Any taker for a transhcan dive ?

ferrieux added on 2016-01-20 22:38:34:
Hum, the code already checks for seekProc!=NULL.
Embarrassing.

ferrieux added on 2016-01-20 22:35:24:
Sure. The crux of the problem is a confusion between two kinds of "r+" channels: (1) those that are truly both readable and writable, and where write+seek+read can give you back what you wrote; and (2) those that are in fact hiding two independent things, one readable, the other writable, with no diaphony ever (e.g. pipes and sockets).

The idea of the patch was that for the write+seek+read invariant to hold, you need to get rid of buffering. Hence flushing before reading, and discarding input before writing.

Of course the patch was intended only for kind (1).

Since channels of kind (1) are seekable and those of kind (2) are not, we've got a drug target :)

dgp added on 2016-01-19 17:55:53:
I notice that the issues raised in 

http://core.tcl.tk/tcl/tktview/2901998

were in the context of seekable channels,
while the unwanted consequences noted here
are seen on non-seekable sockets.  Perhaps
that's a pivot on which both desires could
be satisfied?

dgp added on 2016-01-19 17:49:55:
This (mis-)behavior started with checkin

http://core.tcl.tk/tcl/info/5af0d249def8147a

which was intended to fix the bug

http://core.tcl.tk/tcl/tktview/2901998

Assigning to ferrieux for review.

Attachments: