Tcl Source Code

View Ticket
Login
Ticket UUID: 75d525d37c931812e9b00a80f6f43ac062598d54
Title: Memory corruption transferring server (listening) sockets between threads
Type: Bug Version: >= 8.5
Submitter: apnadkarni Created on: 2020-01-31 12:23:59
Subsystem: 25. Channel System Assigned To: apnadkarni
Priority: 9 Immediate Severity: Severe
Status: Pending Last Modified: 2020-05-29 20:23:04
Resolution: Fixed Closed By: nobody
    Closed on:
Description:

Working on a generic IOCP framework for Windows, I noticed the following issue while trying to grok the Windows socket code. Note it appears the same issue will also exist on Unix.

The socket implementation stores the user supplied script argument to [socket -server] in the AcceptCallback structure along with a pointer to the current interp. If the socket is then transferred to a different thread using thread::transfer, when a connection comes in the callback is run in the context of the owning thread but using the original interp belonging to the original thread with obviously catastrophic results.

There needs to be mechanism to either switch the interp stored in the AcceptCallback structure to the appropriate one in the receiving thread (this seems hard to me given the current implementation), or disallow transfer of server sockets.

I'll try to provide a proof script later.

User Comments: sebres added on 2020-05-29 20:23:04:

Although it is fixed in branch [bug-75d525d37c-sf-chan-transfer] now (platform-unrelated), the fix is a bit tricky: since as Ashok correctly said, Tcl-API doesn't provide any possibility to inform socket that this going attached to some interpreter.
So one could reset interp in accept-data by thread (interp) detach, but it is impossible to be informed that it is attached again.

Check-in [455c680c3a] provides now such information (interp set in statePtr of the channel now), that allows to use this in every socket handler, with prerequisite that socket would be not shared in multiple interpreter of same thread, or (like accept handler of server socket) it does not intended to be used in all interpreters.

This also allows extremely simplify the (un)registering mechanism of socket handlers (like accept-proc and its client-data) and it could be protected by (statePtr->interp == NULL) inside handler and/or simple ref-counter now.

Test, similar to attached by apn, that illustrating the bug, is provided also (just the segfault is not always reproducible).

To use it in modules or extensions either internal structures of tclIO.h are expected, or we must implement new function like "Tcl_GetChannelInterp" (for the single or main interpreter mode only).


sebres added on 2020-05-29 13:22:25:

Confirmed (current 8.6 but also 8.5).
The issue is very similar to [f583715154], and the fix can be almost the same as [532506ef71] but for server side - proper reaction on switch of managing thread, either in event handler or in action proc handler (or in both).

WiP


Attachments: