Tcl Source Code

View Ticket
Login
Ticket UUID: 0f94f855cafed92d0e174b7d835453a02831b4dd
Title: Infinite loop on writing to TLS channel on negotiation failure
Type: Bug Version:
Submitter: apnadkarni Created on: 2015-03-27 12:47:24
Subsystem: 25. Channel System Assigned To: aku
Priority: 7 High Severity: Critical
Status: Closed Last Modified: 2015-05-01 18:48:56
Resolution: Fixed Closed By: aku
    Closed on: 2015-05-01 18:48:56
Description:
The following causes a infinite loop with Tcl 8.6.4 and TLS 1.6.4 on Windows and quite likely other platforms as well.

% package require tls
1.6.4
% set so [tls::socket www.afleventoffice.com.au 443]
sock00000000046A0D10
% puts $so "GET /" ; flush $so

To reproduce, you must use the above address or one where tls negotiation fails.
User Comments: aku added on 2015-05-01 18:48:56:
Talked with apn on chat, gave him a win32 binary to test, and he is not stuck anymore either. Gets the same error back (error flushing ...).

This patch has now been committed to the TLS cvs repository at SF
(https://sourceforge.net/p/tls/), together with
https://sourceforge.net/p/tls/bugs/57/

The TLS version was bumped to 1.6.5.

aku added on 2015-04-30 19:40:55:
A linux/64 build with that patch (and the TLS #57 patch (https://sourceforge.net/p/tls/bugs/57/)) does not loop (anymore):

% package require tls
1.6.4
% set so [tls::socket www.afleventoffice.com.au 443]
sockbb6040
% puts $so "GET /"
% flush $so
error flushing "sockbb6040": connection reset by peer
%

Should I provide win binaries somewhere for additional testing ?

aku added on 2015-04-30 17:07:40:
Gah. This went under the rails with various distractions around.

First, my current TLS build does not contain the patch yet.

Two, I can confirm the infinite loop in flush on a linux/64 system using that very build. I.e. not just Windows, as suspected.

I will now apply the patch and start a rebuild, with more testing of the new binary later in the day. Nag me if I don't add results sometime in the afternoon (Pacific DST).

dgp added on 2015-04-30 14:39:58:
ping.

dgp added on 2015-04-10 15:47:45:
Attached new patch to tlsIO.c of the tls extension
to replace the bad bug fix (which corrupted write 
operations) with the effective one.  Review and
`cvs commit` to the tls SF project is requested.

aku added on 2015-04-09 18:36:45:

Found the origin in my mail heap.

Ticket [c31ca233cacd7d6184877c54182f4b3a6ccb30f1].


aku added on 2015-04-09 17:54:58:

The patch dgp talks about the one attached to

https://sourceforge.net/p/tls/bugs/53/

I have found a local file tls-eof.patch which contains just the eof change,unfortunately without any comments. Nothing recorded in the ChangeLog either. My plan now is to look/search my mail for possible communication about this. I suspect that the patch came from here/me as fix for some bug I do not remember at the moment :( And I apologize for forgetting to make a proper writeup of it.


dgp added on 2015-04-09 13:39:02:
The trouble is rooted in a "Tcl_Eof" stanza of code
introduced with this commit:

http://tls.cvs.sourceforge.net/viewvc/tls/tls/tlsIO.c?r1=1.16&r2=1.17

That commit includes a patch of mine, but the Tcl_Eof bit
came in from somewhere else.  We need aku to try to identify
the origin, so we can see the issues related to removing it.

apnadkarni added on 2015-04-03 05:56:43:
I am inclined to disagree with aku for a couple of reasons. The specific bug being reported is an infinite loop when negotiation fails, NOT that the negotiation fails. It appears from dgp's test that the negotiation does NOT fail on linux for whatever reason (including the defaults that openssl was compiled with) so it is difficult to tell what would happen on that platform if negotiation were indeed to fail.

Secondly, the trace I summarized in a previous comments is completely in platform independent code. TLS detects EOF and returns 0, FlushChannel decrements remaining byte count by 0, and keeps looping. Of course it is possible that detection of EOF differs between platforms but nevertheless there is a disconnect (pun unintended) between how TLS thinks EOF should be indicated and how the FlushChannel expects it.

aku added on 2015-04-02 20:39:21:
I would say that your experience with linux is an indicator that the issue is platform-specific to Windows.

dgp added on 2015-04-02 20:28:49:
Works For Me:

% time {
set so [tls::socket www.afleventoffice.com.au 443]
puts $so "GET /" ; flush $so
}
857427 microseconds per iteration

Does that just mean the site has corrected its
negotiation failure troubles?  Or that I cannot
repro the issue on linux?

apnadkarni added on 2015-03-27 12:56:56:
Tracing through the code, the loop is caused by the following sequence:

- The TLS negotiation is initiated by the flush
- Negotiation fails and the remote end closes the underlying TCP socket
- Tls_WaitForConnect detects EOF and returns 0 (lines 916-7 in tlsIO.c)
- The return value bubbles up to FlushChannel which treats this as the number
of bytes written (tclIO.c line 2694), decrements the write count by this number (0) and loops around to write again which gets dispatched to Tls_WaitForConnect. Which gets us back to the previous step.

Attachments: