Tcl Source Code

View Ticket
Login
Ticket UUID: 2911139
Title: http::geturl abuses vwait on async call
Type: Bug Version: current: 8.5.15
Submitter: kakaroto Created on: 2009-12-09 02:33:26
Subsystem: 29. http Package Assigned To: aku
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2022-09-09 17:40:06
Resolution: Fixed Closed By: kjnash
    Closed on: 2022-09-09 17:40:06
Description:
Hi,
As discussed on IRC with some of you, aMSN keeps crashing because of "too many nested evaluations", and it seems to be caused by a nested event loop by calling update/vwait/tkwait. I found out what the cause is and it seems to be coming from the http package.
I'll just copy/paste what I said on IRC, and I'll attach a test tcl file to reproduce the problem :

==========================================================
<KaKaRoTo> it basically simulates someone who tries to do 100 geturls asynchronously, and they're all being done from a timer where the geturl gets called after a simple execution stack of maybe 5 functions...
<KaKaRoTo> with msn I have to do a SOAP (http) request for every contact, so if you have 200 contacts, I'd do 200 geturls.. 
<KaKaRoTo> I definitely want the geturl to be async! that's why I do -timeout to the geturl, but doing so makes http do a vwait, which causes the above error
<KaKaRoTo> (ps launch the script with 'wish tst.tcl', if you just do tclsh, it will end because there's no mainloop)
[...]

<ijchain> <kbk> KaKaRoTo - Don't Do That. Use the -command option to http::geturl so that it doesn't block
<KaKaRoTo> kbk: I use -command AND -timeout
<KaKaRoTo> if -timeout is not specified then http will not create the socket with -async
<KaKaRoTo> I was expecting my -command to get called if a connection error happens, or if the timeout was reached and the http request was not completed
<KaKaRoTo> I don't understand why -timeout would make it async AND make it vwait... 
<ijchain> <kbk> Looking at the code --
<ijchain> <kbk> the only 'vwait' is in 'http_wait':
<ijchain> <kbk> and the only call to http_wait is gated:
<ijchain> <kbk>     if {! [info exists state(-command)]} {
<ijchain> <kbk>  http_wait $token
<ijchain> <kbk>     }
<KaKaRoTo> kbk: there's another one
<KaKaRoTo> there are two calls to 'wait' in the geturl function
<KaKaRoTo>     # Wait for the connection to complete
<KaKaRoTo>     if {$state(-timeout) > 0} {
<KaKaRoTo>  fileevent $s writable [list http::Connect $token]
<KaKaRoTo>  http::wait $token
<ijchain> <kbk> ugh - I'm looking at the wrong http.tcl, sec...
<KaKaRoTo> kbk: np
<KaKaRoTo> we use http 2.4.4, because we had problems with earlier versions, so people upgrade their system, amsn stops working etc.. so we bundle 2.4.4 and do a package require -exact 2.4.4
<KaKaRoTo> but I checked the version in CVS and it has the same code
<ijchain> <kbk> KaKaRoTo - Log a bug. This can't be right.
==========================================================

So, as explained, in my opinion, the http::geturl should use -async on the socket if -command is specified, it should also call the callback command in case of a connection error instead of vwait-ing in geturl to return an error.

I hope this gets fixed soon... aMSN is basically completely broken for everyone because of this.

Thanks,
KaKaRoTo
User Comments: jan.nijtmans added on 2015-10-14 09:40:28:
Status?

jan.nijtmans added on 2013-10-25 11:50:18:

I just got a http-4.14 test failure on a machine where port 6666 was in use for something else. What happens if you change the port number in this line to something else? :

set badurl //[info hostname]:6666


stwo added on 2013-09-09 01:39:20:
I've only tested on OpenBSD; I don't know about elsewhere.

jan.nijtmans added on 2013-09-08 12:53:51:
>Still a problem with 8.5.15rc0 and 8.6.1rc0

On OpenBSD only?

stwo added on 2013-09-07 17:49:26:
Still a problem with 8.5.15rc0 and 8.6.1rc0.

nijtmans added on 2013-05-27 18:22:19:
>If 8.5.14 as released doesn't suffer (you'd say) then why do I get this
>error?
Are you sure you use the released 8.5.14? In your comment, you say:
>8.5.14/OpenBSD
>[a34d23182d]
But the released 8.5.14 is [fdc2caede9], not [a34d23182d].

Please retry with current "core-8-5-branch". If you can
reproduce it there, please re-open this bug.

stwo added on 2013-05-27 17:59:26:
If 8.5.14 as released doesn't suffer (you'd say) then why do I get this error?

Should I file a new bug?

nijtmans added on 2013-05-27 17:02:41:
Many thanks, Alex, for taking care of this! I fully agree with your remarks.

ferrieux added on 2013-05-25 23:46:06:
That commit [a34d23182d] was identified as bogus and fixed by Jan himself the same day [1bbf0da818] , and then merged back into 8.5 mainline by dkf  the day after: [5562d8bc15] . 
So, 8.5.14 as released doesn't suffer, I'd say ;)

stwo added on 2013-05-25 15:31:50:

allow_comments - 0

8.5.14/OpenBSD
[a34d23182d]

==== http-4.14 http::Event FAILED
==== Contents of test case:

    set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#]
    if {$token eq ""} {
        error "bogus return from http::geturl"
    }
    http::wait $token
    lindex [http::error $token] 0

---- Test generated error; Return code was: 1
---- Return code should have been one of: 0 2
---- errorInfo: couldn't open socket: connection refused
    while executing
"http::geturl $badurl/?timeout=10 -timeout 10000 -command \#"
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $script"
---- errorCode: NONE
==== http-4.14 FAILED

dkf added on 2013-01-23 22:13:39:

allow_comments - 1

Thanks for committing to trunk, Jan; I got diverted by a meeting…

dkf added on 2013-01-23 16:06:48:
Committed to 8.5 branch; not yet looked into porting to trunk.

dkf added on 2013-01-23 13:45:02:
Works for me now. :-)

nijtmans added on 2013-01-23 06:01:48:
>I have no idea at all how we're getting into that state!
I think I have now. The
       can't read "state(sock)":...
comes from line 271, just a left-over from the $errorInfo
at the previous http::reset.

But the "connect failed connection refused" is a
better string to test for, so we know exactly what
happend and not some random other error. Done
now in the "bug-2911139" branch.

I don't get the "timeout" any more, which I got at the
office. At home this test-case passes now, both on
win32 as on Linux. I'm satisfied, everything looks OK
to me now. Agreed?

dkf added on 2013-01-23 00:19:30:
I'm not sure; I'm out on a business trip this week so I don't know how clean my network configuration is. OTOH, if I do:

    make test TESTFLAGS="-file h*.test -match http-4.14"

This runs just the relevant test, and it fails very quickly; it doesn't look like a timeout to me. If I add:

   puts [set ${token}(error)]

To the test immediately before the [http::status $token], I get this output as well:

{connect failed connection refused} {can't read "state(sock)": no such element in array
    while executing
"fileevent $state(sock) writable {}"} NONE

I have no idea at all how we're getting into that state!

nijtmans added on 2013-01-22 23:07:38:
It might also be that the test outcome is timing-dependant: What
if the [http::wait $token] awakens twice, once when the connection
is created (with invalid socket), once when the connection times
out. Then it just depends on timing whether the last [http::status]
returns "error" or "timeout". Shouldn't the error-situation cancel
the time-out then?

nijtmans added on 2013-01-22 23:00:39:
>If the bad url is a valid URI but with an invalid/unreachable server, then
>you expect it to error out in the callback, and not for it to timeout
Well, "timeout" is what I see. If that's wrong, maybe the error
handling is still not 100% correct. I'm not sure yet.

dkf added on 2013-01-22 16:48:50:
At the moment, I'm just reporting a test failure when I do 'make test' :-)

kakaroto added on 2013-01-22 16:07:47:
Unless I misunderstood the test, isn't error what you're expecting?
If the bad url is a badly formatted URI, then you should expect an error right away.
If the bad url is a valid URI but with an invalid/unreachable server, then you expect it to error out in the callback, and not for it to timeout. It would only timeout if the server was real and dropping packets, so you don't get a "host unreachable" ICMP or "connection refused" error. Otherwise, it will give you an error directly.

dkf added on 2013-01-22 15:42:18:
I'm still getting a failure, but now it's getting 'error' rather than 'timeout' as the result string.

Generally, I think that fixing the test is (probably) the correct approach, as the test is looking for something that is intended to be changed. Once we've got the test working, merging through to the 8.5 branch, trunk and novem (though there's no hurry on novem) seems reasonable.

nijtmans added on 2013-01-22 15:19:11:
I think that kakaroto is right, shouldn't the testcase just check whether the timeout occurred?

Suggested test-case fix committed to the branch. Agreed? I'm OK with this
change being merged to core-8-5-branch and up, but I'm not the expert here....

kakaroto added on 2013-01-22 08:02:25:
Hey, dkf! Glad to see this bug taken care of! :)
It looks like that test case expects an error to happen with the badurl, but since there's a -command, the socket gets created asynchronously, which means the error will happen in the command callback, not during the socket creation.
So I think that this is a good thing indeed and working as expected. You'd just have to confirm that the error happens asynchronously.
Thanks!

dkf added on 2013-01-20 05:24:55:

IP - Comment Removed: 82.42.214.208

dkf added on 2013-01-20 05:24:51:

IP - Comment Removed: 82.42.214.208

dkf added on 2013-01-20 05:24:33:
Causes one test failure (http-4.14) but I don't know if that's significant yet. Or maybe it is even a good thing! I'll have to analyze further. Failure is as below (and is the only one):

==== http-4.14 http::Event FAILED
==== Contents of test case:

    set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#]
    if {$token eq ""} {
error "bogus return from http::geturl"
    }
    http::wait $token
    http::status $token
    # error code varies among platforms.

---- Test completed normally; Return code was: 0
---- Return code should have been one of: 1
==== http-4.14 FAILED

dkf added on 2013-01-17 21:02:50:
(once core comes back from its unscheduled local network troubles, that is…)

dkf added on 2013-01-17 21:01:49:
I'll have a look

nijtmans added on 2013-01-17 20:29:58:
kakaroto's patch committed to branch "bug-2911139" (based on core-8-5-branch). Looks good to me.

Donal, do you agree?

kjnash added on 2012-10-30 10:05:25:
Sorry folks, I couldn't add the files without creating a new bug report, which is 3581754.  The two patches are there.

kjnash added on 2012-10-30 09:43:54:
I had problem with the same symptoms, so I'll add my comments to this tracker, though I am not sure if my fix will work for the original problem.

My problem was with EOF detection when a server closes a keep-alive connection.

If http::geturl is called with -keepalive 1 and a -command that enters the event loop (e.g for an authorization dialog), and the HTTP response includes a "Content-Length" header, then a (slow) race condition ensues between the server's closing of the connection, and the user's response to the dialog.  If the connection is closed first, the unpatched code fails at line 1126 with repeated eof fileevents, because the command http::Eof tries repeatedly to run the user's -command, and never returns.  This can be reported as "too many nested evaluations".

While it might not be good style to allow the -command script to enter the event loop, http(n) does not exclude this possibility.

There's more than one way to fix the bug.  Two possible patches are attached.  EIther will fix the bug; both fixes can be applied.

Patches against core-8-5-branch checkin 69687a01db, dated 2012-10-24 11:28:33.

dkf added on 2012-05-27 03:03:33:
We should consider what impact coroutines would have on this in 8.6

ferrieux added on 2011-09-05 14:17:35:
Hum, looks like this got completely drowned, never committed to trunk, right ?
As of today, in both 8.[56], there is no way to do a fully async http request (including an async connect): the code insists on doing [socket -async] *only* in presence of -timeout, and -timeout forces a vwait.
Why that nonsense ? (Youness's version gets it right)

kakaroto added on 2010-01-12 06:13:16:
typo in my previous comment, the url should have had revision=11889... anyways, I did a diff (ignoring whitespace changes) and I added it to this bug.
Thanks for checking it out!

kakaroto added on 2010-01-12 06:12:24:

File Added - 358616: http.patch

kakaroto added on 2010-01-11 09:15:47:
Alright, I found the bug and fixed it in SVN revision 11889 : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885
The reason why my fix didn't work is because I had a typo :
if {[info exists $state(-command)] } 
instead of 
if {[info exists state(-command)] }

p.s: I just realized I have some whitespace issues in my code, maybe it's because I did a 'untabify' on the tab, so you may want to do a 'ignore whitespace changes' diff. Sorry.

kakaroto added on 2010-01-11 08:50:36:
Hi,
I fixed this bug in the latest http 2.7.5 (from core-8-5 branch in CVS).
You can get the code here : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885
You can see my changes if you diff it against CVS 1.67.2.9 : http://tcl.cvs.sourceforge.net/viewvc/*checkout*/tcl/tcl/library/http/http.tcl?revision=1.67.2.9&pathrev=core-8-5-branch

In theory, no more vwait is used anymore when you do the call to geturl with 
 a -command option.
However, the test code previously attached to this bug is still having the same issue, but I have no idea why unfortunately... :(
I hope this gets merged upstream.
Thanks.

kakaroto added on 2009-12-09 09:33:27:

File Added - 354463: tst-http.tcl

Attachments: