Tcl Library Source Code

View Ticket
Login
Ticket UUID: f084c0dc03bd57d142bce1703cc5bdd9beb88247
Title: memory leak in tcllib autoproxy
Type: Bug Version: 1.17
Submitter: anonymous Created on: 2015-09-29 15:53:49
Subsystem: http :: autoproxy Assigned To: aku
Priority: 8 Severity: Critical
Status: Open Last Modified: 2019-06-25 00:10:02
Resolution: None Closed By: nobody
    Closed on:
Description:
The autoproxy module appears to have a leak when a http request is made. Observe the RSS is increasing with each request. This has been observed on Tcl 8.5 and 8.6 running on FreeBSD 9.1 and 10.1, and with 1.16 and 1.17 versions of tcllib.

Sample program will be attached in follow up.


request 0
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  1.6  0.0 54200 8832  1  S+    3:48PM 0:00.19 /usr/local/bin/tclsh8.6 ./leak.tcl
request 1
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  1.2  0.0 54200 8968  1  S+    3:48PM 0:00.20 /usr/local/bin/tclsh8.6 ./leak.tcl
request 2
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  1.1  0.0 54200 8976  1  S+    3:48PM 0:00.21 /usr/local/bin/tclsh8.6 ./leak.tcl
request 3
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  1.1  0.0 54200 8976  1  S+    3:48PM 0:00.22 /usr/local/bin/tclsh8.6 ./leak.tcl
request 4
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.8  0.0 54200 8980  1  S+    3:48PM 0:00.23 /usr/local/bin/tclsh8.6 ./leak.tcl
request 5
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.7  0.0 54200 8984  1  S+    3:48PM 0:00.24 /usr/local/bin/tclsh8.6 ./leak.tcl
request 6
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.7  0.0 54200 8992  1  S+    3:48PM 0:00.26 /usr/local/bin/tclsh8.6 ./leak.tcl
request 7
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.6  0.0 54200 8992  1  S+    3:48PM 0:00.27 /usr/local/bin/tclsh8.6 ./leak.tcl
request 8
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.6  0.0 54200 8992  1  S+    3:48PM 0:00.28 /usr/local/bin/tclsh8.6 ./leak.tcl
request 9
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.5  0.0 54200 8996  1  S+    3:48PM 0:00.29 /usr/local/bin/tclsh8.6 ./leak.tcl
request 10
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.6  0.0 54200 9008  1  S+    3:48PM 0:00.30 /usr/local/bin/tclsh8.6 ./leak.tcl
request 11
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.6  0.0 54200 9012  1  S+    3:48PM 0:00.32 /usr/local/bin/tclsh8.6 ./leak.tcl
request 12
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.5  0.0 54200 9136  1  S+    3:48PM 0:00.33 /usr/local/bin/tclsh8.6 ./leak.tcl
request 13
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.5  0.0 54200 9140  1  S+    3:48PM 0:00.34 /usr/local/bin/tclsh8.6 ./leak.tcl
request 14
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.5  0.0 54200 9152  1  S+    3:48PM 0:00.35 /usr/local/bin/tclsh8.6 ./leak.tcl
request 15
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.4  0.0 54200 9152  1  S+    3:48PM 0:00.36 /usr/local/bin/tclsh8.6 ./leak.tcl
request 16
USER          PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
jeff.lawson 11749  0.5  0.0 54200 9152  1  S+    3:48PM 0:00.37 /usr/local/bin/tclsh8.6 ./leak.tcl
User Comments: aku added on 2019-06-25 00:10:02:
Here is one thing I consider a bug in http, noticed when I aborted a run by killing the proxy:

CloseSocket takes 2 arguments, socket to close, and optional token.
Here it is called with the error message in place of the socket.
The problem is line 657 in the modified package, equivalent to line 611 in the original.

Interestingly enough, you cannot leave that field undefined. That causes Finish to crash when calling CloseSocket, because it assumes to have a socket. Setting it to the empty string works however.

---
::http::X1651: Finish ::http::X1651 {} 1 (#1s1c1)
::http::X1651: CloseSocket {couldn't open socket: connection refused} ::http::X1651 (#1s1c1)
Closing socket couldn't open socket: connection refused (no connection info)
Error: can not find channel named "couldn't open socket: connection refused"
::http::X1651: CloseSocket {couldn't open socket: connection refused} ::http::X1651 RETURN
::http::X1651: TIMEOUT cancel after#1650
::http::X1651: Finish ::http::X1651 {} 1 RETURN
---

aku added on 2019-06-24 23:51:22:
Resuna, Jeff.

I have now run the bigscanner using `init1` and `init2`, i.e. with and without a proxy. The proxy was `mitmdump`, see link in previous comment.

No leaks for me.
However, I am also NOT seeing the `can't read "state(sock)": no such element in array` in the bigscanner output. IOW it seems that I am not managing to trigger whatever makes it happen.

Packages:
- http 2.8.11
- Tcl 8.6.7
- Tcllib 1.18, autoproxy 1.5.3
- TclTLS 1.7.13

I suspect that it is something in the http package, where some callbacks are crossing their wires, i.e. maybe the timeout callback not properly canceled, or a timeout not properly removing fileevent and then code tries to use bad state.

I suspect that the autoproxy is only part of it in that it makes the mixup somehow more likely, maybe by perturbing event timing due to delays through the proxy.

I have tried to track things through the code of http and, well, it is getting to complex to hold everything in my mind.

To hopefully drive this forward I am now attached a copy of `http-2.8.11.tm` to this ticket which has _a lot more_ of calls to `http::Log`, to better trace the internal calls

Define a non-empty implementation of http::Log, and we should see a nice dump of the activity. And hopefully see what exactly happens when the wires cross each other.

aku added on 2019-06-24 21:43:42:

Tool for setting up a local proxy - https://mitmproxy.org/


aku added on 2019-06-24 19:37:22:
Bring this forward.

resuna added on 2016-06-28 17:42:36:
OK, I think I found the leak. The ghost variable is an HTTP connection token that hasn't bean cleaned up, probably because the cleanup code errored out on the unset sock array element.

There's a timeout in the Tcl HTTP library, that's feeding the token to http::finish. I'm guessing that something has already started cleaning up the token and had closed and unset the socket.

I don't know if the bug is in tcllib's autoproxy, or in tcl's http code ( https://www.tcl.tk/man/tcl8.6/TclCmd/http.htm )

See tcl/library/http.tcl around lines 205 and 868.

resuna added on 2016-06-28 14:30:17:
OK, I'm seeing something. I've extended Jeff's test code to track all the variables in all the namespaces. Occasionally, I'm seeing errors like this:

can't read "state(sock)": no such element in array
    while executing
"CloseSocket $state(sock) $token"
    (procedure "Finish" line 13)
    invoked from within
"Finish $token"
    (procedure "http::reset" line 7)
    invoked from within
"http::reset ::http::1101 timeout"
    ("after" script)
can't read "state(sock)": no such variable
    while executing
"eof $state(sock)"
    (procedure "http::Connect" line 6)
    invoked from within
"http::Connect ::http::1101 https {} /“

can't read "state(sock)": no such element in array
    while executing
"CloseSocket $state(sock) $token"
    (procedure "Finish" line 13)
    invoked from within
"Finish $token"
    (procedure "http::reset" line 7)
    invoked from within
"http::reset ::http::1124 timeout"
    ("after" script)
1124 can't read "state(sock)": no such variable
    while executing
"eof $state(sock)"
    (procedure "http::Connect" line 6)
    invoked from within
"http::Connect ::http::1124 https {} /“

After these errors, the instrumented code shows that "[info vars]" says variables ::http::1101 and ::http::1124 exist, but [info exists] and [array exists] both fail for these variable names. I'll attach the instrumented code in a followup.

resuna added on 2016-06-28 12:45:11:
I don't think the test code is actually demonstrating the leak. VSZ is not increasing, and RSS is increasing asymptotically which is typical of any long running program.

aku added on 2015-12-04 16:01:57:
Thank you. Ok, will have to search for HOWTO guides on apache/mod_proxy setup, my knowledge on these is zilch.

That said, executive decision: I do not see me getting this investigated, not speaking of fixed, within 2 weeks, when I go on vacation till new year, and I wanted to make a Tcllib release before that.  I will defer this into '16, with the remaining time in this year for setup of the investigation. (Not just apache, valgrind as well, given that openssl is in the mix.). With apologies. (I am slower these days, it seems, in doing things)

anonymous added on 2015-12-04 08:38:41:
You can use Apache httpd with the mod_proxy to enable CONNECT style proxying that you need for HTTPS connections.

Latest versions is not a requirement--we have have experienced it for more than 3 years over multiple combinations of tcl, tls, openssl, FreeBSD versions.

aku added on 2015-12-04 08:29:57:
Hm, so weird ...

Do you need both autoproxy and TLS for the leak ?
I.e. do you have a leak just with autoproxy ?
Or a leak just with TLS active ?
(Trying to reduce the amount of components)

Is this with the latest version of TLS ?
And build against what version of OpenSSL ?

If TLS is involved, well, sockspy is not good enough to be a proxy in that case, it does not handle TLS connections properly. In that case I would need some recommendations for some proper proxy app I could run locally.

anonymous added on 2015-12-04 08:01:38:
Just using http::geturl and http::cleanup alone does not have any problem for us. We have many scripts that do that and run continuously for months that way without growing.

It is only when we need to use autoproxy with HTTPS destinations does the memory leak become unacceptable and we need to make our script self-terminate and restart every few thousand requests to avoid impacting other users on this server.

aku added on 2015-12-04 07:43:35:
I just disabled autoproxy in your test script, and I see the memory still growing, somewhat.

The modded script simply does http::geturl and http::cleanup in a loop.

Can you check if you get similar results in your environment and setup as well ?

I sort of noticed because when I instrumented the local autoproxy code it became clear that no proxy filtering was actually activated (I have no env variables to pick up, and the configure after init does nothing)

With the above it seems that the leak is more likely in the http package itself.

When I actually force a proxy by setting host:port as part of the autoproxy::init call, i.e.

    autoproxy::init localhost:8080

then a handler is properly installed into http, namely "autoproxy::filter".

That proc is very linear, just checking the provided host for exclusion and returning the proxy config. I cannot really see anything there which would leak memory.

Given http involvement, mine is 2.7.11, coming with the 8.5.14

One possibility which might involve autoproxy, would be that activating proxy handling in the http package triggers code branches in it which are more leaky than the code executed without proxies.

aku added on 2015-12-04 07:03:44:
I can confirm the RSS growing for Linux as well, Tcl 8.5.14.
It seems to be much slower however.
And the TLS does not seem to be required for it either.
(As a side note, I used sockspy as the proxy)

I might have to run this through a memory-debug enhanced Tcl core.

aku added on 2015-10-23 21:26:17:
Thank you. Successfully retrieved and attached.

anonymous added on 2015-10-23 17:55:55:
Please see the public gist here for the attachment:
https://gist.github.com/bovine/fccfb2b8c546d5f64581

aku added on 2015-10-19 15:37:32:
Jeff, with apologies for the late-ness.
The attached file is reported to me as "0 bytes of binary data", IOW as empty.

As such I am unable to confirm your report yet.

Please attach again, or send it to me in mail, at akupries at shaw do ca

Attachments: