Ticket UUID: | fb642c54bc58b31daafba9ae495ded4b0417d9bc | |||
Title: | Incorrect download of compressed encoded data | |||
Type: | Bug | Version: | 8.6.6 | |
Submitter: | gerhardr | Created on: | 2018-03-25 13:12:25 | |
Subsystem: | 29. http Package | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Closed | Last Modified: | 2022-09-11 16:43:20 | |
Resolution: | Fixed | Closed By: | kjnash | |
Closed on: | 2022-09-11 16:43:20 | |||
Description: |
Some compressed files are not completely downloaded.

Test script (requires wget to compare the http downloaded file):
-----------------------------------------------------
package require http

#
# 1st part from Tcl 8.6 manpage example
#
proc httpcopy { url file {chunk 4096} } {
    set out [open $file w]
    set token [::http::geturl $url -channel $out \
            -progress httpCopyProgress -blocksize $chunk]
    close $out

    # This ends the line started by httpCopyProgress
    puts stderr ""

    upvar #0 $token state
    set max 0
    foreach {name value} $state(meta) {
        if {[string length $name] > $max} {
            set max [string length $name]
        }
        if {[regexp -nocase ^location$ $name]} {
            # Handle URL redirects
            puts stderr "Location:$value"
            return [httpcopy [string trim $value] $file $chunk]
        }
    }
    incr max
    foreach {name value} $state(meta) {
        puts [format "%-*s %s" $max $name: $value]
    }

    return $token
}
proc httpCopyProgress {args} {
    puts -nonewline stderr .
    flush stderr
}

#
# === Here starts my additional testing code ===
#
if {[llength $argv]} {
    set url [lindex $argv 0]
} else {
    set url "http://someonewhocares.org/hosts/hosts"
}
set org "outfile1.txt"
set out "outfile2.txt"
puts "Loading file $org from $url using wget"
catch {exec wget $url -O $org}
puts "Loading file $out from $url via httpcopy"
httpcopy $url $out
puts "HTTP file copy size is [file size $out], wget filesize is [file size $org]"
-----------------------------------------------------

Output example

$ tclsh test_download.tcl
Loading file outfile1.txt from http://someonewhocares.org/hosts/hosts using wget
Loading file outfile2.txt from http://someonewhocares.org/hosts/hosts via httpcopy
.......
Date: Fri, 23 Mar 2018 21:23:21 GMT
Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
content-disposition: attachment: filename=hosts
cache-control: public, max-age=86400
Last-Modified: Thu, 22 Mar 2018 08:13:42 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Connection: close
Transfer-Encoding: chunked
Content-Type: text/plain
HTTP file copy size is 342964, wget filesize is 416015 | |||
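The response headers in the output above carry both Content-Encoding: gzip and Transfer-Encoding: chunked, which is the combination the comments below tie to the truncation. A minimal sketch for picking these two headers out of a response's metadata list follows. The proc name responseEncodings is hypothetical and not part of the http package; the sample list reuses header values from this ticket.

```tcl
# Hypothetical helper (not part of the http package): return a dict with
# the content-encoding and transfer-encoding headers found in a metadata
# list of alternating names and values, such as $state(meta).
# Header names are matched case-insensitively.
proc responseEncodings {meta} {
    set result [dict create]
    foreach {name value} $meta {
        set key [string tolower $name]
        if {$key in {content-encoding transfer-encoding}} {
            dict set result $key [string trim $value]
        }
    }
    return $result
}

# Usage with header values copied from the ticket's output example.
set meta {
    Vary              Accept-Encoding
    Content-Encoding  gzip
    Connection        close
    Transfer-Encoding chunked
    Content-Type      text/plain
}
puts [responseEncodings $meta]
# prints: content-encoding gzip transfer-encoding chunked
```

In the test script, the same proc could be applied to $state(meta) after geturl returns, to flag responses that arrive both compressed and chunked.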
User Comments: |
kjnash added on 2022-09-11 16:43:20:
(text/x-fossil-plain)
This ticket raises three separate issues:

1. [open $file wb]

The "b" flag is equivalent to
    fconfigure $file -translation binary
and in current http this is done automatically if the content-type is non-text or if the stacked channel includes decompression - using the following code:

    if {$state(-binary) || [IsBinaryContentType $state(type)]} {
        # Turn off conversions for non-text data.
        set state(binary) 1
    }
    if {[info exists state(-channel)]} {
        if {$state(binary) || [llength [ContentEncoding $token]]} {
            fconfigure $state(-channel) -translation binary
        }
        ...
    }

2. "Accept-Encoding identity"

This is necessary to tell the server not to use compression. The http::geturl option -zip 0 should do this but does not - this bug is unrelated to this ticket (which always used the default -zip 1) and was fixed in commit [3cee774ebf] of branch http-bugfixes-2022H2.

3. using gzip

The response is truncated if gzip is used. The error is the same as the truncation issue seen in ticket [3610253] for a chunked+gzip response written directly to a -channel. That bug is now fixed.

sebres (claiming to be [email protected]) added on 2018-03-26 16:40:47:
(text/x-fossil-wiki)
Reopened because of <a href="https://groups.google.com/d/msg/comp.lang.tcl/7mNIrCZH2Ks/dbOyZcYABQAJ">https://groups.google.com/d/msg/comp.lang.tcl/7mNIrCZH2Ks/dbOyZcYABQAJ</a>:
<pre>
> ```diff
> - set out [open $file w]
> + set out [open $file wb]
> ```
confirmed, download using:
  -headers {Accept-Encoding identity}
brings with "set out [open $file wb]" the same result as wget.
But without the geturl header {Accept-Encoding identity} the download file size differs by ca. 70kB.
....
"HTTP file copy size is 342957, wget filesize is 416135"

> Please reopen if I'm wrong.
Don't know if this can be seen as expected behavior. The man page also explains the option -binary and even setting this to true and using filemode binary does not download the complete file.
From my point of view it is really hard to find out why geturl does behave as it does. Maybe improving the man pages could also bring more clear picture or adding the gzip encoding to the manpage example.

BTW: Nevertheless my problem is solved, many thanks to clt :-)
</pre>
It looks like some response headers are still not handled correctly when "Accept-Encoding" is not "identity" (resp. "chunked" in my case).<br/>
Currently, I have no time to dig deeper.

sebres added on 2018-03-26 12:12:35:
(text/x-fossil-wiki)
You're trying to write the file using the current system encoding (which is UTF-8, I assume). So just change:
<pre>
- <b style="color:red">set out [open $file w]</b>
+ <b style="color:green">set out [open $file wb]</b>
</pre>
And you'll get it correctly.

Note, this server does not provide the charset (it just says text/plain in content-type, but says nothing about the encoding). So the target encoding is undefined; wget will use binary here (so it writes the data as is).

bll added on 2018-03-25 15:46:53:
Relevant discussion at: https://groups.google.com/forum/#!topic/comp.lang.tcl/7mNIrCZH2Ks |
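For http versions that predate the fix, the workaround confirmed in the comments above (binary file mode plus an explicit "Accept-Encoding identity" request header) can be sketched as follows. The proc name httpcopyIdentity is hypothetical. The sketch omits the redirect handling and header printing of the ticket's httpcopy, and it has not been checked against every http release.

```tcl
package require http

# Hypothetical variant of the ticket's httpcopy (the name is an assumption):
# download $url into $file while asking the server not to compress the body.
# Redirects are not followed here.
proc httpcopyIdentity {url file {chunk 4096}} {
    # "wb" opens the file in binary mode, so no encoding or
    # end-of-line translation is applied to the received bytes.
    set out [open $file wb]
    try {
        # The explicit Accept-Encoding header is the one reported in the
        # comments to give the same result as wget.
        set token [::http::geturl $url -channel $out -blocksize $chunk \
                -headers {Accept-Encoding identity}]
    } finally {
        # Close the file even if geturl raises an error.
        close $out
    }
    return $token
}

# Example call (needs network access, so it is left commented out):
#   set token [httpcopyIdentity http://someonewhocares.org/hosts/hosts outfile2.txt]
#   puts "HTTP file copy size is [file size outfile2.txt]"
#   ::http::cleanup $token
```

The cost of this design is bandwidth: the body travels uncompressed, so it only makes sense where the gzip path is known to truncate.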
Attachments:
- test_download.tcl [download] added by gerhardr on 2018-03-27 12:35:54. [details]