Ticket UUID: | fb642c54bc58b31daafba9ae495ded4b0417d9bc | |||
Title: | Incorrect download of compressed encoded data | |||
Type: | Bug | Version: | 8.6.6 | |
Submitter: | gerhardr | Created on: | 2018-03-25 13:12:25 | |
Subsystem: | 29. http Package | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Closed | Last Modified: | 2022-09-11 16:43:20 | |
Resolution: | Fixed | Closed By: | kjnash | |
Closed on: | 2022-09-11 16:43:20 | |||
Description: |
Some compressed files are not completely downloaded.

Test script (requires wget to compare the http downloaded file):
-----------------------------------------------------
package require http
#
# 1st part from Tcl 8.6 manpage example
#
proc httpcopy { url file {chunk 4096} } {
    set out [open $file w]
    set token [::http::geturl $url -channel $out \
        -progress httpCopyProgress -blocksize $chunk]
    close $out

    # This ends the line started by httpCopyProgress
    puts stderr ""

    upvar #0 $token state
    set max 0
    foreach {name value} $state(meta) {
        if {[string length $name] > $max} {
            set max [string length $name]
        }
        if {[regexp -nocase ^location$ $name]} {
            # Handle URL redirects
            puts stderr "Location:$value"
            return [httpcopy [string trim $value] $file $chunk]
        }
    }
    incr max
    foreach {name value} $state(meta) {
        puts [format "%-*s %s" $max $name: $value]
    }

    return $token
}
proc httpCopyProgress {args} {
    puts -nonewline stderr .
    flush stderr
}
#
# === Here starts my additional testing code ===
#
if {[llength $argv]} {
    set url [lindex $argv 0]
} else {
    set url "http://someonewhocares.org/hosts/hosts"
}
set org "outfile1.txt"
set out "outfile2.txt"
puts "Loading file $org from $url using wget"
catch {exec wget $url -O $org}
puts "Loading file $out from $url via httpcopy"
httpcopy $url $out
puts "HTTP file copy size is [file size $out], wget filesize is [file size $org]"
-----------------------------------------------------

Output example:

$ tclsh test_download.tcl
Loading file outfile1.txt from http://someonewhocares.org/hosts/hosts using wget
Loading file outfile2.txt from http://someonewhocares.org/hosts/hosts via httpcopy
.......
Date: Fri, 23 Mar 2018 21:23:21 GMT
Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
content-disposition: attachment: filename=hosts
cache-control: public, max-age=86400
Last-Modified: Thu, 22 Mar 2018 08:13:42 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Connection: close
Transfer-Encoding: chunked
Content-Type: text/plain
HTTP file copy size is 342964, wget filesize is 416015 | |||
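The response headers above show a gzip-encoded, chunked body. As a possible workaround sketch (not part of the original report), the download can ask the server for an uncompressed body and write it through a binary channel. The proc name httpcopyIdentity is hypothetical, redirects are not followed, and the sketch assumes both that the installed http package lets a caller-supplied Accept-Encoding header replace its default and that the server honors the request.

```tcl
package require http

# Hypothetical helper (not from the ticket): download url to file while
# asking the server for an uncompressed body.
proc httpcopyIdentity {url file {chunk 4096}} {
    # "wb" opens the file in binary mode, so bytes are written unchanged.
    set out [open $file wb]
    try {
        # Assumption: a caller-supplied Accept-Encoding header replaces the
        # package default; whether the server complies is up to the server.
        set token [::http::geturl $url -channel $out -blocksize $chunk \
            -headers {Accept-Encoding identity}]
    } finally {
        close $out
    }
    set status [::http::status $token]
    set code [::http::ncode $token]
    ::http::cleanup $token
    if {$status ne "ok" || $code != 200} {
        error "download failed: status=$status code=$code"
    }
    return [file size $file]
}
```

A call such as `httpcopyIdentity http://someonewhocares.org/hosts/hosts outfile2.txt` would then return the size of the written file, which can be compared against the wget copy as in the test script.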
User Comments: |
kjnash added on 2022-09-11 16:43:20:
This ticket raises three separate issues:

1. [open $file wb]

The "b" flag is equivalent to calling fconfigure with -translation binary on the opened channel. In current http this is done automatically if the content type is binary (non-text) or if the stacked channel includes decompression, using the following code:

    if {$state(-binary) || [IsBinaryContentType $state(type)]} {
        # Turn off conversions for non-text data.
        set state(binary) 1
    }
    if {[info exists state(-channel)]} {
        if {$state(binary) || [llength [ContentEncoding $token]]} {
            fconfigure $state(-channel) -translation binary
        }
        ...
    }

2. "Accept-Encoding identity"

This is necessary to tell the server not to use compression. The http::geturl option -zip 0 should do this but does not. That bug is unrelated to this ticket (which always used the default -zip 1) and was fixed in commit [3cee774ebf] of branch http-bugfixes-2022H2.

3. using gzip

The response is truncated if gzip is used. The error is the same as the truncation issue seen in ticket [3610253] for a chunked+gzip response written directly to a -channel. That bug is now fixed.

sebres (claiming to be [email protected]) added on 2018-03-26 16:40:47:
Reopened because of https://groups.google.com/d/msg/comp.lang.tcl/7mNIrCZH2Ks/dbOyZcYABQAJ:

> ```diff
> - set out [open $file w]
> + set out [open $file wb]
> ```

It looks like some response headers are still not handled correctly if "Accept-Encoding" is not "identity" (or "chunked", in my case).

sebres added on 2018-03-26 12:12:35:
You're trying to write the file using the current system encoding (which is UTF-8, I assume). So just change:

    - set out [open $file w]
    + set out [open $file wb]

and you'll get the correct result.

Note that this server does not provide the charset (the content-type just says text/plain and nothing about the encoding), so the target encoding is undefined. wget uses binary here, so it writes the data as is.

bll added on 2018-03-25 15:46:53:
Relevant discussion at: https://groups.google.com/forum/#!topic/comp.lang.tcl/7mNIrCZH2Ks |
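The effect of the "b" flag discussed in the comments above can be reproduced without any network access. The sketch below is illustrative only: writeSample is a hypothetical helper, and the text channel is pinned to UTF-8 with LF line endings, an assumption standing in for the system encoding that sebres mentions.

```tcl
# Write the same three bytes through a text-mode and a binary-mode channel
# and report the resulting file size.
proc writeSample {path mode} {
    set data "\xE9\r\n"   ;# one non-ASCII byte followed by CR LF
    set ch [open $path $mode]
    if {$mode eq "w"} {
        # Assumption: fix encoding and line endings so the demonstration
        # does not depend on platform defaults.
        fconfigure $ch -encoding utf-8 -translation lf
    }
    puts -nonewline $ch $data
    close $ch
    return [file size $path]
}

# file tempfile returns an open channel and stores the file name in the variable.
close [file tempfile textPath]
close [file tempfile binPath]

set textSize [writeSample $textPath w]
set binSize  [writeSample $binPath wb]
puts "text mode: $textSize bytes, binary mode: $binSize bytes"

file delete $textPath $binPath
```

With these settings the text-mode file comes out one byte longer than the binary one (4 bytes versus 3), because the byte 0xE9 is re-encoded as two UTF-8 bytes. This shows why a download written through a text channel can differ in size from the bytes the server sent.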
Attachments:
- test_download.tcl added by gerhardr on 2018-03-27 12:35:54.