Ticket UUID: 2998307
Title: Can't display UTF when charset is not passed by the server
Type: Bug
Version: None
Submitter: samaelszafran
Created on: 2010-05-07 19:14:15
Subsystem: 29. http Package
Assigned To: patthoyts
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2022-09-10 12:18:45
Resolution: Fixed
Closed By: kjnash
Closed on: 2022-09-10 12:18:45
Description:
$ tclsh
% pa re http
2.7.5
% namespace children
::http ::oo ::tcl
% http::config
-accept */* -proxyfilter http::ProxyRequired -proxyhost {} -proxyport {} -urlencoding utf-8 -useragent {Tcl http client package 2.7.5}
% namespace children
::http ::oo ::tcl
% set http::defaultCharset utf-8
utf-8
% set token [http::geturl http://bash.org.pl/rss]
::http::1
% upvar 0 $token state
% set state(charset)
utf-8

This URL is a simple RSS feed encoded in UTF-8, but the server does not send the charset in the Content-Type header. state(charset) reports utf-8, yet the body is not actually decoded as UTF-8: when I display it with http::data, it displays incorrectly. In this case, the Polish UTF-8 characters look as if they had been read as ISO-8859-1.
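The symptom can be reproduced without contacting the server. The sketch below uses only the core `encoding` command (it does not call the http package): it simulates the UTF-8 bytes of a Polish phrase arriving with no charset information and compares decoding them as UTF-8 against decoding them as ISO-8859-1. The variable names are illustrative only.

```tcl
# Reproduce the ticket's symptom without a network connection.
# The Polish phrase "zażółć gęślą jaźń" is written with \u escapes so the
# script does not depend on the encoding of the source file.
set original "za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"

# What the server sends: the UTF-8 bytes of the text, with no charset header.
set wireBytes [encoding convertto utf-8 $original]

# Decoding with the right charset gives the original text back.
set decodedOk [encoding convertfrom utf-8 $wireBytes]

# Decoding with ISO-8859-1 turns every byte into one character, so each
# two-byte UTF-8 sequence becomes two Latin-1 characters; for example
# "ż" (bytes C5 BC) is shown as "Å¼".
set decodedBad [encoding convertfrom iso8859-1 $wireBytes]

puts "characters in original:      [string length $original]"
puts "characters after ISO decode: [string length $decodedBad]"
puts "UTF-8 decode round-trips:    [expr {$decodedOk eq $original}]"
```

Running it should print 17, 26 and 1: the ISO-8859-1 decode has nine extra characters, one for each two-byte Polish letter.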
User Comments:

kjnash added on 2022-09-10 12:08:55:
Fixed in commit [d150b47456], branch http-bugfixes-2022H2. The new http::geturl option -guesstype allows detection of XML files and their encoding when the server supplies no Content-Type. This option is off by default.

dkf added on 2014-01-05 16:14:22:
If they're read like ISO8859-1 but you know they're “really” UTF-8,

anonymous added on 2014-01-03 14:59:06:
This still seems to apply in the current version. Most importantly, http seems to completely ignore -binary true with respect to charset mangling. I always get mangled UTF-8 out of TclHTTP, while the same file downloads correctly using TclCurl and wget from the command line. Please fix.

samaelszafran added on 2010-05-08 02:18:24:
I forgot to mention: I've got Tcllib compiled from FreeBSD 7.2-STABLE Ports.
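dkf's comment concerns text that was read as ISO8859-1 although it is really UTF-8. One common way to repair such text after the fact is sketched below. It assumes the body was decoded with ISO-8859-1 specifically (not, for example, cp1252), and `latin1ToUtf8` is a hypothetical helper name, not part of the http package.

```tcl
# Hypothetical helper, not part of the http package: repair text that was
# decoded as ISO-8859-1 although the underlying bytes were UTF-8.
proc latin1ToUtf8 {text} {
    # ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so re-encoding with it recovers the bytes that were received...
    set bytes [encoding convertto iso8859-1 $text]
    # ...which can then be decoded with the charset they really use.
    return [encoding convertfrom utf-8 $bytes]
}

# Example: "łódź" sent as UTF-8 but decoded as ISO-8859-1.
set mangled [encoding convertfrom iso8859-1 \
        [encoding convertto utf-8 "\u0142\u00f3d\u017a"]]
set repaired [latin1ToUtf8 $mangled]
puts [expr {$repaired eq "\u0142\u00f3d\u017a"}]
```

With a live request, the repair would be applied as `latin1ToUtf8 [http::data $token]`, and only when the body is known to have been decoded as ISO-8859-1. On http versions that include the fix described above, passing `-guesstype 1` to http::geturl may make such a repair unnecessary for XML feeds like this one.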