Tcl Source Code

View Ticket
Ticket UUID: 2998307
Title: Can't display UTF when charset is not passed by the server
Type: Bug
Version: None
Submitter: samaelszafran
Created on: 2010-05-07 19:14:15
Subsystem: 29. http Package
Assigned To: patthoyts
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2022-09-10 12:18:45
Resolution: Fixed
Closed By: kjnash
Closed on: 2022-09-10 12:18:45
Description:
$ tclsh
% pa re http
2.7.5
% namespace children
::http ::oo ::tcl
% http::config
-accept */* -proxyfilter http::ProxyRequired -proxyhost {} -proxyport {} -urlencoding utf-8 -useragent {Tcl http client package 2.7.5}
% namespace children
::http ::oo ::tcl
% set http::defaultCharset utf-8
utf-8
% set token [http::geturl http://bash.org.pl/rss]
::http::1
% upvar 0 $token state
% set state(charset)
utf-8

This URL is a simple RSS feed that uses the utf-8 charset but does not declare it in the Content-Type header.
It tells me that the charset is UTF-8, but the data is not decoded that way. When I then display it using http::data, it displays incorrectly: in this case, Polish UTF-8 characters look as if they had been read as ISO.
User Comments:
kjnash added on 2022-09-10 12:08:55:
Fixed in commit [d150b47456], branch http-bugfixes-2022H2.

New http::geturl option -guesstype allows detection of XML files and their encoding when the server supplies no content-type.  By default this option is off.
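
As a rough sketch of how the option might be used, assuming an http package version that already includes this fix and that -guesstype takes a boolean value (the proc name fetchFeed is hypothetical, not part of the package):

```tcl
package require http

# Hypothetical helper: fetch a URL and let the http package guess the
# content type and encoding when the server sends no Content-Type.
# Assumes an http version that supports -guesstype.
proc fetchFeed {url} {
    set token [http::geturl $url -guesstype 1]
    try {
        # Return the body as the package decoded it.
        return [http::data $token]
    } finally {
        # Always release the state array for this transaction.
        http::cleanup $token
    }
}
```

Calling fetchFeed http://bash.org.pl/rss would then repeat the request from the description with guessing enabled.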

dkf added on 2014-01-05 16:14:22:

If they're read like ISO8859-1 but you know they're “really” UTF-8, encoding convertfrom utf-8 will do the conversion.
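
A minimal sketch of that workaround, simulated without network access. The sample string and the variable names are illustrative assumptions; in real use the mangled text would be the result of http::data. The explicit round trip through iso8859-1 recovers the original bytes before they are decoded as UTF-8:

```tcl
# Simulate the symptom: UTF-8 bytes wrongly decoded as ISO 8859-1.
set original "za\u017c\u00f3\u0142\u0107"            ;# Polish sample text
set bytes    [encoding convertto utf-8 $original]    ;# bytes as sent by the server
set mangled  [encoding convertfrom iso8859-1 $bytes] ;# the wrong decoding

# Repair: map each character back to its byte, then decode as UTF-8.
set repaired [encoding convertfrom utf-8 [encoding convertto iso8859-1 $mangled]]

puts [string equal $repaired $original]
```

This only works if the text really was decoded as ISO 8859-1, so that every character still corresponds to exactly one original byte.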


anonymous added on 2014-01-03 14:59:06:
Still seems to apply in the current version

Most importantly, http seems to completely ignore -binary true for charset mangling. I always get mangled UTF-8 out of TclHTTP, while the same file downloads correctly using TclCurl and wget from the command line. Please fix.

samaelszafran added on 2010-05-08 02:18:24:
I just forgot to mention - I've got TCLLib compiled from FreeBSD 7.2-STABLE Ports.