Ticket UUID: 2da8d6fb3d26031a8327b6bcda91f4fd158b71a8
Title: http doesn't respect charset for some content-types
Type: Bug
Version: 8.6.8
Submitter: andrew.brooks
Created on: 2018-11-08 18:08:26
Subsystem: 29. http Package
Assigned To: nobody
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2020-12-31 15:10:15
Resolution: Duplicate
Closed By: kjnash
Closed on: 2020-12-31 15:10:15
Description:
If an HTTP request made with the http package receives a reply with an explicit charset in the Content-Type header, the charset is respected when decoding the reply only for certain Content-Types (text/* and several XML types). This behavior is at odds with RFC 2068, section 3.7.1 ('HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset').

The problem is easy to reproduce:
1. Make a ::http::geturl request to a server that sends a reply with the header 'Content-Type: application/json;charset=UTF-8' and a body containing a non-ASCII (multi-byte UTF-8) character.
2. Check the response body (the multi-byte character will be mis-encoded).

Note that this does not happen if the Content-Type header is 'text;charset=UTF-8' or 'application/xml;charset=UTF-8'.

I first noticed this behavior in 8.6.8, but I imagine that it has been around for a while. I have attached a patch (written against trunk) for the http package that corrects the problem.
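The effect described above can be illustrated offline, without a server. The following is only a sketch and is not the attached patch: charsetToEncoding is a hypothetical helper written for this example (it is not the http package's internal code), and the JSON body is made-up sample data. It shows that the raw UTF-8 bytes of a body differ from the intended text until they are decoded with the encoding named in the Content-Type header.

```tcl
# Hypothetical helper (NOT part of the http package): pull the charset
# parameter out of a Content-Type value and map it to a Tcl encoding name.
proc charsetToEncoding {contentType {default iso8859-1}} {
    if {![regexp -nocase {;\s*charset\s*=\s*"?([^";[:space:]]+)} \
            $contentType -> charset]} {
        return $default
    }
    set name [string tolower $charset]
    # Tcl spells "iso-8859-1" as "iso8859-1"; "utf-8" is unchanged.
    regsub {^iso-} $name iso name
    if {$name in [encoding names]} {
        return $name
    }
    return $default
}

# Made-up sample body containing one non-ASCII character (e with acute).
set body "{\"name\": \"caf\u00e9\"}"

# The bytes a server would send for this body with charset=UTF-8.
set wire [encoding convertto utf-8 $body]

# Using the bytes as-is (one character per byte) gives mis-encoded text.
puts "undecoded equals original: [expr {$wire eq $body}]"

# Decoding with the charset from the header recovers the original text.
set header "application/json;charset=UTF-8"
set decoded [encoding convertfrom [charsetToEncoding $header] $wire]
puts "decoded equals original:   [expr {$decoded eq $body}]"
```

Against a live server, a possible workaround on affected versions is to request the body with the -binary option of ::http::geturl and decode the result of ::http::data in the same way. That variant is untested here and is offered as an assumption rather than as the fix that was adopted.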
User Comments:
kjnash added on 2020-07-24 19:36:30:
The similar bug [13657a2dc35] was fixed in Tcl 8.6.10, http 2.9.1 by identifying Content-Type application/json as not binary. This fixes the present bug for me. If it fixes the problem for you I will close the ticket.

andrew.brooks added on 2018-11-08 18:12:31:
moved patch to an attachment
Attachments:
- http.tcl.patch added by andrew.brooks on 2018-11-08 18:11:15.