Tcl Source Code

View Ticket
Ticket UUID: 2da8d6fb3d26031a8327b6bcda91f4fd158b71a8
Title: http doesn't respect charset for some content-types
Type: Bug
Version: 8.6.8
Submitter: andrew.brooks
Created on: 2018-11-08 18:08:26
Subsystem: 29. http Package
Assigned To: nobody
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2020-12-31 15:10:15
Resolution: Duplicate
Closed By: kjnash
Closed on: 2020-12-31 15:10:15
Description:
If an HTTP request made with the http package receives a reply with an explicit charset in the Content-Type header, the charset is respected when decoding the reply only for certain Content-Types (text/* and several xml types). For any other Content-Type the body is returned without applying the declared charset.

This behavior is at odds with RFC2068, section 3.7.1 ('HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset').

The problem is easy to reproduce:

1. Make a ::http::geturl request to a server that sends a reply with the header 'Content-Type: application/json;charset=UTF-8' and a body containing a non-ASCII character encoded in UTF-8.
2. Check the response body returned by ::http::data (the non-ASCII character will be mis-encoded, because its UTF-8 bytes have not been decoded).

Note that this does not happen if the Content-Type header is 'text;charset=UTF-8' or 'application/xml;charset=UTF-8'.
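
The effect in step 2 can be illustrated without a server. The following is a minimal sketch, not the http package's own code: it builds the bytes a server would send for a small JSON body and compares the undecoded bytes with the decoded text. The variable names and the sample body are made up for illustration.

```tcl
# Simulate the raw bytes of a UTF-8 JSON reply body; no server is contacted.
# The sample text contains one non-ASCII character, U+00E9.
set text "{\"name\":\"caf\u00e9\"}"
set wire [encoding convertto utf-8 $text]

# Left undecoded, every byte is kept as one character, so the two-byte
# UTF-8 sequence for U+00E9 appears as two separate characters.
set undecoded $wire

# Respecting charset=UTF-8 means decoding the bytes before returning them.
set decoded [encoding convertfrom utf-8 $wire]

puts "undecoded: [string length $undecoded] characters"
puts "decoded:   [string length $decoded] characters"
puts "decoded matches original: [expr {$decoded eq $text}]"
```

The undecoded string is one character longer than the decoded one, which is the mis-encoding described in step 2.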

I first noticed this behavior in 8.6.8, but I imagine that it has been around for a while.

I have attached a patch (written against trunk) for the http package that corrects the problem.
User Comments:
kjnash added on 2020-07-24 19:36:30:
The similar bug [13657a2dc35] was fixed in Tcl 8.6.10, http 2.9.1 by identifying Content-Type application/json as not binary.

This fixes the present bug for me.  If it fixes the problem for you I will close the ticket.
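
For scripts that must also run with an http package older than 2.9.1, one possible workaround is to fetch the body as raw bytes and decode it by hand. This is only a sketch under stated assumptions: charsetToTclEncoding and fetchDecoded are hypothetical helper names, the charset mapping covers only the simplest cases, and the fallback to iso8859-1 is a choice made for this example.

```tcl
package require http

# Hypothetical helper: map a charset label such as "UTF-8" to a Tcl
# encoding name. Unrecognised labels fall back to iso8859-1 (an
# assumption of this sketch, not a rule taken from the http package).
proc charsetToTclEncoding {charset} {
    set name [string tolower $charset]
    # Tcl spells the ISO 8859 family without a hyphen after "iso".
    regsub {^iso-8859-} $name {iso8859-} name
    if {$name in [encoding names]} {
        return $name
    }
    return iso8859-1
}

# Hypothetical helper: fetch a URL as raw bytes (-binary 1) and decode
# the body using the charset recorded in the token's state array.
proc fetchDecoded {url} {
    set tok [http::geturl $url -binary 1]
    try {
        upvar #0 $tok state
        set enc [charsetToTclEncoding $state(charset)]
        return [encoding convertfrom $enc [http::data $tok]]
    } finally {
        http::cleanup $tok
    }
}

# 1 when the loaded http package is at least version 2.9.1, the release
# named in the comment above as containing the application/json fix.
set fixed [expr {[package vcompare [package present http] 2.9.1] >= 0}]
```

Calling fetchDecoded with the URL of the affected server would return the decoded body; with http 2.9.1 or later the plain ::http::geturl and ::http::data calls should be enough for application/json.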

andrew.brooks added on 2018-11-08 18:12:31:
moved patch to an attachment

Attachments: