Tcl Source Code

View Ticket
Login
Ticket UUID: 3033307
Title: binary decode base64 breaks with trailing spaces
Type: Bug Version: obsolete: 8.6b1.1
Submitter: a_kovalenko Created on: 2010-07-23 00:54:42
Subsystem: 12. ByteArray Object Assigned To: dkf
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2012-11-20 19:17:04
Resolution: Fixed Closed By: dkf
    Closed on: 2012-11-20 12:17:04
Description:
binary decode base64 "[binary encode base64 u]" => u, as expected
binary decode base64 "[binary encode base64 u] " => empty 

Trailing space characters cause inner loop to run 4 times in "post-data-end" branch, increasing "cut" value each time, with 3 output bytes only. Thus the last byte of previous data is thrown away incorrectly.

Patch attached.

The code is still too liberal, accepting illegal base64. It would be better to borrow a correct, well-tested implementation from somewhere else.
User Comments: dkf added on 2012-11-20 19:17:04:

allow_comments - 1

Confirmed that the tests detect the problem, and that it was a real problem that the patch fixed. Applied to trunk. Thanks!

dkf added on 2012-11-20 18:56:42:
The tests seem reasonable, given the existing definition of [binary decode base64].

dgp added on 2012-11-16 03:25:54:
See branch bug-3033307 for the contributed patch from Andy Goth.
Includes comments and tests.

dgp added on 2012-11-16 02:38:42:
This sure sounds like something that ought to be fixed for 8.6.0,
or have a real good understanding why not.

dkf added on 2012-10-18 15:12:33:
@andy: Only people with the right project permissions (i.e., people to whom we can assign the issue) can attach files to issues they didn't create.

andygoth added on 2012-10-18 09:03:57:
I hit this problem in 8.6b3, and it had very serious consequences for my application, so I fixed it properly.

I can't figure out how to attach a file here, so I put it on the Tcl Wiki: http://wiki.tcl.tk/tcl8.6b3-base64decode.patch .  Somebody please move that file here as a real attachment and delete that wiki page.

This patch tidies up the semantics for the cut variable, and it does so by making the bsae64 decoder a bit more strict.  Only one or two ='s are allowed, and they must be at the end of the final block.  Therefore cut can only be nonzero at or after the final block, and it can only go up to two, since at most two bytes can be stripped from the output (there are only three output bytes per block).  But if there's whitespace following the final block, three is added to cut to strip the entire dummy block that the whitespace would otherwise generate.

This patch adds some cases to binary.test to confirm that it's working as designed.  It also adds a lot of comments to the decode function to explain what it's doing.

I'd make the code even stricter, specifically I'd require that the input be a multiple of four characters in length, but that breaks existing cases in the test suite.  Instead I added comments to document the current behavior regarding incomplete blocks.  With or without my patch applied, the behavior is:

- If the input length (not counting whitespace) is 4n+1, where n is an integer, the last character is ignored.
- If the input length is 4n+2, two ='s are assumed at end of input.
- If the input length is 4n+3, one = is assumed at end of input.
- If the input length is 4n, no massaging is necessary.

Without my patch, trailing whitespace causes the final output character to be dropped.

a_kovalenko added on 2010-07-23 18:10:38:
Clarification: I don't mean "whitespace" here when I say that it's too liberal. E.g. single-character string, "P", is illegal when it's the _whole_ base64 input, yet Tcl [binary decode] accepts it. "P===" is also illegal and also accepted. These "holes" may be fixed easily, too, but after that this function will look even more ugly. 

It seems that ill-defined semantics of the `cut' variable (does it count bytes or 6bit units?) _is_ the cause of an impressive amount of "special-case" checks within such a small function.

I believe that my patch at least provides correct decoding of _correct_ base64, so it may be useful as a palliative.

dkf added on 2010-07-23 16:03:32:
Trailing spaces are *legal* in base64 anyway; the encoding specifies that whitespace is insignificant

a_kovalenko added on 2010-07-23 07:54:42:

File Added - 380861: tcl-base64-decode.patch

Attachments: