Tk Source Code

View Ticket
Login
Ticket UUID: 10f2e7872bc0f8d81fa0f3596afeb0f4cf1191a1
Title: PNg writer produces invalid files
Type: Bug Version: 8.6.6, 8.6.5
Submitter: fpigorsch Created on: 2016-10-17 14:28:52
Subsystem: 41. Photo Images Assigned To: dkf
Priority: 9 Immediate Severity: Critical
Status: Closed Last Modified: 2016-10-30 05:27:03
Resolution: Duplicate Closed By: dkf
    Closed on: 2016-10-30 05:27:03
Description:

Starting in Tk 8.6.5, the following code snippet produces an invalid PNG file:

########
# create image (large dimensions seem to trigger the bug)
set width 2000
set height 5000
set img [image create photo -width $width -height $height]

# create row of random RGB pixels (a high number of different colors seems to be requirement for the bug)
set row {}
for {set x 0} {$x < $width} {incr x} {
    set c [format "#%02X%02X%02X" [expr int(rand() * 256)] [expr int(rand() * 256)] [expr int(rand() * 256)]]
    lappend row $c
}
set row [list $row]

# fill image with 'row'
for {set y 0} {$y < $height} {incr y} {
    $img put $row -to 0 $y $width [expr {$y + 1}]
}

# write png
set fname /tmp/invalid.png
$img write -format png $fname

exit
########
Loading /tmp/invalid.png with a libpng-based app (e.g. pngtopnm), results in several error messages, such as
...
libpng warning: Ignoring bad adaptive filter type
libpng warning: Ignoring bad adaptive filter type
pnmtopng:  fatal libpng error: Extra compressed data
pngtopnm: setjmp returns error condition

User Comments: dkf added on 2016-10-30 05:27:03:

The big lesson about the zlib stuff has been that it was a complicated API and should have had more cooking before going in. The disconnect between how we do APIs in Tcl/Tk and how the zlib library works is rather large, and the difference between what was delivered in original contributed source code and what was needed to satisfy the TIP was even larger.

I'm also concerned that we've still potentially got surprises lurking in the zlib code, but I think we've got them ironed out of the channel transform and the streaming command; it's the “one-shot” code that still worries me (though it hasn't failed yet) as it is full of magical rules of thumb for sizes of buffers. Though the failures if they come will be in data that simply fails to decompress and returns an error, so I'm not too worried there.


dkf added on 2016-10-30 05:15:55:

I can confirm that fixing the bug in Tcl fixes this issue, generating images that — when viewed using my system image viewer — contain pixels that go all the way to the bottom of the screen as opposed to having the trademark of a block of data just being omitted part way through.


dkf added on 2016-10-29 20:49:13:

See Tcl issue b26e38a3e4.


dgp added on 2016-10-27 15:01:18:
I'm very glad you've been able to narrow
down the cause, and I'm very glad I stopped
trying when I did.

It's troubling to me that zlib channel
compression has had so much trouble.
Any lessons we can learn and apply?

dkf added on 2016-10-24 09:52:44:

Tracked it down. This is a bug in Tcl that Tk just happened to expose. It was not handling the case where a buffer was full in the streaming compression code correctly. No idea why the adding in the alpha values was stopping the bug from being hit; I guess it was just making things compressible enough that the buffer problem wasn't hit? It's pretty difficult to hunt these things.


dkf added on 2016-10-22 14:18:18:

I've reproduced the problem. The bug appears to be a failure to write out a buffer of data part way through (in my case, the discontinuity happens at (1877, 2561), i.e., after 5121877 pixels or 15368192 bytes into the IDAT's uncompressed content; not sure about compressed, and there's an obvious absence of data from the end of the image). This now smells like a problem with handling of compression, either in how Tk calls the Tcl API or in how the Tcl streaming compression system is working internally.

It probably wasn't happening before because it was hitting the buffer handling slightly differently.


dkf added on 2016-10-21 14:23:22:

I'm guessing that the wrong number of bytes are being written in the loop in WriteIDAT() but I don't know why.


dgp added on 2016-10-18 16:01:19:
Any suggestions how to begin debugging this?

The PNG writer code is not written in a way
similar to how I think, and I fear breaking
it further with more random tinkering, and
precious few tests to contain us.

Any way to gain confidence whether the errors
are even in the PNG writer, or whether they're
manifestations of more troubles with zlib-encoded
channels?

fpigorsch added on 2016-10-18 07:02:21:
Can you suggest any workarounds to avoid the bug, except going back to Tk 8.6.4?
e.g. 
- removing/adding transparency, since the buggy changes address the alpha-channel (in our application, the image contents are under our control)
- reverting all changes to tkImgPNG.c, tkImgPhoto.c (we are deploying statically linked binaries, so modification of Tk is a viable option)?

Regards
Florian

dgp added on 2016-10-17 17:18:54:
Breakage appears to start with

http://core.tcl.tk/tk/info/d2fc5a3c29c75351