Tcl Source Code

View Ticket
Login
Ticket UUID: db0a5f6417bcd5f8dbd37377daf1488b60b0c8f0
Title: Tests zlib-8.9 and zlib-8.11 fail with trunk [c7cea9d704] for MSWin 32/MSVC6
Type: Bug Version: trunk [c7cea9d704]
Submitter: oehhar Created on: 2016-06-09 13:36:07
Subsystem: 57. zlib Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2020-11-17 09:09:21
Resolution: Fixed Closed By: oehhar
    Closed on: 2020-11-17 09:09:21
Description:

I ran the test suite today and got the following new issues (compared to 8.6.5 release):

==== zlib-8.9 transformation and fconfigure FAILED
==== Contents of test case:

    zlib push compress $outSide -dictionary $spdyDict
    fconfigure $outSide -blocking 0 -translation binary -buffering none
    fconfigure $inSide -blocking 0 -translation binary
    puts -nonewline $outSide $spdyHeaders
    set result [fconfigure $outSide -checksum]
    chan pop $outSide
    $strm put -dictionary $spdyDict [read $inSide]
    lappend result [string length $spdyHeaders] [string length [$strm get]]

---- Result was:
3064818174 358 0
---- Result should have been (exact matching):
3064818174 358 358
==== zlib-8.9 FAILED

==== zlib-8.11 transformation and fconfigure FAILED
==== Contents of test case:

    zlib push deflate $outSide -dictionary $spdyDict
    fconfigure $outSide -blocking 0 -translation binary -buffering none
    fconfigure $inSide -blocking 0 -translation binary
    puts -nonewline $outSide $spdyHeaders
    chan pop $outSide
    $strm put -dictionary $spdyDict [read $inSide]
    list [string length $spdyHeaders] [string length [$strm get]]

---- Result was:
358 0
---- Result should have been (exact matching):
358 358
==== zlib-8.11 FAILED

System:

  • TCL Trunk [c7cea9d704]
  • Windows 10 64 bit GER host system
  • MS-VC6 with PSDK 2003 SP1
  • Makefile.vc without options

User Comments: oehhar added on 2020-11-17 09:09:21:

Fixed: https://wiki.tcl-lang.org/page/Test+Suite+Log


dkf added on 2016-07-26 10:06:27:

Strictly, the functionality is (probably) working fine. It's just that the bytes haven't been pushed into Tcl from the OS yet. As such, this is a weakness in the test and not a weakness in the code. As such, the way to test the buffering issue is quite different; it is testing something different. The point about testing what happens with short reads about is fair, but off to the side of this bug. I'll need to think about how exactly to test it, and I'll commit the fix first. I'll need to find a system to experiment with too, since I can't rely on Tcl's usual OS-difference hiding to make doing things on one platform adequate...


apnadkarni added on 2016-07-25 07:47:06:
No test failures for me too.

However, I do think specific channel transforms should include tests that verify correct non-blocking operation as their implementation may break it even though the underlying chan infrastructure operates correctly. Speaking from personal experience with TLS.

/Ashok

oehhar added on 2016-07-25 07:07:55:

Yes, [4d292bc19449bba4] makes the test failure disappear for me.

I am with you with the blocking and separation sentence. This is a good thing.

Thank you all, Harald


dkf added on 2016-07-24 12:49:36:

And while I understand the concern about blocking and non-blocking, that'd be reason to add further tests to the tests for chan, not for zlib. Let's keep things where they belong or we'll horribly confuse everyone in the future.


dkf added on 2016-07-24 12:47:01:

Does [4d292bc19449bba4] address this issue satisfactorily?


oehhar added on 2016-07-22 09:15:22:

Great that this long standing issue is aparrently solved. Cudos to the team !

The following might be off topic and not relevant here:

For me, non-blocking and blocking operations are two use cases and should have two tests. A non-blocking version should insert separate flush-commands.

At least in the "socket -async" part I remember, that I have added tests for both cases and wihtout event loop is perfectly valid.

See use-cases here: http://core.tcl.tk/tcl/info/db0a5f6417bcd5f8

aspect, why are your contributions always require a moderator? Please E-Mail Andreas Kupries to change this. Or shall I do it?

Thank you, Harald


aspect added on 2016-07-22 08:20:24:
Latest version runs perfectly on my Linux/x86_64 system with trunk [632e9e93f1] (same as my last update):

$ tclsh zlib-db0a5f6417-2.tcl 100000
Tcl: 8.7a0
ok=100000, fail=0

Differences between [chan pipe] buffering on Windows/Linux certainly deserve better documentation.  As noted by pooryorick in the chat, there's also discrepancy in manuals of [chan], [read] and others on the legality of using a nonblocking socket without the event loop .. this probably deserves another ticket and deep investigation to write a definitive spec.

apnadkarni added on 2016-07-22 03:25:24:
Changed my test snippet as below based on aspect's -blocking hint. This now runs on my Windows 8 system with no apparent failures over 50000 iterations. Not changed the actual tests themselves.

proc go {} {
    set spdyHeaders "HTTP/1.0 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nX-Robots-Tag: noarchive\r\nLast-Modified: Tue, 05 Jun 2012 02:43:25 GMT\r\nETag: \"1338864205129|#public|0|en|||0\"\r\nExpires: Tue, 05 Jun 2012 16:17:11 GMT\r\nDate: Tue, 05 Jun 2012 16:17:06 GMT\r\nCache-Control: public, max-age=5\r\nX-Content-Type-Options: nosniff\r\nX-XSS-Protection: 1; mode=block\r\nServer: GSE\r\n"
    set spdyDict "optionsgetheadpostputdeletetraceacceptaccept-charsetaccept-encodingaccept-languageauthorizationexpectfromhostif-modified-sinceif-matchif-none-matchif-rangeif-unmodifiedsincemax-forwardsproxy-authorizationrangerefererteuser-agent100101200201202203204205206300301302303304305306307400401402403404405406407408409410411412413414415416417500501502503504505accept-rangesageetaglocationproxy-authenticatepublicretry-afterservervarywarningwww-authenticateallowcontent-basecontent-encodingcache-controlconnectiondatetrailertransfer-encodingupgradeviawarningcontent-languagecontent-lengthcontent-locationcontent-md5content-rangecontent-typeetagexpireslast-modifiedset-cookieMondayTuesdayWednesdayThursdayFridaySaturdaySundayJanFebMarAprMayJunJulAugSepOctNovDecchunkedtext/htmlimage/pngimage/jpgimage/gifapplication/xmlapplication/xhtmltext/plainpublicmax-agecharset=iso-8859-1utf-8gzipdeflateHTTP/1.1statusversionurl"
    lassign [chan pipe] inSide outSide
    set strm [zlib stream inflate]
    zlib push deflate $outSide -dictionary $spdyDict
    fconfigure $outSide -blocking 0 -translation binary -buffering none
    fconfigure $inSide -blocking 1 -translation binary
    puts -nonewline $outSide $spdyHeaders
    chan pop $outSide
    close $outSide
    $strm put -dictionary $spdyDict [read $inSide]
    set result [list [string length $spdyHeaders] [string length [$strm get]]]
    close $inSide
    $strm close
    return $result
}

if {[llength $argv]} {
    set fails 0
    set oks 0
    for {set i 0} {$i < [lindex $argv 0]} {incr i} {
        if {[go] eq "358 358"} {incr oks} else {incr fails}
    }
    puts "ok=$oks, fail=$fails"
}

apnadkarni added on 2016-07-21 17:19:38:
aspect, I bet you've hit the nail on the head. I'll try it tomorrow and see; too late in the day now but I've seen almost the same exact issue when debugging TLS but never noticed the -blocking 0 in this test. Don't quite know why it is there except maybe to not have the test suite hang on bugs.

/Ashok

aspect added on 2016-07-21 13:47:07:
adding [close $outSide; close $inSide] to the script to avoid exhausting file descriptors, I've run Ashok's test on linux x86_64 with trunk [632e9e93f1] for some millions of iterations and failed to see any result but {358 358}.

I find it unlikely that the inflate side is to blame but that should be easy to eliminate with below changes.  Separating deflate, push and pipe not so easy.

Random observation:  -blocking 0?  Could it be as simple as data that hasn't hit the read side of the socket yet?


--- zlib-db0a5f6417.tcl.orig    2016-07-21 22:59:53.336528919 +0930
+++ zlib-db0a5f6417.tcl 2016-07-21 22:59:35.126410995 +0930
@@ -8,8 +8,17 @@
     fconfigure $inSide -blocking 0 -translation binary
     puts -nonewline $outSide $spdyHeaders
     chan pop $outSide
-    $strm put -dictionary $spdyDict [read $inSide]
-    list [string length $spdyHeaders] [string length [$strm get]]
+    set zin [read $inSide]
+    $strm put -dictionary $spdyDict $zin
+    close $outSide
+    close $inSide
+    list [string length $zin] [string length $spdyHeaders] [string length [$strm get]]
 }
 
-time {puts [go]} 100
+proc t {} {
+    set gold [go]
+    while {[incr i] < 100000} {
+        if {[go] ne $gold} error
+    }
+}
+puts [time t 1]

apnadkarni added on 2016-07-21 07:33:36:
For testing, run the following proc in a loop at least a hundred times.

proc go {} {
    set spdyHeaders "HTTP/1.0 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nX-Robots-Tag: noarchive\r\nLast-Modified: Tue, 05 Jun 2012 02:43:25 GMT\r\nETag: \"1338864205129|#public|0|en|||0\"\r\nExpires: Tue, 05 Jun 2012 16:17:11 GMT\r\nDate: Tue, 05 Jun 2012 16:17:06 GMT\r\nCache-Control: public, max-age=5\r\nX-Content-Type-Options: nosniff\r\nX-XSS-Protection: 1; mode=block\r\nServer: GSE\r\n"
    set spdyDict "optionsgetheadpostputdeletetraceacceptaccept-charsetaccept-encodingaccept-languageauthorizationexpectfromhostif-modified-sinceif-matchif-none-matchif-rangeif-unmodifiedsincemax-forwardsproxy-authorizationrangerefererteuser-agent100101200201202203204205206300301302303304305306307400401402403404405406407408409410411412413414415416417500501502503504505accept-rangesageetaglocationproxy-authenticatepublicretry-afterservervarywarningwww-authenticateallowcontent-basecontent-encodingcache-controlconnectiondatetrailertransfer-encodingupgradeviawarningcontent-languagecontent-lengthcontent-locationcontent-md5content-rangecontent-typeetagexpireslast-modifiedset-cookieMondayTuesdayWednesdayThursdayFridaySaturdaySundayJanFebMarAprMayJunJulAugSepOctNovDecchunkedtext/htmlimage/pngimage/jpgimage/gifapplication/xmlapplication/xhtmltext/plainpublicmax-agecharset=iso-8859-1utf-8gzipdeflateHTTP/1.1statusversionurl"
    lassign [chan pipe] inSide outSide
    set strm [zlib stream inflate]
    zlib push deflate $outSide -dictionary $spdyDict
    fconfigure $outSide -blocking 0 -translation binary -buffering none
    fconfigure $inSide -blocking 0 -translation binary
    puts -nonewline $outSide $spdyHeaders
    chan pop $outSide
    $strm put -dictionary $spdyDict [read $inSide]
    list [string length $spdyHeaders] [string length [$strm get]]
}

time {puts [go]} 100

apnadkarni added on 2016-07-21 07:30:51:
First some general comments. On my Win8 system (64-bit), the test runs for zlib-8.11 were FPPPPFPFFF (Fail/Pass). With a non-optimized debug build, using the code below extracted from the test, failures occur only about once in 20-25 runs. So the potential is there for this bug to exist on multiple platforms without actually being observed.

Anyways, the information I have so far shows

inflate() (line 1366 of tclZlib.c) returns 1 in the success case and 0xfffffffb (Z_BUF_ERROR) on failure. Line 1377 checks for this case but because listLen is 0, there is no more data to be passed in.

Not sure where to look next, whether the fault is on the input deflate side or on the output inflate.

dkf added on 2016-07-09 16:58:50:

I don't have anything like the right environment to even begin to try reproducing this (as trunk tip works for me with OSX). Maybe Jan does? He's at least on the right basic platform (though not using the same compiler or locale) and it would be good to get some of the configuration out of the required setup.

It looks like the problem is that the data is still in a zlib-internal buffer. Which might be due to a flushing problem, or a failure to handle a request to accept data into the Tcl-side buffer from zlib. It certainly has symptoms that remind me of a bug I've seen before...