Tcl Source Code

Ticket UUID: 879a0747bee593e2cd0f531fa723f5f5d5595527
Title: Incomplete multibyte char @eof -> failed read
Type: Bug
Version: 8.5.18
Submitter: erikleunissen
Created on: 2015-04-19 15:11:20
Subsystem: 24. Channel Commands
Assigned To: dgp
Priority: 8
Severity: Severe
Status: Pending
Last Modified: 2015-09-06 14:31:38
Resolution: Fixed
Closed By: nobody
Closed on: 2015-04-24 20:17:29
Description:
Tcl segfaults when reading the attached file "workfile".
The supplied script "exercise.tcl" makes that happen for you.

The segfault occurs with Tcl releases 8.5.18 and 8.6.4, and not with 8.4.20.
Platform info and full back trace have been appended.

Aside: the file "workfile" is no clever concoction. It's a real-world file (with another name) that exists in a user-specific configuration directory for my KDE desktop. Tcl stumbled across it when traversing the filesystem in search of specific patterns. The file contains some invalid UTF-8 characters. If you know up front that this file is coming, then you probably want to open it in binary mode and all is well. Of course, Tcl ought not to segfault when reading it in text mode.

Sincerely,

Erik Leunissen
--
Platform and encoding settings:
% parray tcl_platform
tcl_platform(byteOrder)   = littleEndian
tcl_platform(machine)     = x86_64
tcl_platform(os)          = Linux
tcl_platform(osVersion)   = 3.11.6-4-desktop
tcl_platform(platform)    = unix
tcl_platform(pointerSize) = 8
tcl_platform(user)        = erik
tcl_platform(wordSize)    = 8
% encoding system
utf-8
--

Full back trace with Tcl 8.5.18: see attached bt.txt
User Comments: dgp added on 2015-04-25 01:54:37:
Well, these are the dilemmas that arise when we don't
allow END of file to actually mean it, now aren't they?

The other "purpose of fleeting eof" is the old tcl-udp
strategy that every packet on a channel is effectively 
its own file, and terminates with a fleeting eof.

It's an ungodly mess, and the only semi-rational answer
I can propose at the moment is to restore bugward
compatibility with 8.5.15.

So sometime soon let's draft the set of tests that demo
the useful cases, run them against 8.5.15 to get the
target results and hit them.

Until then, what's on the trunk isn't any more "wrong"
than any other interpretation.

ferrieux added on 2015-04-24 22:22:37:
Reopening because I think there's still something wrong:

 -assume we're doing a kind of "tail -f" (purpose of fleeting EOF) on a file that we opened with encoding utf-8

 -assume another process slowly writes to it the subsequent two bytes \xC3 and \xA9 

 => our reader gets two subsequent characters "Ã" and "©"

 -replay the same scenario but with the two bytes already in, or written fast enough that the reader gets both in one shot

 => our reader gets the single char "é"

So it seems that when the fleeting EOF happens in mid-multibyte, the truncated sequence is somehow mapped to a character. Instead, it should be stored in internal state, and no character should be returned until the sequence completes. Easier said than done, I'll admit.
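
To make the "store it in internal state" idea concrete, here is a minimal script-level sketch. It is not the fix that went into the Tcl core; the procs utf8IncompleteTail and feed are hypothetical names invented for this illustration. It assumes the bytes arrive from a channel read in binary mode, and it only holds back a truncated trailing sequence; it makes no attempt to validate the rest of the data.

```tcl
# Hypothetical helper: return how many bytes at the end of $bytes form
# the start of a UTF-8 sequence that still lacks continuation bytes
# (0 if the data ends on a sequence boundary).
proc utf8IncompleteTail {bytes} {
    set n [string length $bytes]
    # A UTF-8 sequence is at most 4 bytes long, so an incomplete one
    # can account for at most the last 3 bytes.
    for {set back 1} {$back <= 3 && $back <= $n} {incr back} {
        binary scan [string index $bytes end-[expr {$back - 1}]] c b
        set b [expr {$b & 0xFF}]
        if {($b & 0xC0) == 0x80} {
            # Continuation byte: keep looking backwards for the lead byte.
            continue
        }
        if {$b >= 0xF0} {
            set need 4
        } elseif {$b >= 0xE0} {
            set need 3
        } elseif {$b >= 0xC0} {
            set need 2
        } else {
            set need 1
        }
        return [expr {$need > $back ? $back : 0}]
    }
    return 0
}

# Hypothetical helper: append $chunk to the pending bytes, decode the
# part that is complete, and keep any truncated tail for the next call.
proc feed {pendingVar chunk} {
    upvar 1 $pendingVar pending
    append pending $chunk
    set tail [utf8IncompleteTail $pending]
    set cut [expr {[string length $pending] - $tail}]
    set complete [string range $pending 0 [expr {$cut - 1}]]
    set pending [string range $pending $cut end]
    return [encoding convertfrom utf-8 $complete]
}

# The slow-writer scenario: the two bytes arrive in separate reads.
set pending ""
set first  [feed pending "\xC3"]   ;# nothing can be decoded yet
set second [feed pending "\xA9"]   ;# the sequence completes here
puts [list [string length $first] [string length $second] [scan $second %c]]
# prints: 0 1 233
```

With this approach the reader sees a single "é" (U+00E9, decimal 233) whether the two bytes arrive together or one at a time.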

dgp added on 2015-04-24 20:17:29:
More complete fix with tests committed to 8.5 and trunk branches.

dgp added on 2015-04-21 18:24:46:
The likely fix is now on branch bug-879a0747be.

Holding back from merges until I get some proper
tests in place.

dgp added on 2015-04-21 17:34:34:
This is a descendant of Bug 1462248.

dgp added on 2015-04-21 02:14:08:
Thanks for the good demo.

The dgp-read-bytes branch introduced new handling
of this case about a year back.

At a first glance, it appears to me Tcl was getting
this case wrong before that, and gets it wrong in
a different way now.

But more careful examination is in order to firm up
those initial impressions.  Thanks for the report.

ferrieux added on 2015-04-20 18:12:07:
OK, the specificity of the file is very simple: it ends in mid-utf8-char ;)
So you can reproduce with a much smaller file:

 set ff [open foo w]
 fconfigure $ff -translation  binary
 puts -nonewline $ff "hello\xD7"
 close $ff
 set ff [open foo r] ;# here we inherit [encoding system]==utf-8
 set x [read $ff]

So I guess that the multibyte machinery somehow gets in the way of the various EOF flags. That should (and will) be cured, though the circumstances of the discovery are a bit contorted (you're basically reading a binary file with a utf-8 channel encoding).
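
For comparison, a minimal sketch of the binary-mode workaround mentioned in the description, using the same six-byte file "foo". Reading in binary mode bypasses UTF-8 decoding entirely, so the truncated character cannot confuse the channel. The final check is only a rough heuristic added for illustration: it looks at the last byte alone, so it does not catch a three- or four-byte sequence that is cut off after its first continuation byte.

```tcl
# Write the same six bytes as in the reproduction above.
set ff [open foo w]
fconfigure $ff -translation binary
puts -nonewline $ff "hello\xD7"
close $ff

# Read them back in binary mode, so no UTF-8 decoding takes place.
set ff [open foo r]
fconfigure $ff -translation binary
set bytes [read $ff]
close $ff

# Inspect the final byte. A UTF-8 lead byte (0xC2..0xF4) in last
# position means the data ends in the middle of a multibyte character.
binary scan [string index $bytes end] H2 last
set lastVal [scan $last %x]
set truncated [expr {$lastVal >= 0xC2 && $lastVal <= 0xF4}]
puts "length=[string length $bytes] last=0x$last truncated=$truncated"
# prints: length=6 last=0xd7 truncated=1

file delete foo
```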

ferrieux added on 2015-04-20 16:57:59:
Oh, sorry, indeed my "debug" builds were only via selection of the appropriate CFLAGS=$(CFLAGS_DEBUG). I never noticed that this does *not* properly set NDEBUG !

Also, I was under the impression that assertions in Tcl were rather sanity checks that we'd rather not disable, implemented as vanilla "if...Tcl_Panic". True NDEBUG-controlled assertions had escaped my attention :P

So now, yes, I do reproduce. Nothing to do with OS or environment.
Now all that's left to decipher is why this specific file has trouble setting the "sticky eof" state.

erikleunissen added on 2015-04-20 14:11:52:
Uhmm, that was confusing. I meant:

- The failing assertion is triggered only when Tcl was built without -DNDEBUG, which is the default for debug builds. Optimized builds, however, do define NDEBUG by default, which disables assertions.

erikleunissen added on 2015-04-20 14:04:40:
Several remarks:

- Output of env attached.

- The failing assertion is triggered only when built without -DNDEBUG, which is the default for debug builds, but which lacks by default for optimized builds.

- To start delimiting the extent of this issue, I ran ./exercise on a completely different machine:
% parray tcl_platform
tcl_platform(byteOrder)   = littleEndian
tcl_platform(machine)     = armv7l
tcl_platform(os)          = Linux
tcl_platform(osVersion)   = 3.14.33-ti-r50
tcl_platform(platform)    = unix
tcl_platform(pointerSize) = 4
tcl_platform(user)        = root
tcl_platform(wordSize)    = 4
% set tcl_patchLevel
8.5.18

The segfault occurred likewise.
--

ferrieux added on 2015-04-20 12:52:13:
Nothing obviously suspect in the strace :/
I get the same 2 full reads + 1 short + 1 empty as you do:

 read(4, "JL\32\0\0\0+13717 at http://dot.kde.o"..., 4096) = 4096
 read(4, "rg/Lokalize\" rel=\"nofollow\">Loka"..., 4096) = 4096
 read(4, "class=\"taxonomy-image-links\"><im"..., 4096) = 2865
 read(4, "", 1231)                       = 0

Two things:

 - please post environment (output from "env"), in case something hides in one of the locale-oriented variables. I have an [encoding system] also equal to "utf-8", but the devil may hide in the details.

 - the failing assertion is:

	assert(!GotFlag(statePtr, CHANNEL_EOF)
		|| GotFlag(statePtr, CHANNEL_STICKY_EOF)
		|| Tcl_InputBuffered((Tcl_Channel)chanPtr) == 0);

Added in this commit:

 2014-11-06 16:34 [16bdf667aa] Stop Tcl forcing EOF condition on channels to be   
 permanent. It may be fleeting, and all parts of Tcl channel ecosystem have to 
 deal with that. New assertions and tests to keep us on track.(user: dgp, tags: 
 trunk)

Since it is not immediately clear to me, I'll assign to Don for comment on all three parts (a comment in the code itself would be welcome too).

(I assume that since we've read the whole file, the one which should evaluate to true is CHANNEL_STICKY_EOF, which is a recent addition best mastered by Don)

I am still *very* interested to see how this can heisenbug depending on libc version, or environment, or whatever...

erikleunissen added on 2015-04-20 09:56:05:
B.t.w.

With a copy of workfile at another location (on the same file system), but also with a copy on another file system[*], the issue remains the same.

[*] /dev/sda15 /mnt/local/heavy-duty ext4 rw,relatime,data=ordered 0 0

erikleunissen added on 2015-04-20 09:46:54:
Hmm, weird.

Filesystem: /dev/sda7 /home ext3 rw,relatime,data=ordered 0 0

Strace output attached.

Erik
--

ferrieux added on 2015-04-19 21:07:43:
Cannot repro, be it on 8.5.18, 8.6.4, or trunk; both debug and nondebug builds.
System is an AMD64 Debian wheezy:

 % info patch
 8.5.18
 % parray tcl_platform
 tcl_platform(byteOrder)   = littleEndian
 tcl_platform(machine)     = x86_64
 tcl_platform(os)          = Linux
 tcl_platform(osVersion)   = 3.2.0-4-amd64
 tcl_platform(platform)    = unix
 tcl_platform(pointerSize) = 8
 tcl_platform(user)        = alex
 tcl_platform(wordSize)    = 8

Maybe something with the filesystem your workfile is stored on? Mine is:

 /dev/disk/by-uuid/da66d2f0-ebbb-4c83-b374-dfe13a90fe5e on / type ext4 (rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered)

Maybe a low-level I/O error at a specific spot?
Please give details on fs and attach an strace of the execution.

Attachments: