Ticket UUID: | 879a0747bee593e2cd0f531fa723f5f5d5595527 | |||
Title: | Incomplete multibyte char @eof -> failed read | |||
Type: | Bug | Version: | 8.5.18 | |
Submitter: | erikleunissen | Created on: | 2015-04-19 15:11:20 | |
Subsystem: | 24. Channel Commands | Assigned To: | dgp | |
Priority: | 8 | Severity: | Severe | |
Status: | Pending | Last Modified: | 2015-09-06 14:31:38 | |
Resolution: | Fixed | Closed By: | nobody | |
Closed on: | 2015-04-24 20:17:29 | |||
Description: |
Tcl segfaults when reading the attached file "workfile". The supplied script "exercise.tcl" makes that happen for you. The segfault occurs with Tcl releases 8.5.18 and 8.6.4, and not with 8.4.20. Platform info and full back trace have been appended.

Aside: the file "workfile" is no clever concoction. It's a real-world file (with another name) that exists in a user-specific configuration directory for my KDE desktop. Tcl stumbled across it when traversing the filesystem in search of specific patterns. The file contains some invalid UTF-8 characters. If you know up front that this file is coming, then you probably want to open it in binary mode and all is well. Of course, Tcl ought not to segfault when reading it in text mode.

Sincerely,
Erik Leunissen
--
Platform and encoding settings:
    % parray tcl_platform
    tcl_platform(byteOrder) = littleEndian
    tcl_platform(machine) = x86_64
    tcl_platform(os) = Linux
    tcl_platform(osVersion) = 3.11.6-4-desktop
    tcl_platform(platform) = unix
    tcl_platform(pointerSize) = 8
    tcl_platform(user) = erik
    tcl_platform(wordSize) = 8
    % encoding system
    utf-8
--
Full back trace with Tcl 8.5.18: see attached bt.txt
User Comments: |
dgp added on 2015-04-25 01:54:37:
Well, these are the dilemmas that arise when we don't allow END of file to actually mean it, aren't they?

The other "purpose of fleeting eof" is the old tcl-udp strategy that every packet on a channel is effectively its own file, and terminates with a fleeting eof.

It's an ungodly mess, and the only semi-rational answer I can propose at the moment is to restore bugward compatibility with 8.5.15. So sometime soon let's draft the set of tests that demo the useful cases, run them against 8.5.15 to get the target results and hit them.

Until then, what's on the trunk isn't any more "wrong" than any other interpretation.

ferrieux added on 2015-04-24 22:22:37:
Reopening because I think there's still something wrong:
- assume we're doing a kind of "tail -f" (purpose of fleeting EOF) on a file that we opened with encoding utf-8
- assume another process slowly writes to it the subsequent two bytes \xC3 and \xA9
  => our reader gets two subsequent characters "Ã" and "©"
- replay the same scenario but with the two bytes already in, or written fast enough that the reader gets both in one shot
  => our reader gets the single char "é"

So it seems that when the fleeting EOF happens in mid-multibyte, the truncated sequence is somehow mapped to a character. Instead, it should be stored in internal state, and no character should be returned until the sequence completes. Easier said than done, I'll admit.

dgp added on 2015-04-24 20:17:29:
More complete fix with tests committed to 8.5 and trunk branches.

dgp added on 2015-04-21 18:24:46:
The likely fix is now on branch bug-879a0747be. Holding back from merges until I get some proper tests in place.

dgp added on 2015-04-21 17:34:34:
This is a descendant of Bug 1462248.

dgp added on 2015-04-21 02:14:08:
Thanks for the good demo. The dgp-read-bytes branch introduced new handling of this case about a year back. At a first glance, it appears to me Tcl was getting this case wrong before that, and gets it wrong in a different way now.
But more careful examination is in order to firm up those initial impressions. Thanks for the report.

ferrieux added on 2015-04-20 18:12:07:
OK, the specificity of the file is very simple: it ends in mid-utf8-char ;) So you can reproduce with a much smaller file:

    set ff [open foo w]
    fconfigure $ff -translation binary
    puts -nonewline $ff "hello\xD7"
    close $ff
    set ff [open foo r] ;# here we inherit [encoding system]==utf-8
    set x [read $ff]

So I guess that the multibyte machinery somehow gets in the way of the varied EOF flags. That should (and will) be cured, though the circumstances of the discovery are a bit contorted (you're basically reading a binary file with a utf-8 channel encoding).

ferrieux added on 2015-04-20 16:57:59:
Oh, sorry, indeed my "debug" builds were only via selection of the appropriate CFLAGS=$(CFLAGS_DEBUG). I never noticed that this does *not* properly set NDEBUG! Also, I was under the impression that assertions in Tcl were rather sanity checks that we'd rather not disable, implemented as vanilla "if...Tcl_Panic". True NDEBUG-controlled assertions had escaped my attention :P

So now, yes, I do reproduce. Nothing to do with OS or environment. Now all that's left to decipher is why this specific file has trouble setting the "sticky eof" state.

erikleunissen added on 2015-04-20 14:11:52:
Uhmm, that was confusing. I meant:
- The failing assertion is triggered only when Tcl was built without -DNDEBUG, which is the default for debug builds. Optimized builds, however, do define NDEBUG by default, which disables assertions.

erikleunissen added on 2015-04-20 14:04:40:
Several remarks:
- Output of env attached.
- The failing assertion is triggered only when built without -DNDEBUG, which is the default for debug builds, but which lacks by default for optimized builds.
- To start delimiting the extent of this issue, I ran ./exercise on a completely different machine:

    % parray tcl_platform
    tcl_platform(byteOrder) = littleEndian
    tcl_platform(machine) = armv7l
    tcl_platform(os) = Linux
    tcl_platform(osVersion) = 3.14.33-ti-r50
    tcl_platform(platform) = unix
    tcl_platform(pointerSize) = 4
    tcl_platform(user) = root
    tcl_platform(wordSize) = 4
    % set tcl_patchLevel
    8.5.18

  The segfault occurred likewise.
--

ferrieux added on 2015-04-20 12:52:13:
Nothing obviously suspect in the strace :/ I get the same 2 full reads + 1 short + 1 empty as you do:

    read(4, "JL\32\0\0\0+13717 at http://dot.kde.o"..., 4096) = 4096
    read(4, "rg/Lokalize\" rel=\"nofollow\">Loka"..., 4096) = 4096
    read(4, "class=\"taxonomy-image-links\"><im"..., 4096) = 2865
    read(4, "", 1231) = 0

Two things:
- please post environment (output from "env"), in case something hides in one of the locale-oriented variables. I have an [encoding system] also equal to "utf-8", but the devil may hide in the details.
- the failing assertion is:

    assert(!GotFlag(statePtr, CHANNEL_EOF)
            || GotFlag(statePtr, CHANNEL_STICKY_EOF)
            || Tcl_InputBuffered((Tcl_Channel)chanPtr) == 0);

  Added in this commit:

    2014-11-06 16:34 [16bdf667aa] Stop Tcl forcing EOF condition on channels to be permanent. It may be fleeting, and all parts of Tcl channel ecosystem have to deal with that. New assertions and tests to keep us on track. (user: dgp, tags: trunk)

Since it is not immediately clear to me, I'll assign to Don for comment on all three parts (a comment in the code itself would be welcome too). (I assume that since we've read the whole file, the one which should evaluate to true is CHANNEL_STICKY_EOF, which is a recent addition best mastered by Don.)

I am still *very* interested to see how this can heisenbug depending on libc version, or environment, or whatever...

erikleunissen added on 2015-04-20 09:56:05:
B.t.w.
With a copy of workfile at another location (on the same file system), but also with a copy on another file system[*], the issue remains the same.

[*] /dev/sda15 /mnt/local/heavy-duty ext4 rw,relatime,data=ordered 0 0

erikleunissen added on 2015-04-20 09:46:54:
Hmm, weird.

Filesystem:
    /dev/sda7 /home ext3 rw,relatime,data=ordered 0 0

Strace output attached.

Erik
--

ferrieux added on 2015-04-19 21:07:43:
Cannot repro, be it on 8.5.18, 8.6.4, or trunk; both debug and nondebug builds. System is an AMD64 Debian wheezy:

    % info patch
    8.5.18
    % parray tcl_platform
    tcl_platform(byteOrder) = littleEndian
    tcl_platform(machine) = x86_64
    tcl_platform(os) = Linux
    tcl_platform(osVersion) = 3.2.0-4-amd64
    tcl_platform(platform) = unix
    tcl_platform(pointerSize) = 8
    tcl_platform(user) = alex
    tcl_platform(wordSize) = 8

Maybe something with the filesystem your workfile is stored on? Mine is:

    /dev/disk/by-uuid/da66d2f0-ebbb-4c83-b374-dfe13a90fe5e on / type ext4 (rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered)

Maybe a low-level I/O error at a specific spot? Please give details on fs and attach an strace of the execution.
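
Editorial illustration of the behaviour ferrieux asks for in the 2015-04-24 comment (keep a truncated multibyte sequence in internal state and return no character until it completes). This is a minimal sketch in Python, not Tcl, and it says nothing about how Tcl's channel code is or should be implemented. The helper name `decode_chunks` and its `final` parameter are hypothetical; the `final` flag stands in for the difference between a fleeting EOF (more bytes may still arrive) and a real end of file.

```python
import codecs


def decode_chunks(chunks, final=True):
    """Decode byte chunks as UTF-8, one chunk at a time.

    An incomplete multibyte sequence at the end of a chunk is held back
    inside the decoder instead of being turned into a character.
    Returns the list of strings produced by each decode step.
    """
    # errors="replace": a sequence that is still incomplete at a real
    # end of file becomes U+FFFD instead of raising an exception.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True means "this really is the end of the data";
    # final=False models a fleeting EOF where more bytes may follow.
    pieces.append(decoder.decode(b"", final=final))
    return pieces


# The two bytes of "é" arrive separately: nothing is emitted for the
# first byte, the whole character is emitted once the second arrives.
slow = decode_chunks([b"\xc3", b"\xa9"])

# Both bytes arrive in one shot: same character, same result overall.
fast = decode_chunks([b"\xc3\xa9"])

# The reduced test case from this ticket: the data ends in mid-character.
truncated = decode_chunks([b"hello\xd7"])
```

With this approach the slow and the fast writer yield the same text, which is the property the "tail -f" scenario calls for. What to do with a sequence that is still incomplete at a true end of file remains a policy choice; the sketch substitutes U+FFFD.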
Attachments:
- bt.txt [download] added by ferrieux on 2015-04-20 20:08:03. [details]
- env.txt [download] added by erikleunissen on 2015-04-20 13:56:57. [details]
- strace.out [download] added by erikleunissen on 2015-04-20 09:44:32. [details]
- exercise.tcl [download] added by erikleunissen on 2015-04-19 15:12:56. [details]
- workfile [download] added by erikleunissen on 2015-04-19 15:12:26. [details]