Tcl Source Code

Ticket UUID: 1462248
Title: core dump on Linux while [read]ing file
Type: Bug
Version: obsolete: 8.4.12
Submitter: matzek
Created on: 2006-03-31 16:50:25
Subsystem: 25. Channel System
Assigned To: andreas_kupries
Priority: 9 Immediate
Severity:
Status: Closed
Last Modified: 2006-04-05 07:20:40
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2006-04-05 00:20:40
Description:
Hi *,

I occasionally get a core dump from a Sig11 on Linux
(x86_64) while reading plain files. It seems to depend
on the content of the file.

The Tcl interpreter is built from unmodified sources
and does not contain any extensions. It was built using
gcc 4.0.2 with -m32 to get i386 compatible executables.

% parray tcl_platform
tcl_platform(byteOrder) = littleEndian
tcl_platform(machine)   = x86_64
tcl_platform(os)        = Linux
tcl_platform(osVersion) = 2.6.13-15.7-smp
tcl_platform(platform)  = unix
tcl_platform(user)      = makr
tcl_platform(wordSize)  = 4
% set tcl_patchLevel
8.4.12

As I just noticed, it also only happens if env(LANG) is
set to a UTF-8 encoding, e.g. de_DE.UTF-8 or en_US.UTF-8.

Having a file like the attached "crashme" file, I'll do
the following then ...

% set f [open crashme]
file3
% while {![eof $f]} {read $f 4096}
Segmentation fault (core dumped)

Backtrace:

#0  0x5561858c in memcpy () from /lib/tls/libc.so.6
(gdb) bt
#0  0x5561858c in memcpy () from /lib/tls/libc.so.6
#1  0x080a973b in ReadChars (statePtr=0x810c338,
objPtr=0x810cff0, charsToRead=74, offsetPtr=0xffffac18,
factorPtr=0xffffac14)
    at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4818
#2  0x080a91f3 in DoReadChars (chanPtr=0x8114510,
objPtr=0x810cff0, toRead=74, appendFlag=0) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4488
#3  0x080a90dd in Tcl_ReadChars (chan=0x8114510,
objPtr=0x810cff0, toRead=4096, appendFlag=0) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4410
#4  0x080aeaf6 in Tcl_ReadObjCmd (dummy=0x0,
interp=0x80fc4f0, objc=3, objv=0x80ff78c) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIOCmd.c:365
#5  0x080629e9 in TclEvalObjvInternal
(interp=0x80fc4f0, objc=3, objv=0x80ff78c, command=0x0,
length=0, flags=0) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclBasic.c:3085
#6  0x08090f25 in TclExecuteByteCode (interp=0x80fc4f0,
codePtr=0x810c3b8) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclExecute.c:1419
#7  0x08090106 in TclCompEvalObj (interp=0x80fc4f0,
objPtr=0x8105e58) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclExecute.c:981
#8  0x08063c7d in Tcl_EvalObjEx (interp=0x80fc4f0,
objPtr=0x8105e58, flags=131072) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclBasic.c:4049
#9  0x080a1f85 in Tcl_RecordAndEvalObj
(interp=0x80fc4f0, cmdPtr=0x8105e58, flags=131072) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclHistory.c:142
#10 0x0804ab09 in Tcl_Main (argc=-1, argv=0xffffb5c8,
appInitProc=0x804a461 <Tcl_AppInit>) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclMain.c:392
#11 0x0804a45a in main (argc=1, argv=0xffffb5c4) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/tclAppInit.c:90
(gdb) up
#1  0x080a973b in ReadChars (statePtr=0x810c338,
objPtr=0x810cff0, charsToRead=74, offsetPtr=0xffffac18,
factorPtr=0xffffac14)
    at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4818
4818            memcpy((VOID *) (nextPtr->buf +
nextPtr->nextRemoved), (VOID *) src,

I noticed that memcpy() got fed with srcLen=-22:

(gdb) print srcLen
$1 = -22
(gdb) print src
$2 = 0x812528e ""
(gdb) print nextPtr->buf
$3 = "\201�212\027"
(gdb) print nextPtr->nextRemoved
$4 = 38
(gdb) print nextPtr->buf + nextPtr->nextRemoved
$5 = 0x81252c6 ""
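
The negative srcLen matters because memcpy() takes its length as a size_t, so -22 is converted to a very large unsigned count (4294967274 on a 32-bit build like this one). The following is a minimal sketch of that conversion, plus a guarded copy. The helper names length_seen_by_memcpy and checked_copy are hypothetical and only illustrate the failure mode; they are not part of tclIO.c and not the fix that was committed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What memcpy() effectively receives when an int length is negative:
 * the value is converted to size_t, i.e. reduced modulo SIZE_MAX + 1. */
static size_t length_seen_by_memcpy(int srcLen)
{
    return (size_t) srcLen;
}

/* Hypothetical guarded copy (illustration only): reject negative
 * lengths and lengths that do not fit into the destination.
 * Returns the number of bytes copied, or -1 if the request is rejected. */
static int checked_copy(char *dst, size_t dstSize, const char *src, int srcLen)
{
    if (srcLen < 0 || (size_t) srcLen > dstSize) {
        return -1;
    }
    memcpy(dst, src, (size_t) srcLen);
    return srcLen;
}
```

Calling checked_copy(dst, sizeof dst, src, -22) returns -1 instead of asking memcpy() to copy almost the whole address space.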

I also just noticed that ActiveTcl's tclsh8.4 (8.4.11)
core dumps as well...

% parray ::activestate::ActiveTcl
::activestate::ActiveTcl(arch)          = linux-ix86
::activestate::ActiveTcl(as,mode)       = normal
::activestate::ActiveTcl(build)         = 162119
::activestate::ActiveTcl(buildtime,fmt) = Tue Jul 19
10:40:31 AM PDT 2005
::activestate::ActiveTcl(buildtime,sec) = 1121794831
::activestate::ActiveTcl(maturity)      = final
::activestate::ActiveTcl(product)       = ActiveTcl
::activestate::ActiveTcl(release)       = 8.4.11.0
%  set f [open crashme]
file3
% while {![eof $f]} {read $f 4096}
Segmentation fault (core dumped)

Although this core does not give much away, it
apparently crashes at the same position:

#0  0x556c158c in memcpy () from /lib/tls/libc.so.6
(gdb) bt
#0  0x556c158c in memcpy () from /lib/tls/libc.so.6
#1  0x555c535e in ReadChars () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#2  0x555c4fca in DoReadChars () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#3  0x555c4eb0 in Tcl_ReadChars () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#4  0x555c9582 in Tcl_ReadObjCmd () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#5  0x5558eb27 in TclEvalObjvInternal () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#6  0x555b2faa in TclExecuteByteCode () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#7  0x555b2475 in TclCompEvalObj () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#8  0x5558fbe7 in Tcl_EvalObjEx () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#9  0x555bf6f4 in Tcl_RecordAndEvalObj () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#10 0x555d3565 in Tcl_Main () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#11 0x080488aa in main ()

kind regards -- Matthias Kraft

PS: SF won't let me upload the file as it is 512 kB,
please download from here:
http://www.matkraft.de/files/crashme
User Comments:

andreas_kupries added on 2006-04-05 07:20:40:

Fixes committed to both HEAD and 8.4 branch head.

andreas_kupries added on 2006-04-04 06:31:54:

Ok. I know where the problem is, and I have a fix sitting in
my sandbox. I have no original, so no patches, and
committing the fix has to wait for SF to get their act together
and get CVS back up and running.

Story time:

(a) ReadChars is called with a buffer containing one byte.
    This buffer has no successor (end-of-input-queue, EOIQ).
(b) The Tcl_ExternalToUtf call in ReadChars signals a split
    multi-byte character. Because of EOIQ, ReadChars signals
    'nothing read' and 'channel_need_more_data'.
(c) The IO layer does its thing, reading more data.
    This detects EOF and sets TCL_ENCODING_END.
(d) ReadChars is called again.
    It now has two buffers: one with the split multi-byte char
    at the end, and a second with the remainder of that char.
    It tries to convert the first buffer again.
    This succeeds!! because of the TCL_ENCODING_END.

    This is wrong. It should not have succeeded, but failed
    as before. That failure would then cause ReadChars to copy
    the partial multi-byte char to the beginning of the now
    existing next buffer, and then try again with the modified
    input queue.

In essence the TCL_ENCODING_END is handed to the
Tcl_ExternalToUtf too early. Yes, there is an EOF pending,
but right now we have two buffers in the queue, so the first
one cannot be the end.
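
The rule stated in the paragraph above (the end flag only applies to the last buffer in the queue) can be sketched as follows. This is an illustration under assumptions, not the code that was committed: SketchBuffer, SKETCH_ENCODING_END and flags_for_buffer are hypothetical stand-ins for Tcl's channel buffer, TCL_ENCODING_END and the logic inside ReadChars.

```c
#include <stddef.h>

/* Hypothetical stand-in for TCL_ENCODING_END; the value is illustrative. */
#define SKETCH_ENCODING_END 0x02

/* Hypothetical, simplified input-queue node (not Tcl's real structure). */
struct SketchBuffer {
    struct SketchBuffer *next;   /* NULL when this is the last queued buffer */
};

/* Return the flags to use when converting `buf`. The end flag survives
 * only if `buf` is the last buffer in the queue; otherwise it is masked
 * out so a split multi-byte char at the end is still reported as split. */
static int flags_for_buffer(const struct SketchBuffer *buf, int channelFlags)
{
    if ((channelFlags & SKETCH_ENCODING_END) && buf->next != NULL) {
        return channelFlags & ~SKETCH_ENCODING_END;
    }
    return channelFlags;
}
```

With two buffers queued and EOF pending, the first buffer is converted without the end flag and the second one with it.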

There is a contributing bug hidden in the above description:
Tcl_ExternalToUtf, i.e. Tcl_Utf2UtfProc, accesses memory
beyond the end of its input buffer if the last byte is
the start of a multi-byte character and TCL_ENCODING_END
(TEE) is set.

And in this case the byte found there was a valid
completion of the multi-byte header, so it consumed 2 bytes
and reported that, sending the upper layers spiraling out
of control. Here the memory layout comes into play, causing
the high sensitivity to any type of change, be it
different buffer sizes or even a switch to non-interactive
operation. In most cases the byte read is not a valid
completion of the multi-byte char, so only one byte is
consumed (as part of creating canonical utf-8 from the
non-canonical input) and that keeps the upper layers stable
and happy. The data read is bad, a multi-byte char
was broken, but there is no crash.
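
A read past the end of the buffer like the one described above can be avoided by deciding whether the tail of the buffer is an incomplete sequence using only bytes inside the buffer. The sketch below is one way to do that, assuming standard UTF-8 lead-byte ranges; utf8_expected_len and incomplete_tail are hypothetical helpers, not functions from the Tcl sources and not the committed fix.

```c
#include <stddef.h>

/* Number of bytes a UTF-8 sequence should have, judged by its first byte.
 * Returns 1 for ASCII and for bytes that cannot start a sequence. */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 1;
}

/* Count how many bytes at the tail of src[0..len) belong to an
 * incomplete multi-byte sequence. Only indices below len are read. */
static size_t incomplete_tail(const unsigned char *src, size_t len)
{
    size_t back;
    for (back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = src[len - back];
        if ((c & 0xC0) != 0x80) {            /* not a continuation byte */
            return utf8_expected_len(c) > back ? back : 0;
        }
    }
    return 0;
}
```

A non-zero result means the last bytes should be carried over to the next buffer rather than converted now.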

This bug has been fixed as well. A quick test with valgrind
confirmed this bad memory access btw. (using a slightly
modified make valgrind target).

The whole problem is likely present in 8.5 as well. Fixes
will be done tomorrow. This took the whole day to trace
in full.

andreas_kupries added on 2006-04-04 00:53:31:

Switching to debug build makes the crash vanish. No
surprise. Usual heisenbug behaviour for a memory smash,
debug changes the memory layout enough to let the smash run
into nothing, disabling it. This requires direct fprintf in
the code ... Hm, maybe not. Just --enable-symbols, but not
the mem_debug stuff, then maybe just tracing it in the debugger.

andreas_kupries added on 2006-04-04 00:43:31:

Ok, there is something odd going on here. I can repro the
crash only when using the script interactively. The moment I
go non-interactive things are OK. And replacing the
while-loop with a large read of the whole file makes it go
away as well. 

This general elusiveness, i.e. vanishing at the slightest
change, points to a memory-smash somewhere. :(

Yep, very likely. Changing the size of blocks read by even
one character, up or down, causes the crash to vanish as
well ... changing the -buffersize in tandem with the number
of chars read does not bring it back, the crash still
vanishes. Changing the buffersize alone makes it vanish as
well. Ah, one case is different: -buffersize 4095, reading
4096-char blocks ... This seems to hang the tclsh instead of
crashing it.

dgp added on 2006-04-04 00:17:03:

No crash for me using the tip of either development branch.
Tested on both Solaris and Linux/Alpha.

Can anyone reproduce the crash?

andreas_kupries added on 2006-04-04 00:15:17:

Thanks for your efforts ... They were partly for naught. I
am unable to repro the bug on my i386 machine using the
crashme3 file with crash.tcl script. I get some nice output
in the form

3/12288 >>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32
2/8192 >>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33
1/4096 >>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34


However, I can reproduce the seg.fault using the original
large crashme file and using the trivial code in your
initial report. Just one change was necessary, "fconfigure
$f -encoding utf-8". This was on an i386 machine, i.e. 32-bit,
tclsh 8.4.13 (not yet released, near CVS head essentially),
compiled with gcc 2.95.3.

The only part which seems to be the same in both your and my
crash is the encoding, utf-8. Based on that my guess is that
it is the utf8 code which gets into trouble for specific
byte-sequences in the input.

This is the next thing to check, using a debug build and
other instrumentation to get a look into the behaviour.

So, for now I can definitely confirm that there is a crash,
even if the exact cause is not yet known.

matzek added on 2006-04-03 20:37:52:

File Added - 173239: crash.tcl

matzek added on 2006-04-03 20:34:59:

File Added - 173238: crashme3


Attaching a reduced example file. Although the steps to
reproduce changed a little, the core dump still looks the same.

With this file the crash occurs on Linux x86_64, i386 and
s390x. With the full file the crash also occurs on AIX 5.2
and HP-UX 11i.

Will attach a script to reproduce the crash with this
example file...

matzek added on 2006-04-03 19:10:10:

If I understand the code correctly, the file crashme
contains random data. It is generated to have some file to
test procedures used for data conversion and unpacking archives.

I currently have no smaller file, I'll try to produce one,
but no luck so far.

Tried the halving search (first half, second half, a moving
window half the size), but it doesn't crash then.

% fconfigure $f
-blocking 1 -buffering full -buffersize 4096 -encoding utf-8
-eofchar {} -translation auto

andreas_kupries added on 2006-04-01 00:29:43:

Ideas regarding the crashme file. Compress it to see if the
result goes below the upload limit of SF. gzip, or bzip2.

Also, have you found smaller files exhibiting the problem?
Maybe try a halving search: does the first half of crashme
crash it? The second half? The half-size section centered on
the middle of the file? If yes, we can try to divide further.
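
The three candidate sections of the halving search above can be written down as byte ranges. The sketch below only computes offsets and lengths. The Window struct and the halving_window helper are hypothetical names introduced for illustration, and the caller is assumed to extract each range into a new file and rerun the read loop on it.

```c
#include <stddef.h>

/* A byte range within the file. */
struct Window {
    size_t offset;
    size_t length;
};

/* which: 0 = first half, 1 = second half,
 * anything else = half-size window centered on the middle of the file. */
static struct Window halving_window(size_t fileSize, int which)
{
    struct Window w;
    size_t half = fileSize / 2;

    switch (which) {
    case 0:
        w.offset = 0;
        w.length = half;
        break;
    case 1:
        w.offset = half;
        w.length = fileSize - half;   /* keeps the odd byte, if any */
        break;
    default:
        w.offset = fileSize / 4;
        w.length = half;
        break;
    }
    return w;
}
```

If one of the windows still crashes, the same function can be applied to that window's length to divide further, adding the window's offset to the result.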

Given the reference to LANG I consider it interesting to
find out what system encoding is chosen by Tcl. Better, what
default encoding for the channel you open. Can you add a
'puts [fconfigure $f]' statement to your script before you
start reading ?
