Tcl Source Code

View Ticket
Login
Ticket UUID: 2888099
Title: [close] discards buffered data when no space left on device
Type: Bug Version: obsolete: 8.6b1.1
Submitter: flatworm Created on: 2009-10-28 17:06:40
Subsystem: 25. Channel System Assigned To: ferrieux
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2009-11-11 06:51:27
Resolution: Fixed Closed By: ferrieux
    Closed on: 2009-11-10 23:51:27
Description:
[close] performed on an opened file silently discards any buffered data written into it when there's no space left on the underlying device.
[flush] works in this case, but the documentation explicitly states that "all buffered output is flushed to the channel's output device, any buffered input is discarded".

Reproducible on Linux (Debian Lenny running 2.6.26 kernel) both with the current CVS HEAD and 8.5.3.

Below are the steps to reproduce the error on Linux and the printouts of sample Tcl sessions:

1. Create a file "/var/tmp/blockdev" 1 megabyte in size:

$ dd if=/dev/zero of=/var/tmp/blockdev bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00824364 s, 127 MB/s

2. Create ext3 filesystem on it:

$ /sbin/mkfs.ext3 /var/tmp/blockdev
mke2fs 1.41.3 (12-Oct-2008)
/var/tmp/blockdev is not a block special device.
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
128 inodes, 1024 blocks
51 blocks (4.98%) reserved for the super user
First data block=1
Maximum filesystem blocks=1048576
1 block group
8192 blocks per group, 8192 fragments per group
128 inodes per group
[...]

3. Set "reserved blocks count" to 0 on the filesystem:

$ /sbin/tune2fs -r 0 /var/tmp/blockdev
tune2fs 1.41.3 (12-Oct-2008)
Setting reserved blocks count to 0

4. Learn the block size and the amount of free blocks on the resulting FS:

$ /sbin/dumpe2fs -h /var/tmp/blockdev |egrep -i 'free blocks|block size'
dumpe2fs 1.41.3 (12-Oct-2008)
Free blocks:              986
Block size:               1024

5. Go root, create a mount point "/tmp/blockdev" and mount our file onto it:

$ su
# mkdir /tmp/blockdev
# mount -o loop /var/tmp/blockdev /tmp/blockdev

6. Create a file which will occupy all the available space on the FS:

# dd if=/dev/zero of=/tmp/blockdev/blob bs=1024 count=986
dd: writing `/tmp/blockdev/blob': No space left on device
982+0 records in
981+0 records out
1004544 bytes (1.0 MB) copied, 0.0069901 s, 144 MB/s

7. Verify no free space is really left:

# /sbin/dumpe2fs -h /var/tmp/blockdev | grep -i 'free blocks'
dumpe2fs 1.41.3 (12-Oct-2008)
Free blocks:              0

8. Run Tcl, attempt to write 5 bytes to a fully buffered file channel, close the stream, see the data discarded:

# ~kostix/tcl86/bin/tclsh8.6
% set fd [open /tmp/blockdev/foo w]
file5
% chan config $fd -buffering full
% puts $fd data
% unset -nocomplain errorCode errorInfo
% close $fd
% set errorCode
can't read "errorCode": no such variable
% set errorInfo
can't read "errorCode": no such variable
    while executing
"set errorCode"

9. Same as before, but do [flush] and see it complain as expected:

# ~kostix/tcl86/bin/tclsh8.6
% set fd [open /tmp/blockdev/foo w]
file5
% chan config $fd -buffering full
% puts $fd data
% unset -nocomplain errorCode errorInfo
% flush $fd
error flushing "file5": no space left on device
% set errorInfo
error flushing "file5": no space left on device
    while executing
"flush $fd"
% set errorCode
POSIX ENOSPC {no space left on device}
%
User Comments: ferrieux added on 2009-11-11 06:51:21:
Thanks. Committed to HEAD.

andreas_kupries added on 2009-11-11 05:22:05:
Looks good. +1 for checking with flatworm and commit.
Simulation: KevinK claims it can be simulated with a virtual filesystem alone. I am skeptic and believe that a refchan is needed as well. However, yes, in that case the error is pushed up through the bypass area, and not through errno, making the code path different.

In the long term it would IMHO make sense to have test channel drivers in the tcl*Test.c files which can force various error conditions in specific places.

When you commit and close this bug please re-assign it to yourself to reflect who did the most work and the closing.

ferrieux added on 2009-11-11 05:11:40:
Thanks. Tcl_GetErrno() version just uploaded.
Yes, it passes the test suite.
It seems very hard to simulate, esp. on Windows (need to install a nonstandard package). What about a refchan/memchan ? (but possibly it won't be a true errno-based error)...

ferrieux added on 2009-11-11 05:09:38:

File Added - 350358: enospc.patch

ferrieux added on 2009-11-11 05:09:05:

File Deleted - 350348:

andreas_kupries added on 2009-11-11 04:49:56:
Review ...
* Better use the documented 'Tcl_GetErrno()' instead of direct access to the 'errno' variable.
* Otherwise looks ok.
* Does it pass the testsuite ?
* Do we have a simulation of 'no space left' we can add as test case to the testsuite ?

ferrieux added on 2009-11-11 04:33:33:
Andreas, since you seem busy with more important things, I took the liberty to propose a patch. Please review, it is trivial, possibly incomplete.

ferrieux added on 2009-11-11 04:31:52:

File Added - 350348: enospc.patch

andreas_kupries added on 2009-11-10 04:30:54:
Ok, something after the FlushC inside of the WChars apparently smashes the 'errno'. I guess that it may be necessary to single-step to see where that happens exactly.Ok.

Note: Since a few weeks ago WriteChars(chanPtr, "", 0); should not generate actual output if the encoding doesn't actually provide characters to write, see ChangeLog entry 2009-10-23. This may have influence here as well.

ferrieux added on 2009-11-10 04:09:24:
I don't have a unix at hand right now to dig further, but let me just clarify:
You're probably right on my misinterpretation...however after single-stepping through the erroneous case I saw that the WriteBytes() in the snippet caused FlushChannel() to be called. So what happens after WriteBytes() in this snippet actually hides the ENOSPC which is already there...

andreas_kupries added on 2009-11-10 01:38:30:
You are misunderstanding the snippet. It looks for a bypass message, yes, and moves such a message up for use by the caller, yes. When there is no message in the bypass then the regular reporting of whatever is in errno should happen. The bypass stuff will happen only if it is used by a channel driver (see tclIORefChan/tclIORefTrans.c), and otherwise the regular error reporting is in effect.

Given what you are saying with regard to [fconfigure -translation binary] causing the reporting to happen as usual I believe that we should now look for the differences in code execution between the 'configure binary' and the regular case.

Also note that the snippet you refer too is run before 'FlushChannel'.

Or are you saying that the bypass are had a message, and is now overriding whatever came from 'FlushChannel' ?

Or is this now an artefact of trying to simulate the 'no space left' with a virtual filesystem and/or channel ?

ferrieux added on 2009-11-10 00:28:52:
I think it is in the code beginning at line 3042 of tclIO.c:

   /*
     * When the channel has an escape sequence driven encoding such as
     * iso2022, the terminated escape sequence must write to the buffer.
     */

[...]
WriteChars(chanPtr, "", 0);
[...]
if (statePtr->chanMsg != NULL) {
    if (interp != NULL) {
Tcl_SetChannelErrorInterp(interp,statePtr->chanMsg);
    }

As proof, just [fconfigure -translation binary]: the error "no space left on device" is reported again !

The crux of the problem is that the errorCode 28 (ENOSPC) is properly reported into the interpreter by FlushChannel, but in the snippet above, the detection is based on a bypass message. At this point there is no chanMsg, and all trace of an error is forgotten.
I'd appreciate Andreas' presence to confirm the diagnosis and prescribe the medicine.

dkf added on 2009-11-09 23:44:16:
I can't see where the error is getting lost. Channel close-down is really rather complex... :-(

dvrsn added on 2009-10-29 01:13:58:
ignore my comment, the script is getting killed by posix signal SIGXFSZ

flatworm added on 2009-10-29 01:06:59:
Reproduced on Windows XP for CVS HEAD:

C:\>dd if=/dev/zero of=G:\blob bs=1024 count=1024
rawwrite dd for windows version 0.5.
Written by John Newbigin <[email protected]>
This program is covered by the GPL.  See copying.txt for details
Error writing file: 112 Недостаточно места на диске
1001+0 records in
1000+0 records out

C:\>c:\opt\86d\bin\tclsh86g.exe
% set fd [open g:/foo w]
filea02c18
% chan config $fd -buffering full
% puts $fd data
% close $fd
% set fd [open g:/foo w]
filea07bc0
% chan config $fd -buffering full
% puts $fd data
% flush $fd
error flushing "filea07bc0": no space left on device
% set errorCode
POSIX ENOSPC {no space left on device}
% close $fd
% exit

dvrsn added on 2009-10-29 01:00:51:
Tcl script to write too big a file raises an uncatchable error.  This may be related.

set fd [open /tmp/somerandomfile w]

for {set c 64} {$c > 0} {incr c -1} {
    while {![catch {seek $fd [expr 1<<$c] current}]} {
        puts [tell $fd]
    }
}

seek $fd -20 current
puts stderr "write"
puts $fd "goodbye, world"

puts stderr "close"
catch {close $fd}

puts stderr "done"

flatworm added on 2009-10-29 00:35:23:
Relevant trailer from the strace output of the test case with the silent [close]:

open("/tmp/blockdev/foo", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 5
fcntl64(5, F_SETFD, FD_CLOEXEC)         = 0
ioctl(5, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfa60b18) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, "file5\r\n"..., 7)             = 7
write(1, "% "..., 2)                    = 2
read(0, "chan config $fd -buffering full\n"..., 4096) = 32
write(1, "% "..., 2)                    = 2
read(0, "puts $fd test\n"..., 4096)     = 14
write(1, "% "..., 2)                    = 2
read(0, "close $fd\n"..., 4096)         = 10
write(5, "test\n"..., 5)                = -1 ENOSPC (No space left on device)
close(5)                                = 0
write(1, "% "..., 2)                    = 2
read(0, ""..., 4096)                    = 0
fcntl64(2, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(2, F_SETFL, O_RDWR)             = 0
fcntl64(1, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(1, F_SETFL, O_RDWR)             = 0
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(0, F_SETFL, O_RDWR)             = 0
write(4, "q"..., 1)                     = 1
close(4)                                = 0
futex(0x9eafbe8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x9eafc28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x9eafc2c, FUTEX_WAIT_PRIVATE, 3, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xb7a2dbd8, FUTEX_WAIT, 23837, NULL) = 0
futex(0x9eafbe8, FUTEX_WAKE_PRIVATE, 1) = 0
exit_group(0)                           = ?

flatworm added on 2009-10-29 00:30:18:
The following C program fails at the call to fclose() to demonstrate libc+kernel handle the situation correctly:

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
        FILE *fd;
        char *test = "test\n";
        int len, written, res;

        if (argc != 2) {
                fprintf(stderr, "usage: %s FILENAME\n", argv[0]);
                return 1;
        }

        fd = fopen(argv[1], "a");
        if (fd == NULL) {
                perror("fopen failed");
                return 2;
        }

        len = strlen(test);
        written = fwrite(test, 1, len, fd);
        if (written < len) {
                perror("fwrite failed");
                return 3;
        }

        res = fclose(fd);
        if (res != 0) {
                perror("fclose failed");
                return 4;
        }

        return 0;
}

$ gcc -o mkfile -Wall -W -Wextra mkfile.c

then in the setting described above:

# /tmp/mkfile /tmp/blockdev/foo
fclose failed: No space left on device

Attachments: