Ticket UUID: | 1095909 | |||
Title: | Do not use readdir_r | |||
Type: | Bug | Version: | obsolete: 8.5a3 | |
Submitter: | jenglish | Created on: | 2005-01-04 17:44:19 | |
Subsystem: | 52. Portability Support | Assigned To: | jenglish | |
Priority: | 5 Medium | Severity: | ||
Status: | Closed | Last Modified: | 2005-07-08 08:04:19 | |
Resolution: | Accepted | Closed By: | jenglish | |
Closed on: | 2005-01-09 19:31:40 | |||
Description: |
If Tcl is compiled with --enable-threads, it uses readdir_r() (or readdir64_r) instead of readdir(). This is unnecessary: according to the Single Unix Specification, "The pointer returned by readdir() points to data which may be overwritten by another call to readdir() on the same directory stream. This data is not overwritten by another call to readdir() on a different directory stream." In other words: readdir() is perfectly thread-safe as long as you do not share the same DIR pointer across different threads (which Tcl does not do). In addition, Tcl does not allocate the correct amount of buffer space for the path name (in fact, this is not even possible in general): the officially recommended mechanism is to set name_max = pathconf(dirname, _PC_NAME_MAX), then allocate a buffer of 'sizeof(struct dirent) + name_max + 1' bytes, since the maximum length of a filename (if there is a maximum) is filesystem-specific. This should prevent buffer overruns (assuming readdir_r() is implemented according to spec), but it can still truncate the filename -- on Linux, for example, it is documented that "[f]iles with name lengths longer than [...] _PC_NAME_MAX may exist in the given directory". Related issues: 1071701, 1001325, 723502. Attached patch (to follow) gets rid of the whole mess and just uses readdir(). | |||
User Comments: |
hobbs added on 2005-07-08 08:04:19:
Logged In: YES user_id=72656 Backported to 8.4 branch port 8.4.11. Found the incorrectness of the previous version's m4 screwed up AIX threaded builds, so correcting the whole use of readdir_r seemed best. jenglish added on 2005-01-10 02:31:41: File Added - 115152: tcl-readdir.patch jenglish added on 2005-01-10 02:31:40: Logged In: YES user_id=68433 Committed to CVS HEAD to get more widespread testing. Refreshed patch attached (just added some comments) in the unlikely event that it needs to be reverted. dkf added on 2005-01-05 07:41:18: Logged In: YES user_id=79902 IIRC, readdir64 is needed even though we never look at a field where it matters, because the OS call doesn't know that we won't look. :^/ The alternative (partial failure) isn't common in Unix system calls AIUI... As the author of some of the readdir swamp, if you want to clear some of the alligators out, be my guest. jenglish added on 2005-01-05 07:39:19: File Added - 114535: tcl-readdir.patch jenglish added on 2005-01-05 07:39:18: Logged In: YES user_id=68433 Patch attached; tested with CVS HEAD, on Mandrake Linux 10.0, --enable-symbols --enable-threads --disable-shared; no test failures. One more note: The SunOS man pages insist that readdir() is not MT-safe and should not be used in multithreaded applications. As near as I am able to determine, this is simply not true. The API itself has no thread-safety issues as long as the program does not use the same DIR pointer in multiple threads (i.e., it's no more MT-unsafe than, say, lseek() or write()). The SuS requires that independant calls to readdir() on distinct DIR pointers not interfere with one another; all of the readdir() implementations I've been able to find (glibc, various BSDs, IRIX) are thread-safe subject to the above constraint; according to Google the traditional implementation is MT-safe as well; and if you think about it, it would be more work to implement readdir() in a way that's *not* MT-safe (subject to the above constraint) than it would be to implement one that is. Considering the problems with readdir_r, both historical and inherent, using it in favor of the traditional readdir() seems quite strongly contraindicated. Let's lose the #ifdeffery here; it's doing more harm than good. jenglish added on 2005-01-05 03:49:47: Logged In: YES user_id=68433 More notes: It's not clear that the core needs to distinguish between readdir() and readdir64(), either. The only struct dirent field the core looks at is d_name, so LFS issues (like 64-bit inodes or file sizes) aren't applicable. OTOH, there may be systems where readdir() does not work at all on large files (GLIBC's readdir was buggy in this regard at one point, and might still be). jenglish added on 2005-01-05 03:28:06: Logged In: YES user_id=68433 More notes: In the current codebase, TclpReaddir() calls readdir/readdir_r/readdir64/readdir64_r, depending on -D flags determined at configure-time. TclpReaddir() appears in the internal unix-specific stubs table, but is not referenced directly in the current codebase. Instead, tclUnixFile.c and tclUnixFCmd.c call TclOSreaddir(), which is #defined as one of: TclpReaddir() (if TCL_THREADS defined), readdir64() (if TCL_THREADS undefined and HAVE_STRUCT_DIRENT64 defined), or readdir (if both symbols are undefined). In addition, unix/tclUnixPort.h has #define readdir(x) TclpReaddir(x) if TCL_THREADS is defined. The only place "readdir" appears literally in the current codebase is in the implementation of TclpReaddir (which must take care to #undefine it first). Proposed simplification: don't re-#define readdir at all, and only use TclOSreaddir to select readdir() / readdir64() (IOW, -DTCL_THREADS makes no difference). TclpReaddir still appears in the internal stubs table, but is never referenced (directly or indirectly) by anything else in the core. |
Attachments:
- tcl-readdir.patch [download] added by jenglish on 2005-01-10 02:31:41. [details]