Tcl Source Code

View Ticket
Login
Ticket UUID: 3362446
Title: registry keys command fails with 8.5/8.6
Type: Bug Version: obsolete: 8.5.11
Submitter: apnadkarni Created on: 2011-07-11 04:35:05
Subsystem: 32. registry Package Assigned To: nijtmans
Priority: 7 High Severity:
Status: Closed Last Modified: 2012-07-12 02:43:14
Resolution: Fixed Closed By: nijtmans
    Closed on: 2012-07-11 19:43:14
Description:
Use of the [registry keys] command fails under some circumstances on 8.5/8.6. This failure was not present in 8.4

With 8.5
% package require Tcl
8.5.9
% package require registry
1.2.1
% registry keys {HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog}
unable to enumerate subkeys of "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog": More data is available.

With 8.6, identical failure
(tcl) 49 % package require Tcl
8.6b1.2
(tcl) 50 % package require registry
1.3
(tcl) 51 % registry keys {HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog}
unable to enumerate subkeys of "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog": More data is available.

With 8.4, this failure does not happen
(tools) 1 % package require Tcl
8.4
(tools) 2 % package require registry
1.1.3
(tools) 3 % registry keys {HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog}
Application {Internet Explorer} ODiag OSession Security System

Logging bug as placeholder, Will follow up with additional info/trace

/Ashok
User Comments: nijtmans added on 2012-07-12 02:43:14:

allow_comments - 1

nijtmans added on 2012-07-12 02:43:13:
Little addendum: Microsoft recommends to
keep track of the length of the buffer used in
RegQueryValueEx in a separate value. This
could result in undefined behavior when
getting HKEY_PERFORMANCE_DATA.

Now fixed this in core-8-5-branch and
trunk as well.

nijtmans added on 2012-06-21 16:52:05:
Fixed now in core-8-5-branch and trunk.

Not (yet) in core-8-4-branch because:
- the original bug was not there
- it can always be done later if desired
   (it's in the bug-3362446 branch)
- Maybe it's better to backport
  registry-1.2.2 to 8.4 as-is.

apnadkarni, Can you confirm that everything
is OK now in 8.5 and/or trunk? Then this issue
can be closed.

nijtmans added on 2012-06-21 14:04:11:
Yes, I plan to merge this up to Tcl 8.6, but I first wanted to be sure
that it was impossible to generate the the "unable to query key ..."
error. That was the only possible incompatibility that I saw.

All versions, from 8.4 on already use unicode API
when running on Windows 2000+, and the original
report mentions that the error happens in 8.5 as well.
So it has nothing to do with the conversion to UNICODE
(the only difference in 8.6 is that it is compiled in there,
while in <8.6 it is runtime switchable)

The main difference between 8.4 and 8.5 is that in 8.5
the dll can be unloaded. This doesn't explain why
the bug occurs on 8.5 and not on 8.4

apnadkarni added on 2012-06-21 12:07:15:
Jan, I'm a bit confused by your last comment about not merging into core-8-4-branch and up. Does that mean you don't plan on merging into 8.6 ? Note from my original report that this problem only shows up with 8.6 since that is where the change to Unicode calls was made.

/Ashok

nijtmans added on 2012-06-20 14:41:36:
The only reason I am hesitating to merge this to
core-8-4-branch and up, is because there is no test
case which produces the error message "unable to
query key ...".  Maybe it was impossible from
the start to get this error message ever: If the
OpenKey() call succeeded alreay (so we know that
the key exists) why would the following
RegQueryInfoKey fail? msdn doesn't say anything
about that, other than that RegQueryInfoKey can
produce a system error message. But in the Tcl
code, it seems impossible.

Added test-case 6.21 now, seems that very long
value names and values work fine.

apnadkarni added on 2012-06-20 09:13:23:
Seems reasonable to me.

nijtmans added on 2012-06-19 20:22:20:
See:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms724862%28v=vs.85%29.aspx

This recommends just calling RegEnumKeyEx  until the
function returns ERROR_NO_MORE_ITEMS. Further on,
it's not worth to query the buffersize, if the maximum length
is not more than 256. This eliminates the need for
RegQueryInfoKey altogether, and it eliminates the
possibility for race consitions.

This suggestion is commited to branch
bug-3362446 now. All test-cases pass.

Agreed with this solution?

nijtmans added on 2012-06-19 18:59:32:
> - There is buffer overflow, I believe, if the code is compiled without
> _UNICODE defined (this may not matter anymore). The call to
> Tcl_WinTCharToUtf should use sizeof(TCHAR), not sizeof(WCHAR) I believe.

Good catch! That should be fixed now in trunk.

apnadkarni added on 2011-07-12 09:08:40:
I dunno about serious but I hate it when operations fail for no apparent reason. The glob example I gave I think illustrates the point - you don't want glob failing because someone deletes/renames a file in the glob target. Of course, registry enum is likely to be far less common than globbing files but still...

There are two other calls - to delete a key (tree) and enumerate values. The first I don't think needs fixing because when deleting a tree, the caller must always expect failure modes caused by other processes. The latter case probably does need fixing (but not the same fix since the value names are not limited to just 256 bytes.

If you want, I can look at producing a patch for that as well.

/Ashok

mistachkin added on 2011-07-12 03:08:47:
If these race conditions are truly a serious problem, perhaps other parts of tclWinReg.c need to be rewritten as well?  For example, I see several other calls to RegQueryInfoKey.

apnadkarni added on 2011-07-11 18:46:09:
I won't get a hold of the failing machine again until tomorrow so I will try it then.

Also, winReg_v2.diff does not handle the race condition when one of the keys is deleted. In this case, the command currently simply fails, whereas it should return a valid key list (either including or exculding the deleted key). It is akin to [glob] failing because someone deleted a file at the same time.

Personal opinion, after making changes similar to yours, and then adding the handling of ERROR_NO_MORE_ITEMS to avoid the race, I thought it cleaner to rewrite the function, although I do understand the philosophy behind minimizing changes. Also, Tcl does MAX_PATH stack allocations in many places so a stack var of 257 saves unnecessary allocations.

It's of course your call! I'll let you know tomorrow once I get hold of the failing system. I don't see why it would not work

/Ashok

mistachkin added on 2011-07-11 17:19:05:

File Added - 417748: winReg_v2.diff

mistachkin added on 2011-07-11 17:18:45:

File Deleted - 417746:

mistachkin added on 2011-07-11 17:18:37:
Attaching a more robust patch that fixes the Tcl_Obj leak as well.

mistachkin added on 2011-07-11 16:45:38:
Thanks for the patch.  I'm wondering if something a bit more minimal would work in your case?  Could you try that attached patch and see if the issue persists?

mistachkin added on 2011-07-11 16:45:20:

File Added - 417746: winReg.diff

apnadkarni added on 2011-07-11 15:33:08:
(Sigh - SF seems to have thrown my previous patch update into a bitbucket somewhere, reattaching)

Added patch - getkeynames.c - replace GetKeyNames in tclWinReg.c with the corresponding function in attached getkeynames.c file. Tested registry.test against 8.6 head.

/Ashok

apnadkarni added on 2011-07-11 15:31:18:

File Added - 417741: getkeynames.c

apnadkarni added on 2011-07-11 15:29:56:
Patch is attached. Replaces the GetKeyName() function in win/tclWinReg.c. Tested against 8.6 head registry.test, all pass except 16 tests skipped (english/nonportable)

apnadkarni added on 2011-07-11 14:07:44:
Tracing through the code, the failure is in tclWinReg.c:GetKeyNames. The code calls RegQueryInfoKey to get number of keys and max length, allocates a buffer and iterates through the number of keys.

There are several problems in this function:

- race condition - the max length AND/OR number of keys can change during the iteration (if another thread/process modified/adds/deletes a key), causing the function to fail with either ERROR_MORE_DATA or ERROR_NO_MORE_ITEMS
- There is a memory leak in case of errors
- There is buffer overflow, I believe, if the code is compiled without _UNICODE defined (this may not matter anymore). The call to Tcl_WinTCharToUtf should use sizeof(TCHAR), not sizeof(WCHAR) I believe.

Ironially enough, none of the above is the cause of the observed failure. The apparent reason is that RegQueryInfoKey does not always return the correct information when the Unicode version is used and the original key was stored using a MBCS call (this is just scuttlebutt but seems to match the circumstantial evidence). So the max length assumed by the function is in fact too small. I verified this in Visual Studio - the "Internet Explorer" key is 17 chars, but the returned max by RegQueryInfoKey is only 12. This is really then an MS bug, but their position is that RegQueryInfoKey return values cannot be relied upon and the sample iterative method in the sdk should be followed.

Patch to follow...

/Ashok

Attachments: