Tcl Source Code

Ticket UUID: 3536888
Title: Locale guessing of msgcat fails on (some) Windows 7
Type: Bug Version: obsolete: 8.5.11
Submitter: oehhar Created on: 2012-06-21 12:41:10
Subsystem: 30. msgcat Package Assigned To: oehhar
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2012-07-02 15:58:11
Resolution: Fixed Closed By: nijtmans
    Closed on: 2012-07-02 08:43:06
Description:
On a Swiss German Windows 7 computer, I observed the following:
<session on tcl 8.5.11>
% package require msgcat
1.4.4
% msgcat::mcpreferences
en_us en {}
</session>

(the last result should be "de_ch de {}")
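
For illustration, the relationship between a locale and the preference list shown above can be sketched in plain Tcl. This is not the msgcat implementation; the proc name preferencesFor is hypothetical, and the sketch only assumes that the list is built by dropping trailing "_"-separated parts and ending with the empty locale.

```tcl
# Hypothetical helper: build a preference list such as "de_ch de {}"
# from a locale such as "de_CH".
proc preferencesFor {locale} {
    set parts [split [string tolower $locale] _]
    set prefs {}
    # Most specific first: all parts, then one part fewer, and so on.
    for {set n [llength $parts]} {$n > 0} {incr n -1} {
        lappend prefs [join [lrange $parts 0 [expr {$n - 1}]] _]
    }
    # The empty locale is always the last fallback.
    lappend prefs {}
    return $prefs
}

puts [preferencesFor de_CH]
```

With the locale detected correctly as de_ch, this yields the list the report expects instead of the en_us one.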

The reason is that the language is detected by reading the
registry value:
  [HKEY_CURRENT_USER\Control Panel\International\Locale]
It contains a numerical locale identifier (LCID).

Possible values are in the registry subtree:
  [HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Rfc1766]

Extract:
0409 en_us <- this is set
0407 de      <- this might be acceptable
0807 de_ch <- this would be correct

The observed value is "0409" which is en_us.
The correct value would be "de_ch". Still acceptable is "de".
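
The lookup described above can be sketched as follows. This is an illustration only: the proc name lcidToLocale is hypothetical, the table holds just the three entries quoted in the extract, and the sketch assumes that the registry value ends in the four hex digits of the language ID.

```tcl
# Hypothetical helper: map a registry LCID string to a locale name
# using a small lookup table (only the entries quoted in this ticket).
proc lcidToLocale {lcid table} {
    # Keep the last four hex digits, e.g. "00000807" -> "0807".
    set langid [string tolower [string range $lcid end-3 end]]
    if {[dict exists $table $langid]} {
        return [dict get $table $langid]
    }
    # Unknown ID: report no locale.
    return ""
}

set table {0409 en_us 0407 de 0807 de_ch}
puts [lcidToLocale 0409 $table]
puts [lcidToLocale 00000807 $table]
```

On the affected machine the registry holds 0409, so a lookup of this kind can only ever produce en_us, however the table is maintained.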

I found other reports that this registry value is not reliable on
Windows 7:
http://social.technet.microsoft.com/Forums/en/w7itproinstall/thread/8597901e-47a8-4457-a8dc-653a260808b3

Following the links at the end leads to
http://blogs.msdn.com/b/michkap/archive/2010/03/19/9980203.aspx
which points out that:
- there is a new value (since Vista, I think) which directly contains the
IETF language tag (which is close to the locale):
  [HKEY_CURRENT_USER\Control Panel\International\LocaleName]

In this case, this would help. The value it contains, "de-CH", is correct.

----

Anyway, the recommended method is to use the system call:
  GetSystemDefaultLCID()
and from Vista on:
  GetUserDefaultLocaleName()

---

Two possible methods to solve the issue:

1) Extend msgcat to look at LocaleName first.
   This is not officially supported, but it would solve the problem in this case.
   The registry entry "LocaleName" is considered more reliable, as all modern APIs use it instead of LCIDs.
   A patch for this approach is attached.

2) Extend Tcl to return the current system locale through an API.
   This requires binary code.
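
A minimal sketch of method 1, for illustration only and not the attached patch: the proc name readWindowsLocaleSetting is hypothetical, and the sketch assumes that LocaleName and Locale are both values under the International key. It deliberately ignores Cygwin and returns an empty list on every non-Windows platform.

```tcl
# Hypothetical helper: return {valueName value} for the first usable
# registry entry, preferring LocaleName over the older Locale value.
proc readWindowsLocaleSetting {} {
    if {$::tcl_platform(platform) ne "windows"} {
        return {}
    }
    if {[catch {package require registry}]} {
        return {}
    }
    set key {HKEY_CURRENT_USER\Control Panel\International}
    foreach valueName {LocaleName Locale} {
        # registry get raises an error if the value does not exist.
        if {![catch {registry get $key $valueName} value] && $value ne ""} {
            return [list $valueName $value]
        }
    }
    return {}
}

puts [readWindowsLocaleSetting]
```

On Windows XP, where LocaleName is absent, the loop falls through to the Locale value, so the old behavior is kept there.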
User Comments: nijtmans added on 2012-07-02 15:58:11:

allow_comments - 0

nijtmans added on 2012-07-02 15:57:57:
and to core-8-5-branch

nijtmans added on 2012-07-02 15:43:06:

allow_comments - 1

Merged to trunk now.

oehhar added on 2012-07-02 15:01:58:
It is ok for me.

Thank you for the good work,
Harald

nijtmans added on 2012-06-30 02:28:46:
Harald,

Your patch and doc update committed to branch bug-3536888

Please verify that everything is OK

oehhar added on 2012-06-29 22:53:17:
As the msgcat manual page is quite explicit about how the locale is extracted on Windows, we could extend the paragraph to cover the new behavior:

Current contents:
On Windows, if none of those environment variables is set, msgcat will attempt to extract locale information from the registry.

Proposed contents:
On Windows, if none of those environment variables is set, msgcat will attempt to extract locale information from the registry. From Windows Vista on, the RFC4646 locale name "lang-script-country-options" is transformed to the locale as "lang_country_script" (Example: sr-Latn-CS -> sr_cs_latin). For Windows XP, the language ID is transformed analogously (Example: 0c1a -> sr_yu_cyrillic).

oehhar added on 2012-06-29 21:35:15:

File Added - 447329: msgcat-1.4.5.tm-script.patch

oehhar added on 2012-06-29 21:32:40:
Following
http://msdn.microsoft.com/en-us/library/windows/desktop/dd373814%28v=vs.85%29.aspx

the script parameter is now translated to a modifier: sr-Latn-CS -> sr_cs@latin

For now, only two script values are supported:
Latn -> latin
Cyrl -> cyrillic
Others which were found:
???? -> modern (in msgcat lang id translation table: 0c0a es_ES@modern)
             I suppose this is not a script
Hant -> ? (in RFC4646: zh-Hant (Chinese written using the Traditional Chinese script))
Hans -> ? (in RFC4646: zh-Hans (Chinese written using the Simplified Chinese script))

The attached patch is against the current trunk and implements this feature.

-Harald
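
The translation described in this comment can be sketched as below. This is an illustration under stated assumptions, not the attached patch: the proc name localeNameToLocale and the regular expression are hypothetical, and only the two script values listed above are mapped. Any other script, such as Hant or Hans, is dropped.

```tcl
# Hypothetical helper: turn a Windows LocaleName such as "sr-Latn-CS"
# into a msgcat-style locale such as "sr_cs@latin".
proc localeNameToLocale {localeName} {
    set pattern {^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}))?(?:-.+)?$}
    if {![regexp $pattern [string tolower $localeName] -> lang script country]} {
        return ""
    }
    set locale $lang
    if {$country ne ""} {
        append locale _ $country
    }
    # Only the two script values named in the comment are supported.
    set modifiers {latn latin cyrl cyrillic}
    if {[dict exists $modifiers $script]} {
        append locale @ [dict get $modifiers $script]
    }
    return $locale
}

puts [localeNameToLocale sr-Latn-CS]
puts [localeNameToLocale de-CH]
```

Putting the script after "@" rather than after a second "_" keeps the language and country parts in the same positions as for locales without a script.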

nijtmans added on 2012-06-29 20:45:33:
...Until there comes a system named "ReactOS", which uses .dll's,
or a system named "Winnux" which turns out to be a linux-derivate...

Please leave it as is! Maybe it doesn't look as nice,
but it's the cheapest and most trustworthy compared
to the alternatives. My bikeshed is green ...... ;-)

dkf added on 2012-06-29 20:14:32:
Try [string match Win*] for that...

oehhar added on 2012-06-29 19:55:12:
Thanks for the update of the Wiki page.
Thus
  tcl_platform(os) eq "Windows NT"
or
  string equal -length 3 "Win" $::tcl_platform(os)
to also get "Win31" and "Windows 95"
would do the job ?
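
For comparison, the variants discussed in this part of the thread can be written as small procs that take the OS name as an argument. The proc names are hypothetical and the sketch only shows how the string tests behave; it says nothing about which names a given platform actually reports.

```tcl
# Exact comparison: matches only "Windows NT".
proc osIsWindowsNT {os} {
    expr {$os eq "Windows NT"}
}

# Prefix comparison on the first three characters.
proc osStartsWithWin {os} {
    string equal -length 3 "Win" $os
}

# Glob match, as in [string match Win*].
proc osMatchesWin {os} {
    string match Win* $os
}

foreach os {"Windows NT" "Windows 95" Win31 Linux} {
    puts [list $os [osIsWindowsNT $os] [osStartsWithWin $os] [osMatchesWin $os]]
}
```

Both prefix tests would also accept an OS name like the "Winnux" of the joke above, which is the objection raised against them.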

nijtmans added on 2012-06-29 19:45:35:
> Wiki page
> http://wiki.tcl.tk/1649
> still states, that cygwin tcl returns platform "windows", but this was
> changed as far as I know.
> Could someone add/correct this page in respect to Cygwin ?

Modified now. Since February, Cygwin has based its
port on 'unix', while previously it was based on win32.
That's why all fields changed. I changed the Cygwin
port to use the Win32 functions again, so now
most fields return the same as before, except
tcl_platform(platform), which became "unix".

oehhar added on 2012-06-29 19:31:22:
I found that the table used to translate Windows language IDs to locales sometimes uses the modifier, which was not yet on my radar:
Example:
43 uz
0443 uz_UZ@latin
0843 uz_UZ@cyrillic

I suppose that this information is contained in the "script" part of the LocaleName:

I have found:
http://msdn.microsoft.com/en-us/library/aa226765%28v=sql.80%29.aspx
which says:

uz-UZ-Cyrl Uzbek (Cyrillic) - Uzbekistan
uz-UZ-Latn Uzbek (Latin) - Uzbekistan

What puzzles me is that
http://tools.ietf.org/html/rfc5646
uses this format:
uz-Cyrl-UZ Uzbek (Cyrillic) - Uzbekistan
uz-Latn-UZ Uzbek (Latin) - Uzbekistan
which is what I have implemented. But I currently ignore the script field.

On my Windows Vista, those keys are not present in the registry:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Rfc1766]
A modifier is never used.

Well, quite funny all that.
Richard Suchenwirth would know...

dkf added on 2012-06-29 19:17:32:
Well, actually doing:

  if {"registry" in [package names]} {...

would work since we're after the first [package require].
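
A small sketch of this check, for illustration only: the proc name packageKnown is hypothetical. It relies on [package names] listing the packages that have been provided or that have a registered load script. As noted above, that list is only complete once the package database has been populated by an earlier [package require].

```tcl
# Hypothetical helper: test whether a package is known to the
# interpreter without raising an error when it is not.
proc packageKnown {name} {
    expr {$name in [package names]}
}

puts [packageKnown Tcl]
puts [packageKnown registry]
```

Unlike a failing [package require] or [package present] inside [catch], this test leaves ::errorInfo untouched.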

oehhar added on 2012-06-29 19:01:51:
Thank you for the discussion.

My point is not speed but clarity.
I don't like to do a "package require" in cases where I know it will fail, because it pollutes ::errorInfo without there being a real error.

I don't like [info sharedlibextension] too much, as we use a side effect to detect the Windows platform.
Is there no other way to detect the Windows platform?
If not, I would love to have this in the platform package.
That's why I asked Jan to add a comment, and that's what he did.

Wiki page
http://wiki.tcl.tk/1649
still states, that cygwin tcl returns platform "windows", but this was changed as far as I know.
Could someone add/correct this page in respect to Cygwin ?

Further development of msgcat: I also do not like that the locale search in Init throws an error to report a non-matching locale. This may pollute ::errorInfo too.
IMHO this is OK for user code. Packages should not do that.

Just 2 cents,
Harald

nijtmans added on 2012-06-29 18:42:58:
Did a quick compare with the table in the registry:
   HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Rfc1766
found 3 locales which were missing, so added them now.

>Basing off a [package require registry] would be acceptable to me
A [package require] is expensive when the package is not found, as
it has to traverse all possible directories. So I fully agree with Harald.
[info sharedlibextension] is the cheapest way I know of.

oehhar added on 2012-06-29 18:38:55:
I personally don't like to do a
if {[catch {package require registry}]} { ...}
as this systematically throws an error on non-Windows platforms.
This will pollute the ::errorInfo variable, which is IMHO not good practice for a core package.

dkf added on 2012-06-29 17:52:56:
Basing off a [package require registry] would be acceptable to me; that is a package that is definitively not available on non-Windows platforms. The cost of populating the package database will have already been borne too; this is *inside* a package in the first place, so it will be possible to say definitively whether the package is there or not with fairly low cost.

nijtmans added on 2012-06-29 16:32:11:
merged to core-8-5-branch and trunk

oehhar added on 2012-06-29 16:24:59:
Please do it.

I am not ready yet, no login etc...

nijtmans added on 2012-06-29 16:16:52:
So, will you do the merge, or do you prefer
that I do it for you? (I don't know how familiar
you already are with fossil; this would be
a good test.)

oehhar added on 2012-06-29 16:10:57:
Both points ok !
Thanks,
Harald

nijtmans added on 2012-06-29 16:03:32:
> Or we could use
> package present registry
> instead
> [info sharedlibextension] ne ".dll"

That wouldn't work: if "registry" is available
but not loaded yet, this would return false.

> - Also I would prefer the registry methods over the environ variable method
> on windows, as:
> - LANG is sometimes set, for example by an installed CYGWIN
> - LANG is normally less detailed - the country is missing while the
> registry method extracts the country.

By default, LANG is not set in CYGWIN unless the user
explicitly sets it. So this is useful as a way to override
the registry setting. I wouldn't change that.
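
The override described here can be sketched as follows. This is an illustration, not the msgcat code: the proc name localeFromEnv is hypothetical, and the variable order LC_ALL, LC_MESSAGES, LANG is an assumption taken from the msgcat manual page rather than from this ticket, which only mentions LANG.

```tcl
# Hypothetical helper: return the first non-empty locale setting
# found in the given environment (a dict of name/value pairs).
proc localeFromEnv {envDict} {
    foreach name {LC_ALL LC_MESSAGES LANG} {
        if {[dict exists $envDict $name] && [dict get $envDict $name] ne ""} {
            return [dict get $envDict $name]
        }
    }
    # Nothing set: the caller would fall back to the registry.
    return ""
}

puts [localeFromEnv [array get ::env]]
```

Because the environment is consulted first, a user who sets LANG explicitly keeps control over the result whatever the registry contains.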

oehhar added on 2012-06-29 15:43:45:
Or we could use
    package present registry
instead 
    [info sharedlibextension] ne ".dll"

This would clearly indicate what is required, the registry package.

----
Additional future thoughts:
- We could replace the fixed LCID->locale translation table with registry access, as this table is contained in the registry:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Rfc1766]

I have to check whether this table is available on Win XP; it is OK since Vista.

I would only do this in Tcl 8.6, as it might introduce some incompatibilities.
So that's another story...

- Also, I would prefer the registry method over the environment variable method on Windows, as:
- LANG is sometimes set, for example by an installed CYGWIN
- LANG is normally less detailed - the country is missing, while the registry method extracts the country.

nijtmans added on 2012-06-29 15:22:23:
I agree with your comments. The registry package works on
Cygwin as on Windows, only $tcl_platform(platform) is "unix",
so msgcat didn't try to use the registry package. The easiest
way to say "Windows or Cygwin" is [info sharedlibextension],
because those two platforms are the only ones using ".dll".
So, comments adapted according to that.

Suggested optimization is OK to me too.

Now updated in bug-3536888 branch.
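
As a sketch of the test described in this comment (the proc name usesDllExtension is hypothetical), the check reduces to a single string comparison:

```tcl
# Hypothetical helper: true on platforms whose shared libraries use
# the ".dll" extension, which this comment equates with
# "Windows or Cygwin".
proc usesDllExtension {} {
    expr {[info sharedlibextension] eq ".dll"}
}

puts [usesDllExtension]
```

The result stands in for "the registry package may be usable here"; it does not replace the [package require registry] that still has to succeed afterwards.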

oehhar added on 2012-06-29 14:54:12:

File Added - 447309: msgcat-1.4.5.tm.patch

oehhar added on 2012-06-29 14:53:22:
Dear Jan,
thank you for the two modifications.
"key" was lost somewhere...
The test is successful.

Two remarks:
1) For me, the purpose of '[info sharedlibextension] ne ".dll"' is not obvious.
I would like a comment like: "on windows but not cygwin"
Or I would like a more explicit "windows and not cygwin" check using the platform array/packages

What exactly happens on Cygwin? Is there no registry package?

2) Small optimisation:
Replace:
    #
    # The rest of this routine is special processing for Windows;
    # all other platforms, get out now.
    #
    if {[info sharedlibextension] ne ".dll"} {
        mclocale C
        return
    }
    #
    # On Windows, try to set locale depending on registry settings,
    # or fall back on locale of "C".
    #
    if {[catch {
        package require registry
    }]} {
        mclocale C
        return
    }
by:
    # The rest of this routine is special processing for Windows;
    # all other platforms, get out now.
    #
    if {[info sharedlibextension] ne ".dll"
            || [catch {package require registry}]
    } {
        mclocale C
        return
    }
    #
    # On Windows, try to set locale depending on registry settings,
    # or fall back on locale of "C".
-END-

Thank you,
Harald

nijtmans added on 2012-06-29 03:07:43:
Fixed two things in the bug-3536888 branch:
- the variable "key" was used before defined
- didn't work on cygwin, now it does.

Harald, I think it's ready to be merged to
core-8-5-branch. Do you agree?
Please test it on your german-swiss windows 7
machine. To me everything looks fine now.

nijtmans added on 2012-06-22 20:40:55:
committed to branch bug-3536888

oehhar added on 2012-06-21 19:41:11:

File Added - 446744: msgcat-1.4.4.patch

Attachments: