TIP 173: Internationalisation and Refactoring of the 'clock' Command

Author:         Kevin Kenny <kennykb@acm.org>
State:          Final
Type:           Project
Vote:           Done
Created:        11-Mar-2004
Post-History:   
Discussions-To: news:comp.lang.tcl
Tcl-Version:    8.5

Abstract

The [clock] command provides Tcl's fundamental facilities for computing with dates and times. It has served Tcl faithfully since 7.6, but the computing world has advanced significantly in the decade that it has been in service. This TIP proposes a (nearly entirely compatible) reimplementation of [clock] that will allow for fewer ambiguities on input, improved localisation, more portability, and less exposure of platform-dependent bugs. A significantly greater fraction of [clock] shall be implemented in Tcl than it is today, and the code shall be refactored to use the ensemble mechanism introduced for Tcl 8.5 (see [112]).

Rationale

There is an embarrassing number of open bugs and feature requests against the [clock] command. As the maintainer of [clock], the author of this TIP has also received a number of informal feature requests that are not logged at SourceForge. Unfortunately, many of the requested fixes and enhancements cannot be effectively addressed with the current architecture of [clock].

  1. Several users have requested additional input formats to [clock scan], notably the full range of ISO8601 time formats (including formats based on week number and day-of-week); year and day-of-year; Apache "web log" dates and times; numeric dates placing the month before the day; and localised names of months and days of the week. Unfortunately, these formats simply cannot be added in the current architecture of [clock scan]; in fact, there are several outstanding bugs in [clock scan] (for example, the parsing of numeric time zones east of Greenwich) that cannot be fixed without breaking something else.

    The fundamental issue is that [clock scan] is asked to process input with too many ambiguities. An input token such as 2000, for example, may be interpreted as a year, a time of day, or a number ("now + 2000 seconds"). 1000 may (perhaps) not be a year, but could be a time of day, a number, or a time zone. Localisation would only make this problem worse. Without additional guidance, there is, even in theory, no way to determine whether 03-11-2004 represents the third of November or the eleventh of March.

    To solve this problem, a radical redesign of [clock scan] is required; the programmer must be allowed to specify an expected input format (or set of expected formats).

    A side effect of such a redesign would be improved ease of maintenance. The current [clock scan] is a YACC-derived parser; the build process, however, runs a script on the output of YACC to modify its memory management and alter its external symbol names to make it compatible with Tcl's conventions. This script is fragile; at present, it is known to work only with the version of YACC distributed with Solaris.

    There are a number of other issues with [clock scan] that could be addressed at the same time with such a redesign. For instance, there is a known problem at present that an input string that specifies time and time zone but not date can return a time that is one day too early or late; this problem arises because the existing parser presumes the current local date when parsing such a string, rather than the current date in the given time zone. The problem is difficult to address because of the left-to-right nature of the LALR(1) parser.

  2. A few enhancements have been requested to [clock format]; most notably, proper localization on all platforms. In addition, the documentation of [clock format] is at best approximate, because it depends on the strftime function in the Standard C Library. This function differs among platforms, because the C standard, the Posix standard, and the Single Unix Specification have gone through evolution over time, and few platforms support all the features of the current generation of any of them.

    In addition, the Year 2038 bug looms large on the horizon. On most 32-bit platforms, time_t (used in the C library functions) is a 32-bit count of seconds from 1 January 1970; dates beyond 2038 cannot be represented in this format.

    The dependence on a complex library function such as strftime introduces obscure platform-dependent bugs. Several open bugs in [clock format], for instance, fail only on HP-UX, or only on Windows.

    Date formats have been requested (specifically, the Japanese civil calendar) that are beyond the capabilities of the Standard C Library functions.

    [clock format] does not honor user preferences for date/time format on Windows.

    All of these concerns seem to indicate that our current dependency upon vendor-supplied date and time manipulation routines is ill advised. A single implementation that we control will make the behavior consistent among platforms, allow the localisation to follow Tcl's conventions, and let us lead rather than follow the vendor in fixing bugs.

  3. Server applications frequently require support of multiple locales and multiple time zones within a single process, because they need to parse input and format output according to the client's environment. The current [clock] facilities either do not support localization at all, or else support a change to locale only by changing environment variables. This technique, once again, exposes bugs in the vendor libraries. It also introduces difficulties with thread safety; Tcl does not have a single mechanism whereby the TZ and LC_TIME environment variables are protected.

  4. The only mechanism for performing calculations like "one month after the current date" is [clock scan]. While this works well in practice, using a parser to perform arithmetic seems somewhat perverse.

Specification

The [clock] command shall be reimplemented as an ensemble [112], with most of the subcommands implemented in Tcl. A minimal set of the existing C code shall be refactored and placed inside a ::tcl::clock namespace. The existing subcommands seconds and clicks shall be exposed. The existing scan shall be hidden inside the namespace. [clock scan] and [clock format] shall be reimplemented in Tcl. In addition, a new [clock add] command shall be added.
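As a rough illustration of the ensemble mechanism referred to above, the sketch below builds a small ensemble from procedures in a namespace. The namespace ::demo::when, the command name when, and its two subcommands are hypothetical and are not part of this proposal; only [namespace ensemble create] is the Tcl 8.5 facility described in [112].

```tcl
# Hypothetical namespace holding the subcommand implementations.
namespace eval ::demo::when {
    # Only exported procedures become subcommands of the ensemble.
    namespace export now stamp

    # Subcommand: current time as a count of seconds.
    proc now {} {
        return [clock seconds]
    }

    # Subcommand: format a count of seconds as an ISO8601-style UTC string.
    proc stamp {t} {
        return [clock format $t -format "%Y-%m-%dT%H:%M:%SZ" -gmt true]
    }

    # Create the ensemble command ::when from the exported procedures.
    namespace ensemble create -command ::when
}

puts [when stamp 0]   ;# 1970-01-01T00:00:00Z
```

Subcommands that are not exported stay hidden inside the namespace, which is the arrangement the specification describes for the legacy scan code.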

The syntax and semantics of the [clock clicks] and [clock seconds] commands will remain unchanged.

clock scan

The [clock scan] command shall have the syntax:

clock scan string ?-base baseTime? ?-format format? ?-gmt boolean? ?-locale name? ?-timezone timeZone?

It accepts a character string representing a date and time and returns the time that the string represents, expressed as a count of seconds from the Posix epoch (1 January 1970, 0000 UTC).

If a -format option is not supplied, the scan is a free format scan. The existing YACC parser for clock scan will be used to interpret the input string. This form of the command is explicitly deprecated because of the inherent ambiguities in interpreting the input string. The free-format version of [clock scan] does not accept -locale or -timezone options, since the legacy code does not support multiple locales or time zones.

If the -format option is supplied, it is interpreted as a specification for the expected input form. If the given string matches the input form, it is converted to a count of seconds and returned; otherwise, an error is thrown. See FORMATS below for a discussion of the available format groups and their interpretation.
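The effect of an explicit format can be sketched with the ambiguous string from the Rationale. The example assumes a [clock] that implements this specification (as in Tcl 8.5 and later); the variable names are illustrative only.

```tcl
# The same input string, read under two different declared formats.
set s "03-11-2004"
set dmy [clock scan $s -format "%d-%m-%Y" -gmt true]
set mdy [clock scan $s -format "%m-%d-%Y" -gmt true]

puts [clock format $dmy -format "%Y-%m-%d" -gmt true]   ;# 2004-11-03
puts [clock format $mdy -format "%Y-%m-%d" -gmt true]   ;# 2004-03-11

# A string that does not match the declared format raises an error
# instead of being guessed at.
if {[catch {clock scan "11 March 2004" -format "%d-%m-%Y" -gmt true} msg]} {
    puts "rejected: $msg"
}
```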

Extraction of the date from the input string is guided by what fields are present in the format. The order of preference, from highest to lowest, is:

{seconds from epoch}, {StarDate}: Date fields that specify both date and time take highest precedence. If format groups for these fields appear multiple times, the rightmost takes precedence.

{Julian Day Number}: The Julian Day Number uniquely specifies a calendar date.

{century, year, month, day of month}, {century, year, day of year}, {century, year, week of year, day of week}, {locale era, locale year, month, day of month}: Formats with complete year are preferred to formats with a two-digit year. For a two-digit year, the date range is constrained to lie between 1938 and 2037.

{year, month, day of month}, {year, day of year}, {year, week of year, day of week}, {year of locale era, month, day of month}: Formats that specify the year are preferred to those that do not.

{month, day of month}, {day of year}, {week of year, day of week}: Formats that specify a day within the year are preferred to those that specify merely the day of week or day of month. Formats that do not specify the year are presumed to designate the base year.

{day of month}, {day of week}: If none of the above rules apply, a day of the month or day of the week standing alone is interpreted as belonging to the base month or week.

None of the above: If no combination of fields that specifies a date is found, the base date is used.
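Two of the rules above, the 1938 to 2037 window for two-digit years and the use of the base year when no year is given, can be sketched as follows. The example assumes a [clock] that follows this specification with its default settings; the sample dates are arbitrary.

```tcl
# Two-digit years are placed in the range 1938..2037.
set y38 [clock scan "01/02/38" -format "%m/%d/%y" -gmt true]
set y37 [clock scan "01/02/37" -format "%m/%d/%y" -gmt true]
puts [clock format $y38 -format "%Y-%m-%d" -gmt true]   ;# 1938-01-02
puts [clock format $y37 -format "%Y-%m-%d" -gmt true]   ;# 2037-01-02

# A format with month and day but no year takes the year from -base.
set base [clock scan "1999-06-15" -format "%Y-%m-%d" -gmt true]
set noYear [clock scan "03/11" -format "%m/%d" -base $base -gmt true]
puts [clock format $noYear -format "%Y-%m-%d" -gmt true] ;# 1999-03-11
```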

The time of day returned by [clock scan] is determined by the presence of fields in the format string, in the following order of preference.

{seconds from epoch, StarDate}: If either of these fields is present, it uniquely determines date and time.

{am/pm indicator, hour am/pm, minute, second}, {hour, minute, second}: Time with seconds is preferred to time without seconds.

{am/pm indicator, hour am/pm, minute}, {hour, minute}: Time can be interpreted without the seconds.

{am/pm indicator, hour am/pm}, {hour}: Time can be expressed as an hour alone, e.g.,

 clock scan "6 pm" -format "%I %p"

None of the above: If none of the above indicators is present, 00:00:00 (the start of the day) in the given time zone is used.

In all of the foregoing discussion, the 'base date', 'base month', 'base week', and 'base year' refer to the day, month, week or year designated by the -base parameter, which is a count of seconds from the Posix epoch. If no -base parameter is supplied, the current date is used as the base date. The year, month, week and day are obtained by interpreting the base date in the time zone specified by the date/time string. If the given format does not include a time zone, then the base time is interpreted in the default time zone; see TIME ZONES below for the way that the default time zone is determined, and the interpretation of the -timezone and -gmt options.
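The interplay between -base and the time-of-day rules can be sketched as below, assuming a [clock] that implements this specification. The base time is an arbitrary example value.

```tcl
# An arbitrary base time: 11 March 2004, 15:30:00 UTC.
set base [clock scan "2004-03-11 15:30:00" \
              -format "%Y-%m-%d %H:%M:%S" -gmt true]

# Hour and am/pm indicator only: the date comes from -base,
# and the missing minutes and seconds are taken as zero.
set evening [clock scan "6 pm" -format "%I %p" -base $base -gmt true]
puts [clock format $evening -format "%Y-%m-%d %H:%M:%S" -gmt true]
# 2004-03-11 18:00:00

# Date only: no time-of-day fields, so the start of the day is used.
set midnight [clock scan "2004-03-11" -format "%Y-%m-%d" -gmt true]
puts [clock format $midnight -format "%Y-%m-%d %H:%M:%S" -gmt true]
# 2004-03-11 00:00:00
```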

The locale is used to determine the spelling of native language words such as the names of months, names of weekdays, am/pm indicators, and locale eras. It is also used in the interpretation of the format groups, '%X', '%x', and '%c'. In addition, the locale determines the date at which the calendar in use changes from the Julian calendar to the Gregorian. If no -locale parameter is supplied, the default is to use the root locale. See LOCALISATION below for more information.

clock format

The [clock format] command shall have the syntax:

clock format time ?-format format? ?-gmt boolean? ?-locale name? ?-timezone timeZone?

It accepts a time, expressed in seconds from the Posix epoch of 1 January 1970, 00:00 UTC, and formats it according to the given format string. See FORMATS below for a discussion of the available format codes. If no format string is supplied, a default format, {%a %b %d %H:%M:%S %Z %Y} is used.
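A minimal sketch of both forms, with and without -format, assuming a [clock] that implements this specification. The -gmt option is used so that the output does not depend on the local time zone.

```tcl
# Default format: {%a %b %d %H:%M:%S %Z %Y}
set dflt [clock format 0 -gmt true]
puts $dflt     ;# Thu Jan 01 00:00:00 GMT 1970

# Explicit format string.
set iso [clock format 1078963200 -format "%Y-%m-%dT%H:%M:%S" -gmt true]
puts $iso      ;# 2004-03-11T00:00:00
```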

The -timezone, -gmt, and -locale options are interpreted as for [clock scan]. See TIME ZONES and LOCALISATION below for how these options work.

clock add

This command performs arithmetic on dates and times. The syntax is:

clock add time ?count unit?... ?-gmt boolean? ?-timezone timeZone? ?-locale name?

It accepts a time, expressed in seconds from the Posix epoch of 1 January 1970, 00:00 UTC, and adds or subtracts units of time from it according to the alternating count and unit parameters. Each count must be a wide integer; each unit is one of the following:

 years   year    months  month
 weeks   week    days    day
 hours   hour    minutes minute  seconds second

The command works by converting the given time to a calendar day and time of day in the given locale and time zone. To that day and time of day, it adds or subtracts the given offsets in sequence. It reconverts the resulting time to a count of seconds, again using the given locale and time zone, and returns that count of seconds.

There are subtle differences in many cases between adding seemingly similar offsets. For instance, on the day before Daylight Saving Time goes into effect, adding 24 hours will give "the time 24 hours from the base time, irrespective of any clock change", while adding 1 day will give "the time it will be at the same time of day on the following day." Similarly, adding 1 month on 30 January will give either 28 or 29 February. There are equally strange effects when performing date/time arithmetic across the change between the Julian and Gregorian calendars.

The -timezone, -gmt, and -locale options are used to control the interpretation of the count of seconds as a calendar day and time. Refer to TIME ZONES and LOCALISATION below for a fuller discussion.
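The differences between similar-looking offsets can be sketched as follows, assuming a [clock] that implements this specification. The time zone is the Posix-style specifier quoted under TIME ZONES, under which Daylight Saving Time begins on the first Sunday of April; the dates are chosen only to straddle that change.

```tcl
# Posix-style zone: EST, with EDT starting on the first Sunday of April.
set tz "EST+05:00EDT+04:00,M4.1.0/01:00,M10.5.0/02:00"

# Noon on the day before the change (3 April 2004 is a Saturday).
set base [clock scan "2004-04-03 12:00:00" \
              -format "%Y-%m-%d %H:%M:%S" -timezone $tz]

set plus24h [clock add $base 24 hours -timezone $tz]
set plus1d  [clock add $base 1 day    -timezone $tz]

# "24 hours" is elapsed time; "1 day" keeps the same wall-clock time.
puts [clock format $plus24h -format "%Y-%m-%d %H:%M:%S" -timezone $tz]
puts [clock format $plus1d  -format "%Y-%m-%d %H:%M:%S" -timezone $tz]
puts "difference: [expr {$plus24h - $plus1d}] seconds"

# Adding a month to 30 January is clipped to the end of February.
set jan30leap  [clock scan "2004-01-30" -format "%Y-%m-%d" -gmt true]
set jan30plain [clock scan "2003-01-30" -format "%Y-%m-%d" -gmt true]
set febLeap  [clock format [clock add $jan30leap 1 month -gmt true] \
                  -format "%Y-%m-%d" -gmt true]
set febPlain [clock format [clock add $jan30plain 1 month -gmt true] \
                  -format "%Y-%m-%d" -gmt true]
puts "$febLeap $febPlain"   ;# 2004-02-29 2003-02-28
```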

Formats

The [clock scan] and [clock format] commands will be implemented in Tcl, without depending on the local strftime and strptime functions. For this reason, format groups will function identically on all platforms. The format groups will be interpreted as follows.

%a: On output, receives the abbreviation for the day of the week in the given locale. On input, matches the name of the day of the week (in the given locale) in either abbreviated or full form, and may be used to determine the calendar date.

%A: On output, receives the full name of the day of the week in the given locale. On input, treated identically with %a.

%b: On output, receives the abbreviation for the name of the month in the given locale. On input, matches the name of the month (in the given locale) in either abbreviated or full form, and may be used to determine the calendar date.

%B: On output, receives the full name of the month in the given locale. On input, treated identically with %b.

%C: On output, receives the number of the century, in Indo-Arabic numerals. On input, matches one or two digits, and accepts the number of the century in Indo-Arabic numerals. May be used to determine the calendar date.

%c: On output, produces a correct locale-dependent representation of date and time of day. On input, matches whatever format %c produces in the given locale, and may be used to determine calendar date and time.

%d: On output, produces the number of the day of the month, in Indo-Arabic numerals, with a leading zero. On input, matches one or two digits, accepts the day of the month, and may be used to determine calendar date.

%D: Synonymous with %m/%d/%Y. Should be used only in US locales.

%e: On output, produces the number of the day of the month, in Indo-Arabic numerals, with no leading zero. On input, treated identically with %d.

%Ec: On output, produces a locale-dependent representation of date and time of day in the locale's alternative calendar. On input, matches whatever %Ec produces, and may be used to determine calendar date and time.

%EC: On output, produces the name of the current era in the locale's alternative calendar. On input, accepts the name of the era in the locale's alternative calendar, and may be used to determine calendar date.

%Ex: On output, produces the calendar date in a locale-dependent representation using the locale's alternative calendar and alternative numerals. On input, accepts whatever %Ex produces and may be used to determine calendar date.

%EX: On output, produces the time of day in the locale's alternative representation. On input, accepts whatever %EX produces and may be used to determine time of day.

%Ey: On output, produces the number of the current year relative to the locale's current era %EC, expressed in the locale's alternative numerals. On input, accepts the number of the year relative to the current era in the locale's alternative numerics, and may be used to determine calendar date.

%EY: On output, produces an unambiguous representation of the current year in the locale's alternative calendar and alternative numerals. This group is often synonymous with %EC%Ey. On input, accepts whatever %EY produces and may be used to determine calendar date.

%g: On output, produces the two-digit year number suitable for use with the ISO8601 week number. On input, accepts a two-digit year number, and may be used to determine calendar date if the %V format group is also present.

%G: On output, produces the four-digit year number suitable for use with the ISO8601 week number. On input, accepts a four-digit year number, and may be used to determine calendar date if the %V format group is also present.

%h: Synonymous with %b.

%H: On output, produces the two-digit hour of the day on a 24-hour clock (00-23). On input, matches two digits, and may be used to determine time of day.

%I: On output, produces the two-digit hour of the day on a 12-hour clock (12-11). On input, matches two digits, and may be used to determine time of day.

%j: On output, produces the three-digit number of the day of the year. On input, matches three digits, and may be used to determine the day of the year.

%J: On output, produces the number of the Julian Day Number beginning at noon of the given date. The Julian Day Number is a representation popular with astronomers; it is a count of days in which Day 0 is 1 January, 4713 B.C.E., on the proleptic Julian calendar; in this system, 1 January 2000 is Julian Day 2451545. On input, matches any string of digits and interprets it as a Julian Day; may be used to determine calendar date.

%k: On output, produces the number of the hour on a 24-hour clock (0-23) without a leading zero. On input, matches one or two digits and may be used to determine time of day.

%l: On output, produces the number of the hour on a 12-hour clock (12-11) without a leading zero. On input, matches one or two digits and may be used to determine time of day.

%m: On output, produces the number of the month (01-12), with exactly two digits (using a leading zero if necessary). On input, matches exactly two digits and may be used to determine calendar date.

%M: On output, produces the number of the minute of the hour (00-59) with exactly two digits (using a leading zero if necessary). On input, matches exactly two digits and may be used to determine time of day.

%N: On output, produces the number of the month, with no leading zero. On input, matches one or two digits, and may be used to determine calendar date.

%Od, %Oe, %OH, %OI, %Ok, %Ol, %Om, %OM, %OS, %Ou, %Ow, %Oy: All of these format groups are synonymous with their counterparts without the 'O', except that the string is produced and parsed in the locale-dependent alternative numerals.

%p: On output, produces the indicator for 'a.m.', or 'p.m.' appropriate for the given locale, converted to upper case. On input, accepts whatever %p produces (in upper or lower case) and may be used to determine time of day.

%P: On output, produces the indicator for 'a.m.', or 'p.m.' appropriate for the given locale. On input, accepts whatever %p produces (in upper or lower case) and may be used to determine time of day.

%Q: On output, produces a StarDate. On input, accepts a StarDate and may be used to determine calendar date and time of day.

%r: On output, produces a locale-dependent time of day representation on a 12-hour clock. On input, accepts whatever %r produces and may be used to determine time of day.

%R: On output, produces a locale-dependent time of day representation on a 24-hour clock. On input, accepts whatever %R produces and may be used to determine time of day.

%s: On output, produces a string of digits representing the count of seconds since 1 January 1970, 00:00 UTC. On input, accepts a string of digits and interprets it as such a count; may be used to determine date and time of day.

%S: On output, produces a two-digit number of the second of the minute (00-59). On input, accepts two digits. May be used to determine time of day.

%t: On output, produces a TAB character. On input, matches a TAB character.

%T: Synonymous with %H:%M:%S.

%u: On output, produces the number of the day of the week (1 = Monday, 7 = Sunday). On input, accepts a single digit. May be used to determine calendar day.

%U: On output, produces the ordinal number of the week of the year (00-53). The first Sunday of the year is the first day of week 01. On input accepts two digits which are otherwise ignored. This format group is never used in determining an input date.

%V: On output, produces the number of the ISO8601 week as a two digit number (01-53). Week 01 is the week containing January 4; or the first week of the year containing at least 4 days; or the week containing the first Thursday of the year (the three statements are equivalent). Each week begins on a Monday. On input, accepts the ISO8601 week number, and may be used to determine the calendar day.

%W: On output, produces a week number (00-53) within the year; week 01 begins on the first Monday of the year. On input, accepts two digits, which are otherwise ignored. This format group is never used in determining an input date.

%x: On output, produces the date in a locale-dependent representation. On input, accepts whatever %x produces and may be used to determine calendar date.

%X: On output, produces the time of day in a locale-dependent representation. On input, accepts whatever %X produces and may be used to determine time of day.

%y: On output, produces the two-digit year of the century. On input, accepts two digits, and may be used to determine calendar date. Note that %y does not yield a year appropriate for use with the ISO8601 week number %V; programs should use %g for that purpose.

%Y: On output, produces the four-digit calendar year. On input, accepts four digits and may be used to determine calendar date. Note that %Y does not yield a year appropriate for use with the ISO8601 week number %V; programs should use %G for that purpose.

%z: On output, produces the current time zone, expressed in hours and minutes east (+hhmm) or west (-hhmm) of Greenwich. On input, accepts a time zone specifier (see TIME ZONES below) that will be used to determine the time zone.

%Z: On output, produces the current time zone's name, possibly translated to the given locale. On input, accepts a time zone specifier (see TIME ZONES below) that will be used to determine the time zone. This option should, in general, be used on input only when parsing RFC822 dates. Other uses are fraught with ambiguity; for instance, the string BST may represent British Summer Time or Brazilian Standard Time. It is recommended that date/time strings for use by computers use numeric time zones instead.

%%: On output, produces a literal '%' character. On input, matches a literal '%' character.

%+: Synonymous with "%a %b %e %H:%M:%S %Z %Y".
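A short sketch of two of the less familiar format groups, %J and the ISO8601 week groups %G, %V and %u, assuming a [clock] that implements the definitions above. The dates are chosen because 1 January 2000 is the Julian Day quoted under %J, and 1 January 2005 falls in the last ISO8601 week of 2004.

```tcl
# Julian Day Number: format and scan are inverses.
set y2k [clock scan "2000-01-01" -format "%Y-%m-%d" -gmt true]
set jdn [clock format $y2k -format "%J" -gmt true]
puts $jdn                                            ;# 2451545
set fromJdn [clock scan $jdn -format "%J" -gmt true] ;# equals $y2k

# ISO8601 week date: %G can differ from the calendar year %Y.
set newYear [clock scan "2005-01-01" -format "%Y-%m-%d" -gmt true]
set calYear [clock format $newYear -format "%Y" -gmt true]
set isoWeek [clock format $newYear -format "%G-W%V-%u" -gmt true]
puts "$calYear $isoWeek"                             ;# 2005 2004-W53-6

# Scanning the week date recovers the same day.
set fromWeek [clock scan $isoWeek -format "%G-W%V-%u" -gmt true]
```

Pairing %Y with %V here would name week 53 of 2005, which is why the descriptions of %y and %Y direct programs to %g and %G.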

Time Zones

There are several ways that a time zone may be specified for use with [clock scan], [clock format] and [clock add]. In order of preference:

Once the time zone is obtained by one of these means, it is interpreted as follows:

":localtime": This specifier requests that the C library functions localtime() and mktime() be used whenever converting times between local and Greenwich. It is generally used as a last resort if the time zone can be determined in no other way.

"+hhmm", "+hhmmss", "-hhmm", "-hhmmss": These specifiers give the time zone explicitly in terms of hours, minutes and seconds east (+) or west (-) of Greenwich.

":filename": The given file name is interpreted as a path name relative to [info library]/tzdata, and the specified file is loaded as a Tcl script. The script is expected to set the :filename element in the tzdata array to a list of transitions. Each transition is a four-element list comprising:

* the time at which the transition takes place, expressed in seconds from the Posix Epoch (1 January 1970, 00:00 UTC)

* the offset (in seconds east of Greenwich) to apply.

* an indicator (0=Standard Time, 1=Daylight Saving Time)

* the name to use when displaying the given time zone in the root locale.

The first transition is expected to take place at time -9223372036854775808, the smallest value of a wide integer.

Any string recognizable as a Posix time zone specifier: A time zone may be specified in Posix syntax (see http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ), for example EST5EDT or EST+05:00EDT+04:00,M4.1.0/01:00,M10.5.0/02:00.

Any other string is processed by prefixing a colon and attempting to load the given file, as shown above.
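Two of the specifier forms above can be sketched as follows. The first part uses explicit offsets with a [clock] that implements this specification. The second part is only an illustration of how a transition list in the four-element layout could be searched; the procedure zoneAt and the sample transition values are hypothetical and do not describe the actual implementation or the contents of any tzdata file.

```tcl
# Explicit offsets east (+) or west (-) of Greenwich.
set east [clock format 0 -format "%Y-%m-%d %H:%M:%S %z" -timezone +0530]
set west [clock format 0 -format "%Y-%m-%d %H:%M:%S %z" -timezone -0800]
puts $east   ;# 1970-01-01 05:30:00 +0530
puts $west   ;# 1969-12-31 16:00:00 -0800

# Hypothetical transition list: {time offsetSeconds isDST name}.
# The first entry starts at the smallest wide integer, as required.
set transitions {
    {-9223372036854775808 -18000 0 EST}
    {1081062000 -14400 1 EDT}
    {1099202400 -18000 0 EST}
}

# Hypothetical helper: return the last transition at or before time t.
proc zoneAt {transitions t} {
    set current [lindex $transitions 0]
    foreach tr $transitions {
        if {[lindex $tr 0] > $t} {
            break
        }
        set current $tr
    }
    return $current
}

puts [lindex [zoneAt $transitions 0] 3]            ;# EST
puts [lindex [zoneAt $transitions 1090000000] 3]   ;# EDT
```

A linear scan keeps the sketch short; a real table with many transitions would more plausibly be searched by bisection.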

Localisation

The [clock] command is localised by a set of message catalogs located in [file join [info library] clock msgs] and loaded into the namespace, ::tcl::clock. The possible strings to be translated include:

AM: The string that identifies ante meridiem times when expressing a time of day in the given locale. This string has the value, {am} in the root locale.

BCE: The string that identifies dates before the Common Era in the given locale. This string has the value, {B.C.E.} in the root locale. Those localising this string should be aware that, depending on local culture, a name such as "B.C." (before Christ) may be offensive.

CE: The string that identifies dates of the Common Era in the given locale. This string has the value, {C.E.} in the root locale. Those localising this string should be aware that, depending on local culture, a name such as "A.D." (Latin, anno Domini, "in the year of Our Lord") may be offensive.

DATE_FORMAT: The format specifier for calendar dates in the given locale. In the root locale, %m/%d/%Y is used for compatibility with earlier versions of the [clock] command, even though %Y-%m-%d would probably be preferable.

DATE_TIME_FORMAT: The format specifier for combined date and time in the given locale. In the root locale, {%a %b %e %H:%M:%S %Y} is used for compatibility with earlier versions of the [clock] command, even though %Y-%m-%dT%H:%M:%S would be preferable.

DAYS_OF_WEEK_ABBREV: Abbreviations of the days of the week in the given locale. In the root locale, this string has the value, {Sun Mon Tue Wed Thu Fri Sat}. In any locale, this string is expected to represent a valid Tcl list.

DAYS_OF_WEEK_FULL: Full names of the days of the week in the given locale. In the root locale, this string has the value, {Sunday Monday Tuesday Wednesday Thursday Friday Saturday}. In any locale, this string is expected to represent a valid Tcl list.

GREGORIAN_CHANGE_DATE: The date on which the change from the Julian to the Gregorian calendar takes place, expressed as a Julian Day Number. In the root locale, this string has the value, {2299161}, corresponding to 15 October 1582 New Style. In the 'en' locale, this value is {2361222}, 14 September 1752 New Style.

LOCALE_DATE_FORMAT: The format to use when formatting dates in the locale's alternative calendar. In the root locale, LOCALE_DATE_FORMAT is %x, which causes formatting without alternative numerals.

LOCALE_DATE_TIME_FORMAT: The format to use when formatting date/time strings in the locale's alternative calendar. In the root locale, LOCALE_DATE_TIME_FORMAT is %Ex %EX, which causes concatenation of the locale's format for date, a space character, and the locale's format for time.

LOCALE_ERAS: In a locale where a calendar with multiple eras is in use, gives a list of triples. The first element of each triple is the time (in seconds from the Posix epoch of 1 January 1970, 00:00 UTC) at which the era begins; the second is the name of the era, and the third is a constant offset to be subtracted from the Gregorian year to give the year of the era. In any locale, this string is expected to represent a valid Tcl list.

LOCALE_NUMERALS: In a locale where alternative numerals may be used, gives a list containing the numerals that represent the numbers from zero to ninety-nine. Note that these numerals are the ones typically used on calendars, not the ones that represent currencies or quantities. For instance, in a Han locale, the number twenty-one is represented by \u5eff\u4e00, not by \u4e8c\u5341\u4e00. In any locale, this string is expected to represent a valid Tcl list.

LOCALE_TIME_FORMAT: The time format to use when formatting a time of day using a locale's alternative numerals. In the root locale, this string is %X, which causes formatting without alternative numerals.

LOCALE_YEAR_FORMAT: The time format to use when formatting a year in the locale's alternative calendar. In the root locale, this string is %Y.

MONTHS_ABBREV: Abbreviated names of the months in the given locale. In the root locale, consists of three-letter abbreviations for the English months: Jan-Dec. In any locale, this string is expected to represent a valid Tcl list.

MONTHS_FULL: Full names of the months in the given locale. In the root locale, consists of the names of the English months in order from 'January' to 'December'. In any locale, this string is expected to represent a valid Tcl list.

PM: The string that identifies post meridiem times when expressing a time of day in the given locale. This string has the value, {pm} in the root locale.

TIME_FORMAT: String that specifies the default time format in the given locale. In the root locale, this string is {%H:%M:%S}.

TIME_FORMAT_12: String that formats time on a 12-hour clock in the given locale. In the root locale, this string is {%I:%M:%S %p}.

TIME_FORMAT_24: String that formats time on a 24-hour clock in the given locale. In the root locale, this string is {%H:%M}.
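The list-valued strings above lend themselves to simple lookups. The sketch below shows how LOCALE_ERAS triples and LOCALE_NUMERALS could be consulted; the procedure eraYear, the romanised era names, and the shortened numeral list are hypothetical stand-ins for illustration and are not the implementation proposed here.

```tcl
# Era triples: {startSeconds name offset}. Names are romanised
# stand-ins; start times and offsets follow the LOCALE_ERAS layout.
set eras {
    {-9223372036854775808 CE 0}
    {-3060979200 Meiji 1867}
    {-1812153600 Taisho 1911}
    {-1357603200 Showa 1925}
}

# Hypothetical helper: pick the era in force at time t and subtract
# its offset from the Gregorian year.
proc eraYear {eras t gregorianYear} {
    set era [lindex $eras 0]
    foreach e $eras {
        if {[lindex $e 0] > $t} {
            break
        }
        set era $e
    }
    lassign $era start name offset
    return [list $name [expr {$gregorianYear - $offset}]]
}

puts [eraYear $eras 0 1970]   ;# Showa 45

# Alternative numerals are indexed by the number they represent.
set numerals [list \u3007 \u4e00 \u4e8c \u4e09]
set two [lindex $numerals 2]
```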

There is a defined order for substitution of locale strings, which constrains the format groups that can appear in the _FORMAT strings. Specifically:

Example. The following file is "ja.msg", which localises the [clock] command to a Japanese locale.

namespace eval ::tcl::clock {
    ::msgcat::mcset ja DAYS_OF_WEEK_ABBREV [list \
        "\u65e5"\
        "\u6708"\
        "\u706b"\
        "\u6c34"\
        "\u6728"\
        "\u91d1"\
        "\u571f"]
    ::msgcat::mcset ja DAYS_OF_WEEK_FULL [list \
        "\u65e5\u66dc\u65e5"\
        "\u6708\u66dc\u65e5"\
        "\u706b\u66dc\u65e5"\
        "\u6c34\u66dc\u65e5"\
        "\u6728\u66dc\u65e5"\
        "\u91d1\u66dc\u65e5"\
        "\u571f\u66dc\u65e5"]
    ::msgcat::mcset ja MONTHS_ABBREV [list \
        "1"\
        "2"\
        "3"\
        "4"\
        "5"\
        "6"\
        "7"\
        "8"\
        "9"\
        "10"\
        "11"\
        "12"\
        ""]
    ::msgcat::mcset ja MONTHS_FULL [list \
        "1\u6708"\
        "2\u6708"\
        "3\u6708"\
        "4\u6708"\
        "5\u6708"\
        "6\u6708"\
        "7\u6708"\
        "8\u6708"\
        "9\u6708"\
        "10\u6708"\
        "11\u6708"\
        "12\u6708"\
        ""]
    ::msgcat::mcset ja BCE "\u7d00\u5143\u524d"
    ::msgcat::mcset ja CE "\u897f\u66a6"
    ::msgcat::mcset ja AM "\u5348\u524d"
    ::msgcat::mcset ja PM "\u5348\u5f8c"
    ::msgcat::mcset ja DATE_FORMAT "%Y/%m/%d"
    ::msgcat::mcset ja TIME_FORMAT "%k:%M:%S"
    ::msgcat::mcset ja DATE_TIME_FORMAT "%Y/%m/%d %k:%M:%S %z"
    ::msgcat::mcset ja LOCALE_NUMERALS "\u3007 \u4e00 \u4e8c \u4e09 \u56db
       \u4e94 \u516d \u4e03 \u516b \u4e5d \u5341 \u5341\u4e00 \u5341\u4e8c
       \u5341\u4e09 \u5341\u56db \u5341\u4e94 \u5341\u516d \u5341\u4e03 
       \u5341\u516b \u5341\u4e5d \u4e8c\u5341 \u5eff\u4e00 \u5eff\u4e8c 
       \u5eff\u4e09 \u5eff\u56db \u5eff\u4e94 \u5eff\u516d \u5eff\u4e03 
       \u5eff\u516b \u5eff\u4e5d \u4e09\u5341 \u5345\u4e00 \u5345\u4e8c 
       \u5345\u4e09 \u5345\u56db \u5345\u4e94 \u5345\u516d \u5345\u4e03 
       \u5345\u516b \u5345\u4e5d \u56db\u5341 \u56db\u5341\u4e00 
       \u56db\u5341\u4e8c \u56db\u5341\u4e09 \u56db\u5341\u56db 
       \u56db\u5341\u4e94 \u56db\u5341\u516d \u56db\u5341\u4e03 
       \u56db\u5341\u516b \u56db\u5341\u4e5d \u4e94\u5341 
       \u4e94\u5341\u4e00 
       \u4e94\u5341\u4e8c \u4e94\u5341\u4e09 \u4e94\u5341\u56db 
       \u4e94\u5341\u4e94 \u4e94\u5341\u516d \u4e94\u5341\u4e03 
       \u4e94\u5341\u516b \u4e94\u5341\u4e5d \u516d\u5341 
       \u516d\u5341\u4e00 \u516d\u5341\u4e8c \u516d\u5341\u4e09 
       \u516d\u5341\u56db \u516d\u5341\u4e94 \u516d\u5341\u516d 
       \u516d\u5341\u4e03 \u516d\u5341\u516b \u516d\u5341\u4e5d
       \u4e03\u5341 
       \u4e03\u5341\u4e00 \u4e03\u5341\u4e8c \u4e03\u5341\u4e09 
       \u4e03\u5341\u56db \u4e03\u5341\u4e94 \u4e03\u5341\u516d 
       \u4e03\u5341\u4e03 \u4e03\u5341\u516b \u4e03\u5341\u4e5d
       \u516b\u5341 
       \u516b\u5341\u4e00 \u516b\u5341\u4e8c \u516b\u5341\u4e09 
       \u516b\u5341\u56db \u516b\u5341\u4e94 \u516b\u5341\u516d 
       \u516b\u5341\u4e03 \u516b\u5341\u516b \u516b\u5341\u4e5d 
       \u4e5d\u5341 
       \u4e5d\u5341\u4e00 \u4e5d\u5341\u4e8c \u4e5d\u5341\u4e09 
       \u4e5d\u5341\u56db \u4e5d\u5341\u4e94 \u4e5d\u5341\u516d 
       \u4e5d\u5341\u4e03 \u4e5d\u5341\u516b \u4e5d\u5341\u4e5d"
    ::msgcat::mcset ja LOCALE_DATE_FORMAT "%EY\u5e74%B%Od\u65e5"
    ::msgcat::mcset ja LOCALE_TIME_FORMAT "%OH\u6642%OM\u5206%OS\u79d2"
    ::msgcat::mcset ja LOCALE_DATE_TIME_FORMAT \
        "%A %EY\u5e74%B%Od\u65e5%OH\u6642%OM\u5206%OS\u79d2 %z"
    ::msgcat::mcset ja LOCALE_ERAS "
        {-9223372036854775808 \u897f\u66a6 0} 
        {-3060979200 \u660e\u6cbb 1867} 
        {-1812153600 \u5927\u6b63 1911} 
        {-1357603200 \u662d\u548c 1925} 
        {600220800 \u5e73\u6210 1988}"
}
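The LOCALE_ERAS entry above can be read as a list of triples: the start of the era in seconds from the epoch, the era name, and an offset to subtract from the Gregorian year. The following is only a minimal sketch of how such a table might be consulted; the procedure name eraYear and its arguments are hypothetical and are not part of the reference implementation, and the triples are assumed to be sorted by ascending start time.

```tcl
# Hypothetical helper (not part of the TIP): find the era containing a
# given time and the year within that era.
#   eras     - list of {startSeconds eraName yearOffset} triples,
#              assumed to be sorted by ascending start time
#   seconds  - time to classify, in seconds from the epoch
#   year     - Gregorian year that contains $seconds
proc eraYear {eras seconds year} {
    set found {}
    foreach era $eras {
        lassign $era start name offset
        # Eras are sorted, so stop at the first one that starts too late.
        if {$seconds < $start} break
        set found [list $name [expr {$year - $offset}]]
    }
    return $found
}

# Two rows copied from the catalog above; [list] and double quotes are
# used so that the \u escapes are substituted.
set eras [list \
    [list -1812153600 "\u5927\u6b63" 1911] \
    [list -1357603200 "\u662d\u548c" 1925]]

# 1 January 1970 (seconds = 0) falls in the second era, and the year
# within that era is 1970 - 1925 = 45.
puts [eraYear $eras 0 1970]
```

A time earlier than the first start value in the table yields an empty result, which a caller would have to treat as "no era known".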

In addition to the standard locales, two special locales may appear on the -locale parameter: current, which designates the result of evaluating [mclocale], and system, which designates the current "system" locale, as determined by (in order of preference):

Build System

Several tools are provided for the use of maintainers:

loadICU.tcl: Given a distribution of IBM's icu4c http://oss.software.ibm.com/icu/index.html , this program analyzes the source code of the message catalogs and extracts appropriate Tcl-based messages for the date and time formats in the supported locales.

loadtzif.tcl: Given a time zone information file used by the Olson version of 'tzset' (for a description, see the latest 'tzcode' file in [ftp://elsie.nci.nih.gov/pub/]), creates the corresponding Tcl 'tzdata' file.

makeTestCases.tcl: Makes several thousand auto-generated test cases to exercise the time conversion algorithms.

tclZIC.tcl: Given the source code for the Olson time zone descriptions (obtainable as the latest 'tzdata' file in [ftp://elsie.nci.nih.gov/pub/]), creates the full set of Tcl 'tzdata' files.

Since these tools depend on third party source, they will not be included in the usual build steps; instead, maintainers will be expected to run them whenever changing files on which they depend. It will be a good practice to update the ICU and Olson files just before cutting a release.

Reference Implementation

The implementation of a refactored [clock] command is a work in progress, and interested developers are urged to contact the TIP author if they want to help with implementation, documentation, or testing. The code is available in the same SourceForge repository as the Tcl core, and Tcl maintainers can obtain it with

  cvs -d:ext:USER@cvs.sf.net:/cvsroot/tcl co newclock

Notes on the cost of implementation

Since it is well known that Tcl code is typically 30-50 times slower than the equivalent C, it is to be expected that [clock scan], [clock format], and [clock add] will slow down by a comparable factor relative to the existing C implementation. [clock seconds] and [clock clicks] will still be C code and are not expected to suffer a measurable change in performance. (If they do, the implementors plan to address the issue.)
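Anyone wishing to check the cost on a particular installation can do so with the built-in [time] command. The following is a rough sketch only; the iteration count and the format string are arbitrary choices, and the figure obtained will vary with the platform and the Tcl version.

```tcl
# Measure the average cost of one [clock format] call.
# The iteration count (1000) and the format string are arbitrary.
set t [clock seconds]
set timing [time {
    clock format $t -format {%Y-%m-%d %H:%M:%S} -gmt true
} 1000]

# [time] reports a string of the form "N microseconds per iteration".
puts $timing
```

Running the same script against an older interpreter and against one with the refactored [clock] gives a direct comparison for the format in question.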

The cost of the time zone data files and the message catalogs is not trivial: they occupy about 1.6 megabytes exclusive of file system fragmentation, and may occupy several megabytes on disk depending on the file system's minimum allocation size. The implementors assume (and are working to ensure) that some sort of compressed virtual file system will be available as core functionality in the 8.5 final release. With zlib compression, the message catalogs and time zone data total less than half a megabyte.

It is worth noting that a distribution that must run in the absolute minimum space may omit both message catalogs and time zone data. If the time zone data are omitted, named time zones (e.g., :America/New_York) will not be available on systems such as Windows that lack 'zoneinfo', and will suffer from Y2038 bugs on systems such as Solaris and Linux that have 'zoneinfo'. Without the message catalogs, the only supported locale will be the root locale (and on Windows, the 'system' locale). This combination provides functionality comparable to the [clock] command prior to this TIP.

The Tcl code that implements [clock] is less than eighty kilobytes with comments and blank lines removed; this amount of overhead is thought to be negligible.

Bugs

The reference implementation does not attempt to support any calendar that is not based on the hybrid Julian/Gregorian calendar. It is adequate for the Western countries and for the Japanese civil calendar, but does not address the Hijri, Hebraic, Thai, Chinese or Korean calendars. (No Tcl user has requested these, to the best of the knowledge of the author of this TIP.)

The Gregorian change date is not supplied in most locales.

Localisation in most locales was done by an American who is probably excessively ignorant in such matters.

This TIP makes no effort to be compliant with RFC 2550 http://www.faqs.org/rfcs/rfc2550.html .

Copyright

Copyright 2004, by Kevin B. Kenny. Redistribution permitted under the terms of the Open Publication License http://www.opencontent.org/openpub/ .

Acknowledgments

The author of this TIP wishes to thank all the Tcl'ers who have taken the time to read and comment on it, most notably Joe English, Donal K. Fellows, Jeff Hobbs, Arjen Markus, Reinhard Max, Christopher Nelson, Donald G. Porter, Pascal Scheffers, and Peter da Silva.

History