Tcl Source Code

Ticket UUID: 1511357
Title: non-ASCII function names rejected
Type: RFE Version: None
Submitter: dgp Created on: 2006-06-23 14:43:49
Subsystem: 45. Parsing and Eval Assigned To: dgp
Priority: 5 Medium Severity:
Status: Open Last Modified: 2006-07-31 12:34:32
Resolution: None Closed By:
    Closed on:
Description:
Since TIP 232 went Final,
[expr] functions are a 1-1
map with Tcl commands in
the tcl::mathfunc namespace.

The attached script demonstrates
that the [expr] parser rejects
function names with non-ASCII
characters in them, even though
Tcl commands are perfectly happy
to include non-ASCII characters.

Seems inconsistent, and seems
mildly attractive to be able
to define math functions that
actually have the traditional
names instead of an ASCII
transliteration.
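
The attached demo.tcl is not reproduced in this ticket, so the following is only a hypothetical sketch of the kind of script that shows the inconsistency. The function name (the Greek letter sigma, written \u03c3) and its squaring body are assumptions made for illustration, and the [expr] outcome is printed rather than assumed, since it may depend on the Tcl version.

```tcl
# Hypothetical sketch; this is NOT the attached demo.tcl.
# Define a math function whose name is the Greek letter sigma (U+03C3).
# The proc name is an unbraced word, so \u03c3 is substituted by the parser.
proc ::tcl::mathfunc::\u03c3 {x} {
    return [expr {$x * $x}]
}

# Invoked as an ordinary command, the non-ASCII name is accepted.
puts "as command: [::tcl::mathfunc::\u03c3 3]"

# Invoked through [expr], the expression parser may reject the same name.
# Double quotes make Tcl substitute \u03c3 before [expr] sees the string.
if {[catch {expr "\u03c3(3)"} result]} {
    puts "expr rejected it: $result"
} else {
    puts "expr accepted it: $result"
}
```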
User Comments: dgp added on 2006-07-31 12:34:32:

data_type - 360894

kennykb added on 2006-07-31 00:56:33:

Hmmm.  I like the idea in general, but I think that we
really ought to write a TIP to address these issues across
the board.  There are subtle incompatibilities here, many
of which come up in error reporting; how do you post an
error dialog or write a message to stderr reporting an
unknown mathfunc whose name isn't in the native character
set?  (I know that we can address these issues, but we
need to think about them.)

In addition to math function names, the issues of
international characters in Tcl source code of which I'm
aware are:

- Variable names and $-substitution - Bug 408568. There
  appears to be no good reason that we shouldn't allow a 
  variable named by a Greek letter, for instance.

- Numeric digits - should we recognize equivalents
  for the Indo-Arabic digits (Half-width and full-width
  digits, Arabic/Thai/Devanagari presentation forms of
  the digits, ...)?  [I'm not suggesting dealing with
  non-positional notations like Chinese or Japanese,
  although we could argue whether [string is digit]
  ought to recognize these, too.]  This would affect not
  only [expr] and its friends plus [scan]/[format] etc,
  but also bits like field lengths in [binary scan].
  I'm a little nervous about extending things this far;
  do we really want to inflict the maintainability issues
  that allowing numbers in non-European scripts would
  impose?

- Whitespace. There are a number of places where we look
  at 'isspace' and fail to count the various Unicode
  whitespace characters (breaking and nonbreaking spaces
  of various widths).  (I've been bitten by this one
  already, when a Tcl source file was edited in a Unicode
  editor, and somehow two words got separated by a string
  of half-width spaces.)

- Don't even get me started on Unicode collation,
  normalization, case mapping, comparison, ....  We will
  likely want to begin addressing these someday, but
  that ought to be a separate project from allowing i18n
  of our own sources.
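
The first three items in the list above can be probed from a script. This is a minimal sketch, not a statement of what any particular Tcl release does: the variable name (U+03B1), the digit (U+0663) and the space (U+2003) are arbitrary choices for illustration, and results are printed rather than assumed because they may differ between versions.

```tcl
# Variable named by a Greek letter (alpha, U+03B1).
# [set] and the ${...} form accept any name; the bare $name form is
# the case that Bug 408568 is about.
set \u03b1 1
set braced [subst "\${\u03b1}"]
set bare   [subst "\$\u03b1"]
puts "set:    [set \u03b1]"
puts "braces: $braced"
puts "bare:   $bare"

# A non-ASCII decimal digit (ARABIC-INDIC DIGIT THREE, U+0663).
set d \u0663
puts "string is digit: [string is digit $d]"
if {[catch {expr {$d + 1}} r]} {
    puts "expr rejected the digit: $r"
} else {
    puts "expr result: $r"
}

# A Unicode space (EM SPACE, U+2003) between two words.
set words "a\u2003b"
puts "string is space: [string is space \u2003]"
puts "list length:     [llength $words]"
```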

In addition, we will need to pursue making international
variable and function names typable in character sets that
lack the characters.  Right now, backslash substitution
happens at the wrong times or not at all:

% expr {s\u0069n(3.14159)}
invalid bareword "s" in expression
    (prepend $ for variable; append argument list for
function call)
    (parsing expression "s\u0069n(3.14159)")
% expr "s\u0069n(3.14159)"
2.653589793352726e-6

% set i 1
1
% puts $\u0069
$i
% puts [set \u0069]
1
% puts ${\u0069}
can't read "\u0069": no such variable
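
For contrast, a small sketch of a spelling where the substitution does take effect. It relies on the TIP 232 mapping described in this ticket (math functions are commands in the tcl::mathfunc namespace) and assumes Tcl 8.5 or later; the variable names viaCommand and viaExpr are made up for the example.

```tcl
# The command name is an ordinary word, so \u0069 becomes "i" before
# command lookup, and this invokes ::tcl::mathfunc::sin.
set viaCommand [::tcl::mathfunc::s\u0069n 3.14159]

# The same function through [expr], with the name spelled out.
set viaExpr [expr {sin(3.14159)}]

puts "command: $viaCommand"
puts "expr:    $viaExpr"
puts "equal:   [expr {$viaCommand == $viaExpr}]"
```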

In summary: This may well be a great idea whose time has
come, but I'm profoundly uncomfortable about doing it
without a TIP.

dgp added on 2006-06-23 21:53:43:

File Added - 182736: 1511357.patch

Also contributing to the
desirability of this change
is the new [source -encoding]
option in Tcl 8.5 that for
the first time really permits
scripts to have non-ASCII
characters as originally read in.

As Unicode-aware editors become
more widespread, this change
grows more desirable too.

Attached patch makes the demo
script run successfully.  Please review.

dgp added on 2006-06-23 21:43:49:

File Added - 182735: demo.tcl

Attachments: