Tk Source Code

2018-07-08
11:56 Ticket [62f1343a] Tk textbox not working with "Bengali" set as keyboard input language status still Open with 3 other changes artifact: 68679b3d user: budden
2017-05-31
18:24 Ticket [62f1343a]: 3 changes artifact: 77b5be12 user: budden
15:14 Ticket [62f1343a]: 3 changes artifact: a8b53389 user: jan.nijtmans
15:13 Ticket [62f1343a]: 3 changes artifact: bb8acc18 user: jan.nijtmans
15:10
Attempt to fix [62f1343ad2]: Tk textbox not working with "Bengali" set as keyboard input language. Patch concept delivered by "budden", simplified a little bit, should have the same effect. (I would prefer to change the only remaining PeekMessageA() to PeekMessage(). But if the msg then doesn't contain sufficient information to decide upon, maybe we have to live with this ...) Leaf check-in: 807d983e user: jan.nijtmans tags: bug-62f1343ad2
14:05 Ticket [62f1343a] Tk textbox not working with "Bengali" set as keyboard input language status still Open with 4 other changes artifact: c57897f7 user: jan.nijtmans
2017-05-30
20:47 Ticket [62f1343a]: 3 changes artifact: 8d3445ad user: budden
20:37 Ticket [62f1343a]: 3 changes artifact: ace7952e user: budden
20:24 Ticket [62f1343a]: 3 changes artifact: 7d8bae2b user: budden
17:50 Ticket [6c0d7aec] unicode text input status still Open with 3 other changes artifact: 35ce2acb user: fvogel
2016-08-20
16:33 Ticket [62f1343a] Tk textbox not working with "Bengali" set as keyboard input language status still Open with 3 other changes artifact: eac107dd user: dkf
2015-12-22
12:17 Ticket [62f1343a]: 6 changes artifact: d578df43 user: dkf
2015-12-07
21:57 Ticket [62f1343a]: 5 changes artifact: 3b46f701 user: DExtR
2014-09-18
02:34 Ticket [62f1343a]: 3 changes artifact: ccd9ba15 user: apnadkarni
02:34 Ticket [62f1343a]: 3 changes artifact: 1d4c9cf2 user: apnadkarni
2014-09-11
14:14 New ticket [62f1343a]. artifact: de999191 user: anonymous

Ticket UUID: 62f1343ad2fd9c77d1f3bf4c9e244e6f33787172
Title: Tk textbox not working with "Bengali" set as keyboard input language
Type: Bug Version: 8.4, 8.6
Submitter: anonymous Created on: 2014-09-11 14:14:16
Subsystem: 69. Events Assigned To: jan.nijtmans
Priority: 8 Severity: Severe
Status: Open Last Modified: 2018-07-08 11:56:59
Resolution: None Closed By: nobody
    Closed on:
Description:
OS: Windows, TK: 8.4 and 8.6
If keyboard input is configured for the "Bengali" language, the typed characters appear as ????. If I type Bengali text, e.g. in Notepad++, and copy and paste it into the Tk textbox, the characters are displayed correctly.

For other keyboard languages like "Arabic" it works correctly.

I looked into the sources, and it seems the problem already occurs where keyboard input is processed (that's why copy and paste works).

When keyboard input is processed, the following script is executed:
"\n\ttk::ConsoleInsert %W %A\n    "

%A is replaced with the pressed key, and in the case of "Bengali" it always inserts "?".
User Comments:
budden added on 2018-07-08 11:56:59:
I'm sorry, is anyone going to fix this? It severely damages the usability of Tk with non-English keyboard layouts. It affects not only Bengali but Russian as well and, I believe, most keyboard layouts on this planet :)

Let me modestly remind you of this:

https://stackoverflow.com/questions/34116195/why-some-characters-can-not-be-typed-in-pythons-idle

budden added on 2017-05-31 18:24:25:
No, it doesn't, and it even breaks entering digits (1, 2, 3 and so on), even with the English keyboard layout.

jan.nijtmans added on 2017-05-31 15:14:09:

@budden: Can you try [807d983e5b1d3ef8|this commit], and see if it fixes your problem? Thanks! (now in wiki format ...)


jan.nijtmans added on 2017-05-31 15:13:35:
@budden: Can you try [807d983e5b1d3ef8|this commit], and see if it fixes your problem? Thanks!

jan.nijtmans added on 2017-05-31 14:05:16:
Looking at this ....

budden added on 2017-05-30 20:47:00:
I installed the Bengali keyboard, and it seems to print things correctly.

budden added on 2017-05-30 20:37:47:
--- f:/downloads/tcl867rc/87/win/tkWinKey.c	Mon May 29 16:50:02 2017
+++ c:/yar/tcl-8.6.6/build/tk8.6.6/win/tkWinKey.c	Tue May 30 23:30:29 2017
@@ -103,8 +103,26 @@
 	if (keyEv->nbytes > 0) {
 	    Tcl_ExternalToUtfDString(TkWinGetKeyInputEncoding(),
 		    keyEv->trans_chars, keyEv->nbytes, dsPtr);
-	}
-    } else if (keyEv->send_event == -3) {
+    } 
+	} else if (keyEv->send_event == -2) {
+	/*
+	 * For UNICODE chars which are not present in current codepage.
+     * This would likely fail for chars with code > 0xffff
+	 */
+
+	int unichar;
+	char buf[TCL_UTF_MAX];
+	int len;
+
+	unichar = keyEv->trans_chars[1] & 0xff;
+	unichar <<= 8;
+	unichar |= keyEv->trans_chars[0] & 0xff;
+
+	len = Tcl_UniCharToUtf((Tcl_UniChar) unichar, buf);
+
+	Tcl_DStringAppend(dsPtr, buf, len);
+
+	} else if (keyEv->send_event == -3) {
 
 	/*
 	 * Special case for WM_UNICHAR and win2000 multi-lingal IME input

--- f:/downloads/tcl867rc/87/win/tkWinX.c	Mon May 29 16:50:02 2017
+++ c:/yar/tcl-8.6.6/build/tk8.6.6/win/tkWinX.c	Tue May 30 23:09:09 2017
@@ -1367,6 +1367,8 @@
     UINT type)
 {
     MSG msg;
+	WPARAM AwParam;
+	int IsItLikeOddPenPower = 0;
 
     xkey->nbytes = 0;
 
@@ -1375,8 +1377,12 @@
 	if (msg.message != type) {
 	    break;
 	}
+	AwParam = msg.wParam;
+	if (((unsigned short) AwParam) > ((unsigned short) 0xff)) {
+		IsItLikeOddPenPower = 1;
+	}
 
-	GetMessageA(&msg, NULL, type, type);
+	GetMessage(&msg, NULL, type, type);
 
 	/*
 	 * If this is a normal character message, we may need to strip off the
@@ -1388,10 +1394,10 @@
 	if ((msg.message == WM_CHAR) && (msg.lParam & 0x20000000)) {
 	    xkey->state = 0;
 	}
-	xkey->trans_chars[xkey->nbytes] = (char) msg.wParam;
+	xkey->trans_chars[xkey->nbytes] = (char) AwParam;
 	xkey->nbytes++;
 
-	if (((unsigned short) msg.wParam) > ((unsigned short) 0xff)) {
+	if (IsItLikeOddPenPower) {
 	    /*
 	     * Some "addon" input devices, such as the popular PenPower
 	     * Chinese writing pad, generate 16 bit values in WM_CHAR messages
@@ -1401,6 +1407,11 @@
 
 	    xkey->trans_chars[xkey->nbytes] = (char) (msg.wParam >> 8);
 	    xkey->nbytes ++;
+	} else if (((unsigned short) msg.wParam) > ((unsigned short) 0xff)) {
+		xkey->trans_chars[0] = (char) msg.wParam;
+	    xkey->trans_chars[1] = (char) (msg.wParam >> 8);
+	    xkey->nbytes ++;
+		xkey->send_event = -2;
 	}
     }
 }


Both patches are relative to http://core.tcl.tk/tk/info/fa61f24c161f4422

budden added on 2017-05-30 20:24:23:
See also [6c0d7aec6713ab6a7c3e12dff7f26bff4679bc9d]

I have a very kludgy patch for the issue, but I hope it does not break other things.

dkf added on 2015-12-22 12:17:46:

We ought to be using the …W versions of all of these; nobody supports the operating systems that require the …A versions anymore. I can't do a useful deep dive to work out the details, since I'm not using Windows as a development platform at all.


DExtR added on 2015-12-07 21:57:33:
I think the problem is not limited to Bengali; many other languages are affected as well. See the following Stack Overflow question:
* https://stackoverflow.com/questions/34116195/why-some-characters-can-not-be-typed-in-pythons-idle

Can any of the developers at least confirm or maybe reject this bug, please?

apnadkarni added on 2014-09-18 02:34:55:
Diagnosis from c.l.t:

It seems the problem could be that TCL/TK is internally still using codepages to translate keyboard input to UTF-8.

It's not necessary to use code pages anymore. But to get UTF-16 characters in all window messages, you have to create a "Unicode window":
 * Using the "ANSI" variants like CreateWindowA() etc., you get an ANSI window.
 * Using the "Wide" variants like CreateWindowW() etc., you get a Unicode window.

ANSI:
An ANSI window will receive 8 bit char codes in the WM_CHAR message and has to translate them using codepages.
BUT: None of the Indic code pages (like Bengali) are supported as system code pages (aka ANSI code pages), and the 'A' versions of the Win32 API use the system code page to do the automatic conversion to unicode.
(http://www.unicode.org/mail-arch/unicode-ml/y2001-m04/0148.html)

UNICODE:
A Unicode window will receive "wide" 16-bit char codes in the WM_CHAR message.
BUT: You have to use the "W"-variant of ALL of these functions:
 * RegisterClassW
 * CreateWindowW (or CreateWindowExW)
 * GetMessageW
 * PeekMessageW
 * DispatchMessageW

TCL/TK:
TCL has two sets of function pointers to the Windows API: "asciiProcs" and "unicodeProcs".

By default the "unicodeProcs" are called, which include "RegisterClassW" and "CreateWindowExW". But the functions driving the message loop are not part of the function pointer set: GetMessage, PeekMessage and DispatchMessage are called directly and map to the "A" variants.

This way, TCL/TK windows receive 8-bit ANSI character messages, although they are created using CreateWindowW() and could technically process character input as UTF-16 all the way through.

apnadkarni added on 2014-09-18 02:34:22:
More from c.l.t:

How to reproduce

- Switch the keyboard to "Arabic" and type some characters -> the Arabic characters appear as expected
- Switch the keyboard to "Bengali" and type some characters -> question marks ("???") appear instead of Bengali characters

It does work in Notepad++ and other editors, and if you copy and paste Bengali characters from Notepad++ into the TCL/TK textbox, they are displayed correctly.

Source Code

Every time the keyboard input language is changed, TCL/TK receives the WM_INPUTLANGCHANGE message from Windows.

tkWinX.c:
    case WM_INPUTLANGCHANGE:
        UpdateInputLanguage(wParam);
        result = 1;
        break;

The wParam and lParam values are as follows:
 Arabic  - wParam = 0x000000b2, lParam = 0x04013801
 German  - wParam = 0x00000000, lParam = 0x04070407
 Bengali - wParam = 0x00000000, lParam = 0x04450845

As you can see, Bengali and German seem to be indistinguishable to TCL/TK because it relies on wParam alone. I don't know if I am looking at the right place in the source code, but the problem is there, and we are searching for a solution for our customers in Bangladesh.

Any insight is welcome; perhaps TCL/TK is behaving correctly and Windows needs to be set up to work correctly with Bengali keyboard input. It does work in other applications, though.