Ticket UUID: 1306162
Title: some args turned into garbages on cp932
Type: Bug
Version: obsolete: 8.4.11
Submitter: nobody
Created on: 2005-09-27 19:25:40
Subsystem: 50. Embedding Support
Assigned To: dgp
Priority: 8
Severity:
Status: Closed
Last Modified: 2005-10-18 21:37:41
Resolution: Fixed
Closed By: dgp
Closed on: 2005-10-18 14:37:41
Description:
ActiveTcl 8.4.11 on Windows 2000 SP4 Japanese (system encoding is cp932).

test.tcl:
    puts $args

On the command line:
    >tclsh test.tcl 江戸

I got 構]戸 ([lindex $args 0] was 構]戸).

江戸 is 0x8D 0x5D 0x8C 0xCB.
構]戸 is 0x8D 0x5C 0x5D 0x8C 0xCB.

In the cp932/shiftjis encoding, 0x5B, 0x5D, 0x7B, and 0x7D ("[", "]", "{", and "}") can appear as the second byte of a 2-byte character. So it looks to me like we should convert the args to utf-8 one by one first, and then create the args list in Tcl_Main.

Thanks
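The byte sequences reported above can be reproduced with a small C sketch. It is only an illustration: naive_quote is a hypothetical, simplified stand-in for byte-wise list quoting and is not the real Tcl_Merge. It shows why quoting the raw cp932 bytes before converting them to UTF-8 splits a 2-byte character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for byte-wise list quoting (NOT the real
 * Tcl_Merge). Copies src to dst and inserts a backslash (0x5C) before
 * every byte equal to '[', ']', '{' or '}', without any knowledge of
 * multi-byte encodings. dst must have room for 2 * len bytes.
 * Returns the number of bytes written to dst.
 */
static size_t
naive_quote(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] == '[' || src[i] == ']'
                || src[i] == '{' || src[i] == '}') {
            dst[out++] = '\\';      /* lands inside a 2-byte character */
        }
        dst[out++] = src[i];
    }
    return out;
}
```

Feeding it the four bytes 0x8D 0x5D 0x8C 0xCB yields the five bytes 0x8D 0x5C 0x5D 0x8C 0xCB, the same sequence reported in the description. Converting each argument to UTF-8 first avoids this, because no byte of a multi-byte UTF-8 sequence is in the ASCII range.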
User Comments:
dgp added on 2005-10-18 21:37:41:
Logged In: YES user_id=80530

Closing. Remaining issues are to be dealt with in a new report.

nobody added on 2005-10-18 21:16:25:
Logged In: NO

Thank you, dgp. I see RFE 491789. If we had used args as Unicode from the beginning, the argv0 and argv encoding problems would not have arisen. In this sense, these problems are related to RFE 491789. But it looks like hard work...

About the argv0 problem, I just want to say that we shouldn't replace \ with / in a raw multi-byte string. The argv0 problem occurs when the exe binary is placed on a specific multi-byte path. It's a really rare case. I'll post a new report for the argv0 problem after sorting it out once again. So please close this one. Sorry and thank you.

dgp added on 2005-10-18 08:37:50:
Logged In: YES user_id=80530

The Tk issue is now in the tktoolkit Tracker with ID 1328926. Passing the argv0 question to another maintainer for comments. (Related to RFE 491789?)

nobody added on 2005-10-07 16:06:03:
Logged In: NO

Thank you. I have two ideas for the argv0 problem. The first idea is to move the separator-replacing code from main (win/tclAppInit.c) to Tcl_Main (generic/tclMain.c). This method needs an "#ifdef __WIN32__" macro in Tcl_Main. The second idea is to use the Win32 API in main (win/tclAppInit.c). Which is better? I made patches for both and tested them on Windows2000.

patches for the first idea (for tcl8.4.11)
==========================================================
*** win/tclAppInit.c.original Wed Oct 15 07:41:42 2003
--- win/tclAppInit.c Fri Oct 07 11:01:54 2005
***************
*** 104,114 ****
      GetModuleFileName(NULL, buffer, sizeof(buffer));
      argv[0] = buffer;
-     for (p = buffer; *p != '\0'; p++) {
-         if (*p == '\\') {
-             *p = '/';
-         }
-     }
  #ifdef TCL_LOCAL_MAIN_HOOK
      TCL_LOCAL_MAIN_HOOK(&argc, &argv);
--- 104,109 ----
*** generic/tclMain.c.p1 Fri Oct 07 11:09:45 2005
--- generic/tclMain.c Fri Oct 07 13:29:03 2005
***************
*** 236,241 ****
--- 236,244 ----
      TclSetStartupScriptFileName(Tcl_ExternalToUtfDString(NULL, TclGetStartupScriptFileName(), -1, &appName));
      }
+ #ifdef __WIN32__
+     TclWinNoBackslash(Tcl_DStringValue(&appName));
+ #endif
      Tcl_SetVar(interp, "argv0", Tcl_DStringValue(&appName), TCL_GLOBAL_ONLY);
      Tcl_DStringFree(&appName);
      argc--;

patch for the second idea (for tcl8.4.11)
==========================================================
--- win/tclAppInit.c.original Wed Oct 15 07:41:42 2003
+++ win/tclAppInit.c Fri Oct 07 17:27:15 2005
@@ -102,13 +102,21 @@
  * slashes substituted for backslashes.
  */
-    GetModuleFileName(NULL, buffer, sizeof(buffer));
-    argv[0] = buffer;
-    for (p = buffer; *p != '\0'; p++) {
-        if (*p == '\\') {
-            *p = '/';
+    GetModuleFileNameA(NULL, buffer, sizeof(buffer));
+    {
+        WCHAR *wp;
+        WCHAR wBuf[MAX_PATH];
+        MultiByteToWideChar(CP_ACP, 0, buffer, -1,
+                wBuf, MAX_PATH);
+        for (wp = wBuf; *wp != '\0'; wp++) {
+            if (*wp == '\\') {
+                *wp = '/';
+            }
+        }
+        WideCharToMultiByte(CP_ACP, 0, wBuf, -1,
+                buffer, sizeof(buffer), NULL, NULL);
     }
-    }
+    argv[0] = buffer;
 #ifdef TCL_LOCAL_MAIN_HOOK
     TCL_LOCAL_MAIN_HOOK(&argc, &argv);

Thanks

dgp added on 2005-10-06 22:37:38:
Logged In: YES user_id=80530

Re-opened for another look. At least the port to Tk needs to be done.

nobody added on 2005-10-06 22:17:26:
Logged In: NO

Is anyone still looking at this? I noticed an argv0 problem similar to the patched argv problem...

This argv0 problem is caused by replacing \\ with / before calling Tcl_Main. We need to convert from the external encoding to utf-8 before replacing \\ with /. Or, if we can't convert to utf-8 before the replacement, we have to add code that checks for the specific characters.

I also noticed that these argv and argv0 patches are needed by wish.exe too.

Thanks

dgp added on 2005-10-01 02:31:31:
Logged In: YES user_id=80530

Patches accepted. See also related report 491789.

nobody added on 2005-09-30 11:12:28:
Logged In: NO

I tried the two patches on Windows2000 and Linux. 8.4.11 was OK, and 8.5a4 (from CVS) was basically OK too. I think the problem I reported is fixed completely. Thank you very much.

Tcl8.5a4 on Windows2000 had a few small test problems unrelated to the problem I reported. I will search the bug reports for these.

Thanks

dgp added on 2005-09-29 22:10:03:
File Added - 150792: main-85.patch

dgp added on 2005-09-29 22:10:00:
Logged In: YES user_id=80530

...and here's a corresponding patch for Tcl 8.5a4.

dgp added on 2005-09-29 22:09:08:
File Added - 150791: main.patch
Logged In: YES user_id=80530

Thanks for testing and for the patch. I've attached a different patch to this report. Can you give it a test, please?

nobody added on 2005-09-29 20:06:34:
Logged In: NO

I made a patch for Tcl8.4.11. It was tested on Windows2000 and Linux. It produced no test failures and solved the problem. Please check it...

Thanks

--- generic\tclMain.c.original Thu May 30 07:59:33 2002
+++ generic\tclMain.c Thu Sep 29 20:32:47 2005
@@ -206,9 +206,10 @@
 {
     Tcl_Obj *resultPtr;
     Tcl_Obj *commandPtr = NULL;
-    char buffer[TCL_INTEGER_SPACE + 5], *args;
+    Tcl_Obj *argvPtr = NULL;
+    char buffer[TCL_INTEGER_SPACE + 5];
     PromptType prompt = PROMPT_START;
-    int code, length, tty;
+    int code, length, tty, i;
     int exitCode = 0;
     Tcl_Channel inChannel, outChannel, errChannel;
     Tcl_Interp *interp;
@@ -238,12 +239,15 @@
      * all callers of Tcl_Main to do it. (Those callers are likely
      * in a main() that can't easily change its signature.)
      */
-
-    args = Tcl_Merge(argc-1, (CONST char **)argv+1);
-    Tcl_ExternalToUtfDString(NULL, args, -1, &argString);
-    Tcl_SetVar(interp, "argv", Tcl_DStringValue(&argString), TCL_GLOBAL_ONLY);
-    Tcl_DStringFree(&argString);
-    ckfree(args);
+
+    argvPtr = Tcl_NewListObj(0, NULL);
+    for (i=1; i<argc; i++) {
+        Tcl_Obj *argPtr = NULL;
+        Tcl_ExternalToUtfDString(NULL, (CONST char*)argv[i], -1, &argString);
+        argPtr = Tcl_NewStringObj(Tcl_DStringValue(&argString), -1);
+        Tcl_ListObjAppendElement(interp, argvPtr, argPtr);
+    }
+    Tcl_SetVar2Ex(interp, "argv", NULL, argvPtr, TCL_GLOBAL_ONLY);
     if (TclGetStartupScriptPath() == NULL) {
         Tcl_ExternalToUtfDString(NULL, argv[0], -1, &argString);

dgp added on 2005-09-29 08:20:27:
Logged In: YES user_id=80530

Has the proposed patch been tested, and does it solve the reported problem? I definitely like the looks of it.

nobody added on 2005-09-28 14:18:45:
Logged In: NO

Well, I think the problem is here, from line 242 of the Tcl_Main function in generic/tclMain.c:
----------------------
    args = Tcl_Merge(argc-1, (CONST char **)argv+1);
    Tcl_ExternalToUtfDString(NULL, args, -1, &argString);
    Tcl_SetVar(interp, "argv", Tcl_DStringValue(&argString), TCL_GLOBAL_ONLY);

The Tcl_Merge function escapes { and } in argv. So if a 2-byte character contains the byte for { or }, that byte gets escaped, and the result is garbage.

I think that first the args should be converted to utf-8, and second we should build the argv list. I don't know a lot about the Tcl API, but I have tried to write the correct code. It will be clearer than my poor English...

----------------------
    Tcl_Obj *listobj = Tcl_NewListObj(0, NULL);
    for (i=1; i<argc; i++) {
        Tcl_Obj *argobj = NULL;
        Tcl_ExternalToUtfDString(NULL, argv[i], -1, &argString);
        argobj = Tcl_NewStringObj(Tcl_DStringValue(&argString), -1);
        Tcl_ListObjAppendElement(interp, listobj, argobj);
    }
    Tcl_SetVar2Ex(interp, "argv", NULL, listobj, TCL_GLOBAL_ONLY);
    Tcl_DStringFree(&argString);

Thanks

hobbs added on 2005-09-28 02:37:33:
Logged In: YES user_id=72656

Unicode command line issues?

nobody added on 2005-09-28 02:32:25:
Logged In: NO

I wrote Japanese characters, and they turned into 江戸... Very sorry...
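
For the argv0 question raised in the comments above, one of the options mentioned is to add a check for the affected characters when the \ to / replacement has to run on the raw cp932 string. A minimal sketch of that check follows. The function names are hypothetical and appear in no patch from this ticket, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are hard-coded for cp932. On Windows, a call such as IsDBCSLeadByte would probably be the better choice because it follows the active code page.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if b is the lead byte of a 2-byte cp932 character. */
static int
is_cp932_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/*
 * Hypothetical helper: replaces '\\' with '/' in a NUL-terminated cp932
 * path, in place. A 0x5C byte that is the trail byte of a 2-byte
 * character is skipped together with its lead byte, so the character
 * is left intact.
 */
static void
cp932_no_backslash(char *path)
{
    unsigned char *p = (unsigned char *) path;

    assert(path != NULL);
    while (*p != '\0') {
        if (is_cp932_lead(*p) && p[1] != '\0') {
            p += 2;             /* step over lead byte and trail byte */
            continue;
        }
        if (*p == '\\') {
            *p = '/';           /* a real single-byte path separator */
        }
        p++;
    }
}
```

With a path whose bytes are C : \ 0x8D 0x5C \ t, only the two stand-alone backslashes become slashes. The 0x8D 0x5C pair (構 in cp932) is untouched, whereas a plain byte loop would rewrite its trail byte.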