Ticket UUID: | 0f1ddc0df7fb734cf1cba12ac7c529d29fefb842 | |||
Title: | exec fails on encoding errors present in the output of the executed command | |||
Type: | Bug | Version: | 9.0 | |
Submitter: | anonymous | Created on: | 2023-11-27 15:57:26 | |
Subsystem: | 16. Commands A-H | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Important | |
Status: | Closed | Last Modified: | 2024-05-13 13:11:22 | |
Resolution: | Fixed | Closed By: | oehhar | |
Closed on: | 2024-05-13 13:11:22 | |||
Description: |
If executing the diff program with command exec on two files with different encodings, the output of diff cannot be read with Tcl9. The following error message is generated: invalid or incomplete multibyte or wide character The problem occurres using tkdiff and can be reproduced by the following script:
Test files will be attached. Paul | |||
User Comments: |
oehhar added on 2024-05-13 13:11:22:
Paul, I allowed me to change the title of the ticket. I hope this is ok. Take care, Harald oehhar added on 2024-05-13 12:28:03: Nathan, sorry. The process for fixing was done and accepted by the author of the ticket. The reason for this are many voices fearing incompatibilites and code breakage due to errors thrown, were there were no errors before. The drivers for this are people maintaining a lot of code within large organizations like Don Porter. It was also made clear, that exec has limitations and open could be used, if the codepage profile is important. I can understand your agument and I also was convinced of "strict anywhere". But this has changed as a community view. As a consequence, I close the ticket and set the status to "fixed". Thank you for your comprehension, Harald jan.nijtmans added on 2024-05-13 10:08:53: ..... and Rolf, of course .... jan.nijtmans added on 2024-05-13 10:06:24: > I propose that exec retain its newfound strict powers I'll discuss this with Harald, Brian, Steve and Ashok. My guess is that your proposal will not be accepted for 9.0b2. pooryorick added on 2024-05-13 09:08:00:
I've reopened this issue and changed the resolution to "wont fix" in order to further discuss this. "replace" is a minefield, and saving unsuspecting scripts from stepping on a mine when using I propose that oehhar added on 2023-12-07 08:57:26: I want to thank Ashok for this work. We had that open for TCL 9.0 in the weekly wrap-up and Ashok volunteered to look into this. We should always remember, that behind each commit, there are hours of work. Thank you all, guys, Harald apnadkarni added on 2023-12-07 03:11:33: Fixed in [43d35b1d8b]. anonymous added on 2023-12-05 21:56:21: Works fine with the test script and inside tkdiff. Paul apnadkarni added on 2023-12-05 17:12:52: Code ready for review, will merge if no objections in a day or two. Paul's sample test output is now
which I think is correct (the catch raises exception because of diff output as expected) apnadkarni added on 2023-12-05 14:06:31: WIP in [apn-bug-0f1ddc0df7]. This intentionally fixes only exec to use the replace profile. In particular, open is unchanged (strict default) as the -profile option can be used there to if necessary. Still WIP because stderr output behaves as if strict profile is in use. oehhar added on 2023-12-01 16:40:51: Hi Jan, thanks, that you jump into the boat. Things move, if this happens. My opinion:
Take care, Harald jan.nijtmans added on 2023-12-01 12:03:40: If we decide that the best profile for env/glob/argv handling is "replace", we could also make one step further and change the default flag for Tcl_UtfToExternalDString/Tcl_ExternalToUtfDString. That's easier than replace all such calls with the *Ex() alternative, specifying TCL_ENCODING_PROFILE_REPLACE separately everywhere. How do we think about: [0d84761f9fecedd4|this]? jan.nijtmans added on 2023-12-01 11:48:21: > This was already the case in 657 for glob, env as Rob mentions glob, env currently still uses the "tcl8" profile, since no-one decided to change that. Command line arguments to tclsh/wish still use "tcl8" profile as well. apnadkarni added on 2023-11-28 03:01:55: I also share Rolf's opinion that commands where there is no option to configure the encoding profile used by the command should use the replace profile. This was already the case in 657 for glob, env as Rob mentions, and I would argue should be the case for exec as well. As an aside, I would think the same would apply to *arguments* passed to exec as well as command line passed to tclsh (argv, argv0) though that may be a more minor issue and not worth bothering over. pointsman added on 2023-11-27 20:07:27: Perhaps the better workaround for the given example would be:
While it is true that exec is simple-minded wrt to the encoding of the stdin, stdout (and stderr) of the started process the core problem here are the changes in the I/O system of Tcl 9, especially TIP 657. Please note that this is an improvement. With Tcl 8.6 the code example provided by Paul show a wrong result without a chance that the programm / programmer is able to notice. But perhaps it would be better if exec would open the channels always with -profile replace. TIP 657 already has the exception of the glob command (and access to the env array). This would be another one, for reasons. dkf added on 2023-11-27 16:47:26: The The workaround is to use a pipeline, perhaps like this:
Since you have access to the channel above (unlike inside exec which handles all of that internally) you can configure it any way you want to handle the input or output you expect. (It also makes it easier to capture stderr of the pipeline without losing stdout, etc.)
Note that both |