Tcl Source Code

Ticket UUID: 526524
Title: iso2022-jp conversion problems
Type: Bug Version: obsolete: 8.4a4
Submitter: furukawa Created on: 2002-03-06 18:24:46
Subsystem: 44. UTF-8 Strings Assigned To: hobbs
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2002-04-18 08:52:18
Resolution: Fixed Closed By: hobbs
    Closed on: 2002-04-18 01:52:18
Description:
Tcl 8.4a4 fixed several problems with the iso2022-jp encoding.
For example, the bugs that I submitted earlier were mostly fixed:
[ BugID: 218099 ] iso2022-jp encoding does not work.
[ BugID: 219283 ] iso2022-jp encoding is broken

However, problems remain when I convert relatively long Japanese
texts (longer than several kilobytes, e.g. Japanese Unix manual
pages) into iso2022-jp. I'll attach a script that reproduces them.
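One way to check a converted file independently of Tcl is to decode the EUC-JP input and the ISO-2022-JP output with another implementation and compare the resulting text. This is a minimal sketch, assuming Python's standard `euc_jp` and `iso2022_jp` codecs serve as the reference decoder; the helper names `roundtrip_ok` and `first_mismatch` are hypothetical and not part of eucjis.tcl.

```python
def roundtrip_ok(euc_bytes: bytes, jis_bytes: bytes) -> bool:
    """True if the ISO-2022-JP bytes decode to exactly the EUC-JP source text."""
    source = euc_bytes.decode("euc_jp")
    try:
        converted = jis_bytes.decode("iso2022_jp")
    except UnicodeDecodeError:
        # A missing "esc ( B" leaves ASCII bytes to be read as two-byte
        # JIS X 0208 codes, which are often unmapped.
        return False
    return converted == source


def first_mismatch(euc_bytes: bytes, jis_bytes: bytes) -> int:
    """Index of the first character that differs, or -1 if none differ."""
    source = euc_bytes.decode("euc_jp")
    # errors="replace" turns undecodable pairs into U+FFFD instead of raising.
    converted = jis_bytes.decode("iso2022_jp", errors="replace")
    for i, (a, b) in enumerate(zip(source, converted)):
        if a != b:
            return i
    if len(source) != len(converted):
        return min(len(source), len(converted))
    return -1
```

For real files, pass the contents read in binary mode (e.g. `open(path, "rb").read()`); a nonnegative `first_mismatch` gives the character position where the conversion first went wrong, which helps locate a context-dependent error.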

Some details follow. 

(1) euc-jp to iso2022-jp gets-puts conversion

When I convert a text with "tclsh8.4 eucjis.tcl -eucjis -gets infile outfile",
an "esc ( B" is sometimes missing, and sometimes an extra one appears.
An extra "esc ( B" does no harm, but a missing "esc ( B" causes
characters to be lost when the output is read back. The error is
reproducible with the same input file, but I don't know what triggers
it or where it will occur.

Below is the "od -x -a" output for one example of the error. If I
extract the erroneous line and convert it on its own, the error does
not occur. Thus the error does not depend on the character codes
themselves but on the surrounding context.

[ output from eucjis.tcl -eucjis -gets euc.txt jis-n3.txt ]

             %   H   $   7   $   ^ esc   (   B  nl  sp  sp  sp  sp  sp  sp
  0007760     241b    2442    2139    1b23    4228    0a0a    2020    752d
           esc   $   B   $   9   !   # esc   (   B  nl  nl  sp  sp   -   u
! 0010000     2020    241b    2542    213d    253c    2148    2d4a    2074
!           sp  sp esc   $   B   %   =   !   <   %   H   !   J   -   t  sp
! 0010020     241b    2442    3b48    4d48    2451    2439    246b    2448
!          esc   $   B   $   H   ;   H   M   Q   $   9   $   k   $   H   $

[ correct output produced by nkf ]

             %   H   $   7   $   ^ esc   (   B  nl  sp  sp  sp  sp  sp  sp
  0007760     241b    2442    2139    1b23    4228    0a0a    2020    752d
           esc   $   B   $   9   !   # esc   (   B  nl  nl  sp  sp   -   u
! 0010000     2020    241b    2542    213d    253c    2148    1b4a    4228
!           sp  sp esc   $   B   %   =   !   <   %   H   !   J esc   (   B
! 0010020     742d    1b20    4224    4824    483b    514d    3924    6b24
!            -   t  sp esc   $   B   $   H   ;   H   M   Q   $   9   $   k
! 0010040     4824    2d24    4b21    5e24    3f24    4f24    3d49    283c
!            $   H   $   -   !   K   $   ^   $   ?   $   O   I   =   <   (
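The effect of the missing "esc ( B" can be seen by decoding the bytes from the two dumps. Note that od -x prints 16-bit words in host byte order, so on a little-endian machine "241b" is the byte sequence 1b 24. This is a minimal sketch, assuming Python's `iso2022_jp` codec as an independent decoder; the byte strings are transcribed from rows 0010000 and 0010020 above (the Tcl output is trimmed so both end at the same character).

```python
# nkf output, rows 0010000 and 0010020: "esc ( B" precedes the ASCII "-t ".
correct = b"  \x1b$B%=!<%H!J\x1b(B-t \x1b$B$H;HMQ$9$k"
# Tcl 8.4a4 output for the same text: the "esc ( B" before "-t" is missing.
broken = b"  \x1b$B%=!<%H!J-t \x1b$B$H;HMQ$9$k"

good_text = correct.decode("iso2022_jp")
# Still in JIS X 0208 mode, "-t" is read as one two-byte code; it is
# unmapped, so errors="replace" substitutes U+FFFD instead of raising.
bad_text = broken.decode("iso2022_jp", errors="replace")

print(repr(good_text))
print(repr(bad_text))
```

In the second result the ASCII "-t" is gone, which matches the characters lost on reading that are described above.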

(2) euc-jp to iso2022-jp read-puts conversion

When I convert a text with "tclsh8.4 eucjis.tcl -eucjis -read infile outfile",
an extra "esc $ B" sometimes appears in the middle of the output.
It always seems to appear around character number 4096, 8192, etc.
(counted in characters, not bytes). So if Tcl's internal buffer for
Unicode data is 8192 bytes long (4096 two-byte characters), the
handling of the boundary at the beginning of each internal buffer
probably has a bug.
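The buffer-boundary hypothesis can be illustrated with any stateful encoder: ISO-2022-JP output depends on which character set is currently designated, so a converter that works through the text in fixed-size buffers has to carry that state from one buffer to the next. This sketch uses Python's incremental `iso2022_jp` encoder as a stand-in; it is not Tcl's implementation, and the 4096-character chunk size only mirrors the report. `encode_each_chunk_fresh` is a deliberately wrong, hypothetical variant that forgets the state at every chunk and so emits redundant escape sequences at each boundary, similar to the symptom described.

```python
import codecs


def encode_in_chunks(text: str, chunk_chars: int = 4096) -> bytes:
    """Encode to ISO-2022-JP chunk by chunk with one encoder, so the
    currently designated character set carries across chunk boundaries."""
    encoder = codecs.getincrementalencoder("iso2022_jp")()
    parts = []
    for start in range(0, len(text), chunk_chars):
        parts.append(encoder.encode(text[start:start + chunk_chars]))
    # final=True lets the encoder emit the closing "esc ( B" if the text
    # ends while a two-byte character set is still designated.
    parts.append(encoder.encode("", final=True))
    return b"".join(parts)


def encode_each_chunk_fresh(text: str, chunk_chars: int = 4096) -> bytes:
    """Deliberately wrong (hypothetical): a new encoder per chunk forgets the
    state, so a boundary inside Japanese text gets a redundant "esc ( B"
    followed by a new "esc $ B"."""
    return b"".join(
        text[start:start + chunk_chars].encode("iso2022_jp")
        for start in range(0, len(text), chunk_chars)
    )
```

With 8000 characters of Japanese text, the stateful version produces exactly the same bytes as encoding the whole string at once, while the stateless version contains a second "esc $ B" at character 4096. Both still decode to the same text, which is consistent with the report describing an extra escape sequence rather than lost characters for (2).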

(3) font selection mechanism

Under Tk 8.4a4, some characters are not displayed correctly with a
font such as "*-jisx0208.1983-1". This is a minor problem, since we
normally use "*-jisx0208.1983-0".

User Comments: hobbs added on 2002-04-18 08:52:18:

File Added - 21415: yamako-endenc.patch

user_id=72656

Applied patch to 8.4 head on 2002-04-17.  Attached patch 
for posterity.

yamako added on 2002-03-12 21:37:12:
user_id=475117

Hi, 
I sent Mr. Furukawa an additional patch to fix this problem,
and he then told me that problems (1) and (2) were solved.

My additional patch is available from:
http://www3.ocn.ne.jp/~yamako/tcl/iso2022-jp.tcl84a4.2002mar12.patch

furukawa added on 2002-03-08 19:31:00:
user_id=49637

Problems (1)  and (2) were found to be fixed by a patch by
Koichi Yamamoto (private communication).  He may submit the
patch after he refines it.

furukawa added on 2002-03-07 01:24:46:

File Added - 18910: eucjis.tcl

Attachments: