Tcl Source Code

Ticket Change Details
Overview

Artifact ID: 1445f8fa9a28990baf36cf38b968cc71c16ea839
Ticket: 8e1e31eac0fd6b6c4452bc108a98ab08c6b64588
lsort treats NUL chars strangely
User & Date: sebres 2017-07-20 14:35:20
Changes

  1. closedate changed to: "2457955.10787487"
  2. closer changed to: "sebres"
  3. icomment:
    In short: use the <code>-dictionary</code> option of <code>lsort</code>.
    
    In Tcl the string <code>"\0"</code> is the UTF-8 sequence c0 80 (hex).
    See for example <code>expr {"\0" eq [encoding convertfrom utf-8 \xc0\x80]}</code>.
    This results from special handling (i.e. a special UTF-8 table) within Tcl that differentiates between a zero byte and a zero character in a NUL-terminated string (NTS).
    
    But <code>lsort</code> (without <code>-dictionary</code>) does this not only for the zero character, but for all other non-ASCII characters as well (e.g. umlauts).
    
    The following example may help you sort using byte comparison:
    <pre><code>
    % lsort [list "\0 1" "\x7F 2" "\x80 3"]
    {⌂ 2} {  1} {? 3}
    % lsort -dictionary [list "\0 1" "\x7F 2" "\x80 3"]
    {  1} {⌂ 2} {? 3}
    % proc sortbybyte {a b} {expr {[scan $a %c] - [scan $b %c]}}
    % lsort -command sortbybyte -index 0 [list "\0 1" "\x7F 2" "\x80 3"]
    {  1} {⌂ 2} {? 3}
    </code></pre>
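
    The <code>sortbybyte</code> comparator above looks only at the first character of each key. As a rough sketch, assuming whole strings should be ordered by code point, the same idea can be extended to every character. The name <code>cmpByCodePoint</code> is hypothetical and not part of Tcl:

    ```tcl
    # Hypothetical helper: compare two strings code point by code point.
    proc cmpByCodePoint {a b} {
        set la [string length $a]
        set lb [string length $b]
        set n [expr {min($la, $lb)}]
        for {set i 0} {$i < $n} {incr i} {
            # scan %c yields the code point of a single character
            set ca [scan [string index $a $i] %c]
            set cb [scan [string index $b $i] %c]
            if {$ca != $cb} {
                return [expr {$ca - $cb}]
            }
        }
        # Common prefix is equal: the shorter string sorts first
        return [expr {$la - $lb}]
    }

    # Whole elements are compared here, so no -index option is needed
    set sorted [lsort -command cmpByCodePoint [list "\x80 3" "\x7F 2" "\0 1"]]
    ```

    With this comparator the element starting with <code>"\0"</code> should sort first, since its code point is 0 regardless of how Tcl stores it internally.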
    
    @TCT, @Jan: should we change the default sorting algorithm to take this byte sequence into account, so that it is sorted as the first character in UTF-8?<br/>
    I think not...
    
  4. login: "sebres"
  5. mimetype: "text/x-fossil-wiki"
  6. resolution changed to: "Invalid"
  7. status changed to: "Closed"