Overview
Artifact ID: | 1445f8fa9a28990baf36cf38b968cc71c16ea839 |
---|---|
Ticket: | 8e1e31eac0fd6b6c4452bc108a98ab08c6b64588 lsort treats NUL chars strangely |
User & Date: | sebres 2017-07-20 14:35:20 |
Changes
- closedate changed to: "2457955.10787487"
- closer changed to: "sebres"
- icomment:
In short: use the <code>-dictionary</code> option of <code>lsort</code>. In Tcl, the string <code>"\0"</code> is stored internally as the UTF-8 byte sequence C0 80 (hex). See, for example, <code>expr {"\0" eq [encoding convertfrom utf-8 \xc0\x80]}</code>. Tcl handles this specially (with its own UTF-8 table) in order to distinguish a zero byte from the terminating zero of an NTS (null-terminated string). However, <code>lsort</code> (without <code>-dictionary</code>) compares the internal byte sequences, and it does so not only for the zero character but for all other non-ASCII characters as well (e.g. umlauts). The following example may help you sort using a byte comparison:
<pre><code>
% lsort [list "\0 1" "\x7F 2" "\x80 3"]
{⌂ 2} { 1} {? 3}
% lsort -dictionary [list "\0 1" "\x7F 2" "\x80 3"]
{ 1} {⌂ 2} {? 3}
% proc sortbybyte {a b} {expr {[scan $a %c] - [scan $b %c]}}
% lsort -command sortbybyte -index 0 [list "\0 1" "\x7F 2" "\x80 3"]
{ 1} {⌂ 2} {? 3}
</code></pre>
@TCT, @Jan: should we change the default sorting algorithm to take this byte sequence into account, so that it sorts as the first character in UTF-8?<br/> I think not...
- login: "sebres"
- mimetype: "text/x-fossil-wiki"
- resolution changed to: "Invalid"
- status changed to: "Closed"