Tcl Source Code

View Ticket
Ticket UUID: 85b7226da1ecf6194342f802ff4361bfc75b8a34
Title: scan of data containing 0x, 0b treated differently to 0d,0o
Type: Bug Version: 8.6,8.7,9
Submitter: juliannoble2 Created on: 2023-08-28 13:02:23
Subsystem: 11. Conversions from String Assigned To: griffin
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2023-09-21 17:15:42
Resolution: Wont Fix Closed By: griffin
    Closed on: 2023-09-21 17:15:42
Description:
P% scan 0x11 %x
-  17
P% scan 0b11 %b
-  3
P% scan 0d11 %d
-  0
P% scan 0o11 %o
-  0

Expected 11 for the decimal scan and 9 for the octal one.

The man page doesn't make it obvious to me that scan 0x11 should return 17 rather than 0, but this seems to have worked for a while, so my guess is that decimal and octal should be updated to match the binary and hex behaviour.
User Comments: griffin added on 2023-09-21 17:15:42:
Works as intended.

griffin added on 2023-09-21 17:10:44:

The scan man page states that scan functions "... in a fashion similar to the ANSI C sscanf procedure ...", with a few explicitly identified exceptions. With this in mind, I conclude that the scan is functioning as designed, matching C's sscanf behavior as shown in these results:

usage: ./cscan <string-value> <format-string> <value-type:(i|b|d|f|c)>
Uninitialized variables have value of -777, -7.77, or "BAAD"

sscanf("0o110", "%o%s", &i, &remainder) -> returned 2, results i=0 remainder="o110"
 scan  "0o110"  "%o%s"  i    remainder  -> returned 2, results i=0 remainder="o110"
sscanf("0d110", "%d%s", &i, &remainder) -> returned 2, results i=0 remainder="d110"
 scan  "0d110"  "%d%s"  i    remainder  -> returned 2, results i=0 remainder="d110"
sscanf("0x110", "%x%s", &i, &remainder) -> returned 1, results i=272 remainder="BAAD"
 scan  "0x110"  "%x%s"  i    remainder  -> returned 1, results i=272 remainder="BAAD"
sscanf("0b110", "%b%s", &i, &remainder) -> returned 0, results i=-777 remainder="BAAD"
 scan  "0b110"  "%b%s"  i    remainder  -> returned 1, results i=6 remainder="BAAD"
sscanf("true", "%b%s", &i, &remainder) -> returned 0, results i=-777 remainder="BAAD"
 scan  "true"  "%b%s"  i    remainder  -> returned 0, results i=-777 remainder="BAAD"
sscanf("0", "%b%s", &i, &remainder) -> returned 0, results i=-777 remainder="BAAD"
 scan  "0"  "%b%s"  i    remainder  -> returned 1, results i=0 remainder="BAAD"
sscanf("0110", "%o%s", &i, &remainder) -> returned 1, results i=72 remainder="BAAD"
 scan  "0110"  "%o%s"  i    remainder  -> returned 1, results i=72 remainder="BAAD"
sscanf("110", "%o%s", &i, &remainder) -> returned 1, results i=72 remainder="BAAD"
 scan  "110"  "%o%s"  i    remainder  -> returned 1, results i=72 remainder="BAAD"
sscanf("110", "%i%s", &i, &remainder) -> returned 1, results i=110 remainder="BAAD"
 scan  "110"  "%i%s"  i    remainder  -> returned 1, results i=110 remainder="BAAD"
sscanf("110.3ms", "%lg%s", &d, &remainder) -> returned 2, results d=110.300000 remainder="ms"
 scan  "110.3ms"  "%lg%s"  d    remainder  -> returned 2, results d=110.3 remainder="ms"
sscanf("0d10", "%x%s", &i, &remainder) -> returned 1, results i=3344 remainder="BAAD"
 scan  "0d10"  "%x%s"  i    remainder  -> returned 1, results i=3344 remainder="BAAD"
sscanf("0b10", "%x%s", &i, &remainder) -> returned 1, results i=2832 remainder="BAAD"
 scan  "0b10"  "%x%s"  i    remainder  -> returned 1, results i=2832 remainder="BAAD"
sscanf("0x10", "%x%s", &i, &remainder) -> returned 1, results i=16 remainder="BAAD"
 scan  "0x10"  "%x%s"  i    remainder  -> returned 1, results i=16 remainder="BAAD"

See the attached files to reproduce these results.
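
For readers without the attachments, here is a minimal sketch of the kind of C helper that yields the sscanf lines above. It is not the attached cscan program: the names scan_number_and_rest, struct scan_result and enum radix are made up for illustration, and only the -777 and "BAAD" sentinels are taken from the output shown. %b is left out because it is not portable across C libraries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define REST_SIZE 64

/* Result of one "number, then trailing word" scan (hypothetical type). */
struct scan_result {
    int converted;          /* return value of sscanf */
    long value;             /* stays -777 if no number was converted */
    char rest[REST_SIZE];   /* stays "BAAD" if no trailing word was converted */
};

enum radix { RADIX_OCT, RADIX_DEC, RADIX_HEX };

/* Scan input with %o, %d or %x followed by %s, pre-filling the sentinels. */
struct scan_result scan_number_and_rest(const char *input, enum radix radix)
{
    struct scan_result r;
    unsigned int u = 0;
    int d = 0;

    assert(input != NULL);
    r.converted = 0;
    r.value = -777;
    strcpy(r.rest, "BAAD");

    switch (radix) {
    case RADIX_OCT:
        r.converted = sscanf(input, "%o%63s", &u, r.rest);
        if (r.converted >= 1) r.value = (long)u;
        break;
    case RADIX_DEC:
        r.converted = sscanf(input, "%d%63s", &d, r.rest);
        if (r.converted >= 1) r.value = (long)d;
        break;
    case RADIX_HEX:
        r.converted = sscanf(input, "%x%63s", &u, r.rest);
        if (r.converted >= 1) r.value = (long)u;
        break;
    }
    return r;
}
```

Calling scan_number_and_rest("0o110", RADIX_OCT) should report two conversions, a value of 0 and a remainder of "o110", while scan_number_and_rest("0x110", RADIX_HEX) should report one conversion and 272, in line with the results listed above.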

I am closing this as "Won't Fix": it works as intended.


griffin added on 2023-09-02 23:06:42:

With the current state of trunk, Tcl scan matches C's sscanf() behavior, AFAICT.

So it depends on which specification the Tcl implementation is intended to follow: C-compatible string input, or Tcl-compatible string input? Given Tcl's "everything is a string" and the plethora of ways for Tcl to process a number, I think C compatibility may be more important wrt scan, IMHO.

Thoughts?


jan.nijtmans added on 2023-08-29 13:56:27:

It could also be considered as a bug in %o/%d. Since %x recognises the "0x" prefix and %b recognises the "0b" prefix, I would expect %d to recognize the "0d" prefix and %o the "0o" prefix.

Indeed, it's not consistent now. How about:

scan should consume only the characters in the alphabet of a given specifier, and its prefix (depending on the given radix).

So if %x is given, "0x" is the only prefix recognised. When %o is given, "0o" is the only prefix recognised.
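
To make the proposed rule concrete, here is a rough sketch in C of a scanner that accepts an optional sign, then only the prefix belonging to the requested radix (0b, 0o, 0d or 0x), then only digits of that radix. This is not how Tcl's scan is implemented. The function names are hypothetical, and leading whitespace and overflow are ignored to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of c as a digit in the given radix, or -1 if it is not one. */
static int digit_value(int c, int radix)
{
    int v;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < radix ? v : -1;
}

/* Prefix letter that belongs to a radix, or '\0' for unsupported radixes. */
static char prefix_letter(int radix)
{
    switch (radix) {
    case 2:  return 'b';
    case 8:  return 'o';
    case 10: return 'd';
    case 16: return 'x';
    default: return '\0';
    }
}

/*
 * Hypothetical scanner for the proposed rule. Returns 1 and stores the
 * value and a pointer to the unconsumed tail, or returns 0 if no digit
 * was found. Overflow is not checked.
 */
int scan_with_own_prefix(const char *s, int radix, long *value, const char **rest)
{
    const char *p = s;
    long v = 0;
    int sign = 1;
    int ndigits = 0;
    char letter = prefix_letter(radix);

    assert(s != NULL && value != NULL && rest != NULL && letter != '\0');

    if (*p == '+' || *p == '-') {
        if (*p == '-') sign = -1;
        p++;
    }
    /* Skip "0<letter>" only when a valid digit follows the prefix. */
    if (p[0] == '0' && tolower((unsigned char)p[1]) == letter
            && digit_value((unsigned char)p[2], radix) >= 0) {
        p += 2;
    }
    while (digit_value((unsigned char)*p, radix) >= 0) {
        v = v * radix + digit_value((unsigned char)*p, radix);
        p++;
        ndigits++;
    }
    if (ndigits == 0) return 0;
    *value = sign * v;
    *rest = p;
    return 1;
}
```

Under this sketch, "0o11" scanned with radix 8 gives 9 and "0d11" with radix 10 gives 11, matching the expectation in the ticket description. "0b10" with radix 16 still gives 2832, because b is itself a hex digit and the rule cannot remove that ambiguity.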


emiliano added on 2023-08-29 13:31:48:
This looks like a bug in %x and %b specifiers; [scan] should consume only the characters in the alphabet of a given specifier.

% scan 0o110 %o%s
0 o110
% scan 0d110 %d%s
0 d110
% scan 0x110 %x%s
272 {}
% scan 0b110 %b%s
6 {}

[regexp] already rejects hexadecimal qualifiers

% regexp {^[[:xdigit:]]*$} 0xabc35d
0
% regexp {^[[:xdigit:]]*$} 0abc35d
1

Also, since both "b" and  "d" are members of the hexadecimal alphabet, [scan] will behave inconsistently with these numbers:

% scan 0d10 %x
3344
% scan 0b10 %x
2832
% scan 0x10 %x
16
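
The values above follow from plain positional arithmetic once "d" and "b" are read as hex digits: 0d10 is 13*256 + 1*16 + 0 = 3344, and 0b10 is 11*256 + 1*16 + 0 = 2832, whereas in 0x10 the "0x" is taken as a prefix and only "10" is converted. A small C check using strtoul, with as_hex as a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a string as base 16 with strtoul, which, like %x, accepts an
 * optional "0x" prefix and otherwise treats every character as a digit. */
unsigned long as_hex(const char *s)
{
    assert(s != NULL);
    return strtoul(s, NULL, 16);
}
```

With this helper, as_hex("0d10") and as_hex("0b10") should give 3344 and 2832, and as_hex("0x10") should give 16.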

griffin added on 2023-08-28 16:58:09:
IMO "scan" should not be using an interpretation of the string (i.e. internal rep). 
It should only be using the string value. 
It also should be strict with the format type, 
so %d should accept only { 0 1 2 3 4 5 6 7 8 9 + - }. 
Nothing else.
Only if the # flag is used in the %d format specifier should the 0d prefix be accepted.

The C standard library definition should be used here to inform the implementation of [scan].

juliannoble2 added on 2023-08-28 13:42:33:
And now I don't see the problem..  you scan as hex if you want to get hex - but should be able to scan 0b11 or 0d11 as binary or decimal.
I'll be quiet now and let someone else weigh in on all this.

juliannoble2 added on 2023-08-28 13:29:06:
ok.. thinking on this further
0b and 0d are valid hex.. so I see the problem!

I guess this can't be made to work.. sorry for the noise!

juliannoble2 added on 2023-08-28 13:16:12:
Also, with TIP 114 for Tcl 9, is the behaviour

% scan 011 %i
9

still the desired outcome?
My assumption is that this should be updated for Tcl9 and %i should heed the radix specifiers and not automatically treat leading zeros as octal.

As it stands scan ... %i only seems to recognize 0x

Can this all be made more coherent for 9?
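
For comparison, C's sscanf has the same %i rule: a leading 0 selects octal, a leading 0x selects hex, and "0o" is not a prefix at all. The sketch below illustrates this. The helper name scan_auto_radix is hypothetical, and "0b" input is deliberately not covered because its treatment differs between C library versions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: scan input with C's %i, then a trailing word.
 * rest must have room for 64 bytes; it is left empty if nothing follows
 * the number. Returns the sscanf conversion count.
 */
int scan_auto_radix(const char *input, int *value, char rest[64])
{
    assert(input != NULL && value != NULL && rest != NULL);
    rest[0] = '\0';
    return sscanf(input, "%i%63s", value, rest);
}
```

Here "011" should scan as 9 and "0x11" as 17, while "0o11" should scan as 0 with "o11" left over.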

Attachments: