Tcl Source Code

View Ticket
Login
Ticket UUID: 2089279
Title: StringObj.3 Tcl_ObjPrintf inaccuracies
Type: Bug Version: obsolete: 8.6a2
Submitter: georgeps Created on: 2008-09-02 16:27:27
Subsystem: 10. Objects Assigned To: jan.nijtmans
Priority: 8 Severity: Minor
Status: Closed Last Modified: 2024-02-05 15:30:27
Resolution: Fixed Closed By: jan.nijtmans
    Closed on: 2024-02-05 15:30:27
Description:
StringObj.3 gives this example:
long x = 5;
Tcl_Obj *objPtr = Tcl_ObjPrintf("Value is %d", x);

This is incorrect, because the Tcl_ObjPrintf code will expect an int, and use va_arg with int.

As can be seen in the AppendPrintfToObjVA:
 case 'd':
   ...
   seekingConversion = 0;
   switch (size) {
      case -1:
      case 0:
        Tcl_ListObjAppendElement(NULL, list, Tcl_NewLongObj((long int)va_arg(argList, int)));

The issue is that if the user uses %d for a long, the va_list will have the wrong offset.  The example will work in the 1 argument case, and even the multiple argument case if sizeof(int) = sizeof(long).  It however will fail in the case of sizeof(long) > sizeof(int), because the va_list will have the wrong offset.  Thus you'd be getting parts of the integer.  This can have disasterous consequences if the offset is wrong, because the next va_arg call becomes wrong, which if it's a pointer may lead to segfaults.

Tcl works with some platforms where a long is 64-bit and an int is 32-bit.  #ifdef TCL_WIDE_INT_IS_LONG as I understand it refers to this.

The next issue is that the format specifiers are not exactly the same as the [format] command, and probably shouldn't be.

The format command uses (as documented) %ld for truncation to a 64-bit integer.  %lld for no truncation (BigNum) and %d for a 32-bit integer.  This is incomptible with AppendPrintfToObjVA() (used by Tcl_ObjPrintf), because it currently accepts a long for %ld which may or may not be 64-bit depending on the platform.  The wide and BigNum cases are also not handled by AppendPrintfToObjVA().  Thus the suggestion in the documentation that Tcl_ObjPrintf has the same syntax as the format command are wrong.
User Comments: jan.nijtmans added on 2024-02-05 15:30:27:

Should _really_ be [0e4c7187e7d373cf|fixed] now, both the documentation and the actual behavior.


jan.nijtmans added on 2024-02-05 14:18:38:

Actually .... on 32-bit platforms it's still not correct in the unsigned case. Example:

Tcl_ObjPrintf("%lu", 4294967295UL)
This will print 18446744073709551615, even on 32-bit platforms. Expected: 4294967295


jan.nijtmans added on 2024-02-01 10:16:23:

> The discrepancy between Tcl_ObjPrintf() and Tcl_Format() should be noted in the doc, and %l / %ll specified.

Documentation is updated now, so I think we can close this.


oehhar added on 2024-02-01 09:33:26:
Here are related messages from the core list for documentation:

-------- Weitergeleitete Nachricht --------
Betreff: 	[TCLCORE] Backport %ll in Tcl_ObjPrintf
Datum: 	Wed, 31 Jan 2024 16:15:45 +0100
Von: 	Eric Boudaillier <[email protected]>
In Tcl 8.6, the documentation says that Tcl_ObjPrintf() works like [format].
However this is not true in 8.6.13.
A fix has been done here: https://core.tcl-lang.org/tcl/info/eac13870dca5d4e7 <https://core.tcl-lang.org/tcl/info/eac13870dca5d4e7> for Tcl 9.
Is it possible to backport this for the next 8.6.14?

I'm trying to use Tcl_ObjPrintf("%lu") to create Tcl_Obj for unsigned 64 bits.
Or is there a better way to do this?

What I am really looking for is in this commit https://core.tcl-lang.org/tcl/info/4c2ed52f8f2e2e50, where a single "l" modifier creates a wideInt.

OK, I should have looked at existing tickets before the code :-)
There is already some (very old) tickets about this:
* https://core.tcl-lang.org/tcl/tktview/2089279fffffffffffff this one, still open, already says that [format] use 64 bits for %l modifier, %lld for BigNum.
* https://core.tcl-lang.org/tcl/tktview/2b6a4fa5ed30b44364b1 this one is what I observe, saying it is fixed in 8.7. So what about 8.6?
* https://core.tcl-lang.org/tcl/tktview/3fbaa4e23bfa711c4340 similar to previous one, also for 8.7 only.

Since we'll have no 8.7, could that be backported in 8.6?

But I'll probably create a new ticket, it seems that the code in tip is still not aligned with what does [format]:

		case 1:
		    Tcl_ListObjAppendElement(NULL, list, Tcl_NewWideIntObj(
			    va_arg(argList, long)));
		    break;
		case 2:
		    Tcl_ListObjAppendElement(NULL, list, Tcl_NewWideIntObj(
			    va_arg(argList, Tcl_WideInt)));
		    break;

-------- Weitergeleitete Nachricht --------
Datum: Wed, 31 Jan 2024 17:34:43 +0100
Von: Jan Nijtmans <[email protected]>

Op wo 31 jan 2024 om 17:19 schreef Eric Boudaillier:
> Since we'll have no 8.7, could that be backported in 8.6?

It's a little bit tricky, but possible. If we implement "ll", it's tempting
to use TCL_LL_MODIFIER, which results in "I64" on Win64. So we
should implement that too. And because "format" uses the same
functionality ....

I think this should do it:
     <https://core.tcl-lang.org/tcl/info/c6795d89c8467204>

-------- Weitergeleitete Nachricht --------
Datum: 	Wed, 31 Jan 2024 18:01:26 +0100
Von: 	Eric Boudaillier <[email protected]>

I'm not sure...
What you wrote for case 2 (%ll) is what I'd like for case 1 (%l), and what does [format].
case 2 needs to pass a bignum (same as case 3 for 'L' modifier - not in tip? where did I see that...).
I32 and I64 are not specified in 8.6 format.n, not needed in my mind, I don't want to change the doc, just fix the code.
It's not clear now if Tcl_ObjPrintf() is supposed to do what print() does or what [format] does, but the doc says [format].
I have used it because printf() was different across windows/linux/32/64 bits.

-------- Weitergeleitete Nachricht --------
Datum: Wed, 31 Jan 2024 21:33:33 +0100
Von: Jan Nijtmans <[email protected]>

Op wo 31 jan 2024 om 18:01 schreef Eric Boudaillier:
>
> I'm not sure...
> What you wrote for case 2 (%ll) is what I'd like for case 1 (%l), and what does [format].

I'm sure. Tcl_WideInt stands for "long" on 64-bit platforms, "long
long" on 32-bit
platforms and __int64 on Windows. So, the portable way to format
a Tcl_WideInt is using TCL_LL_MODIFIER, which is "l" on most platforms
(like you), "ll" on 32-bit platforms and "I64" on Windows. If we want to
support Tcl_WideInt in a portable way, we need to implement all of them,
even though _you_ probably only need "l".

case 1 is for "long" ("l"), case 2 is for "long long" ("ll")

-------- Weitergeleitete Nachricht --------
Datum: Wed, 31 Jan 2024 20:39:47 +0000
Von: Donal Fellows <[email protected]>

Jan Nijtmans <[email protected]> wrote:
> I'm sure. Tcl_WideInt stands for "long" on 64-bit platforms, "long long" on 32-
> bit platforms and __int64 on Windows. So, the portable way to format a
> Tcl_WideInt is using TCL_LL_MODIFIER, which is "l" on most platforms (like
> you), "ll" on 32-bit platforms and "I64" on Windows. If we want to support
> Tcl_WideInt in a portable way, we need to implement all of them, even though
> _you_ probably only need "l".

It's a shame that it's currently the thing that's making Tcl builds noisy, with messages like this coming up rather often:

.../tcl/generic/tclHash.c:636:21: warning: format '%u' expects argument of type
unsigned int', but argument 5 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
  636 |     snprintf(p, 60, "number of buckets with %d or more entries: %" TCL_Z_MODIFIER "u\n",
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  637 |             NUM_COUNTERS, overflow);
      |                           ~~~~~~~~
      |                           |
      |                           size_t {aka long long unsigned int}
.../tcl/generic/tclHash.c:636:84: note: format string is defined here
  636 |     snprintf(p, 60, "number of buckets with %d or more entries: %" TCL_Z_MODIFIER "u\n",
      |                                                                 ~~~~~~~~~~~~~~~~~~~^
      |                                                                                    |
      |                                                                                    unsigned int
      |                                                                 %" TCL_Z_MODIFIER "llu

I'm not picking on that specific site at all; it's just the first one I found in my local build logs.

-------- Weitergeleitete Nachricht --------
Datum: 	Thu, 1 Feb 2024 05:55:22 +0100
Von: 	Eric Boudaillier <[email protected]>

OK, let's separate the problems.
I solved my unsigned 64 bits generation by using Tcl_Format("%lu"), which works as expected: %l for wide ints, for 32 bits / 64 bits builds.

I wrongly used Tcl_ObjPrintf(), because there is a discrepancy on how it manages %l / %ll, even in Tcl 9.0b1 (windows 32 bits builds): Tcl_ObjPrintf("%lu") truncates to 32 bits, Tcl_Format("%lu").always formats 64 bits.
The discrepancy between Tcl_ObjPrintf() and Tcl_Format() should be noted in the doc, and %l / %ll specified.
Even the implementation is misleading, both are using Tcl_AppendFormatToObj().

jan.nijtmans added on 2024-01-31 16:41:57:

Proposed fix [c6795d89c8467204|here]


dgp added on 2017-07-09 22:09:45:
For Tcl_ObjPrintf(), the format %lld DOES NOT MEAN (long long int).
If you program as if it does, you will get crashes. Don't do that.

The %d format means pull an (int) which is then pushed through
losslessly as a Tcl_NewLongObj() to be formatted by [format %d]
which expects to act on a (long int).

(I know. That's a confusing mess, but [format] was already designed
that way by the time I got to touch it.)

This allows Tcl_ObjPrintf() to be similar to sprintf() in what C programmers
expect to pass when using the %d specifier, while still re-using the same
formatting engine already existing for [format].

The %ld format means pull a (long int) which is then pushed through
losslessly as a Tcl_NewWideIntObj() to be formatted by [format %ld]
which expects to act on a (Tcl_WideInt).

There is no way to pass a (long long int) or a (Tcl_WideInt) or a (size_t)
argument to Tcl_ObjPrintf().  There's always been a /* TODO */ comment
in the code noting this incompleteness.  More recently we've recorded the
need to correct this in an RFE ticket.

https://core.tcl.tk/tcl/tktview?name=3fbaa4e23b

Jan Nijtmans has related work on the branch z_modifier
and might have more on other branches I cannot find quickly
right now.

stu added on 2017-07-08 18:37:03:
Just stumbled across this.
See also http://core.tcl.tk/tcl/tktview/2b6a4fa5ed30b44364b1

On 32-bit systems:

Tcl_ObjPrintf("%lld %ld\n", (long long) 3, (long) 4))
Yields "3 0"

Tcl_ObjPrintf("%lld %s\n", (long long) 3, "a"))
Segfaults


Right now it's hard to have confidence in Tcl_ObjPrintf.
Even if it was fixed today the broken version will persist in the wild for a long time possibly requiring autoconfery for those who wish to use it with confidence.

Too bad.

dkf added on 2009-04-10 20:15:24:
Corrected the example in the docs, but the real problems persist.

ferrieux added on 2009-03-20 06:33:18:
After a second dive, it looks like the doc is currently an outright lie:
 - The %h prefix for [ciudoxX] is completely ignored (va_arg(int) in all cases)
 (I also don't understand the absence of a "p++" in the %h case)
 - As George mentioned, generalization to %ll et al is not a good idea (not really suited for vararg handling)

So I'd advocate to fix the doc so that it admits taking printf() as a model and diverging only on %h.
(maybe an explicit list of specifiers and modifiers should be added, since printf() has a variable ambition on different unix flavours).

ferrieux added on 2009-03-05 06:10:35:
Indeed, there's a trace of more ambition in APTOVA:

 /* TODO: support for wide (and bignum?) arguments */

at present, what's implemented is closer to sprintf. Could someone review to see how exactly it differs ? Is the %h (length -1) case correct ?