Tk Source Code

View Ticket
Login
Ticket UUID: 1821174
Title: RenderBadPicture (invalid Picture parameter)
Type: Bug Version: obsolete: 8.5b3
Submitter: nobody Created on: 2007-10-27 11:35:04
Subsystem: 67. Unix Window Operations Assigned To: jenglish
Priority: 8 Severity: Minor
Status: Closed Last Modified: 2018-06-12 19:38:10
Resolution: Fixed Closed By: fvogel
    Closed on: 2018-06-12 19:38:10
Description:
Playing with BWidget, I've discovered some kind of error. How to repeat:

1. Take the BWidget template from http://wiki.tcl.tk/8289

2. Before last line ("main") insert a line containing:   rename send {}

3. When the demo has been run - choose "Exit". There will be an error message:

#v+
  X Error of failed request:  RenderBadPicture (invalid Picture parameter)
    Major opcode of failed request:  153 (RENDER)
    Minor opcode of failed request:  7 (RenderFreePicture)
    Picture id in failed request: 0x1200080
    Serial number of failed request:  996
    Current serial number in output stream:  1017
#v-

The problem doesn't occur in 8.4.12.

Debian Etch, Tcl/Tk 8.5b1, BWidget 1.7
User Comments: fvogel added on 2018-06-12 19:38:10:
Merged to core-8-6-branch and trunk.

Many thanks to Christian Werner for the patch and to Marc Culler and Kevin Walzer for the extensive tests!

fvogel added on 2018-06-09 20:22:27:

Marc and Kevin both tested the bugfix branch on macOS and they didn't find any issue. They both confirmed this privately to me and Christian. Marc also updated some deprecated calls.

I think we can merge now.


fvogel added on 2018-05-27 18:01:32:

Although I cannot see why calling Xsync() before destroying the window could harm, I understand your point of view.

On macOS, TkMacOSXFlushWindows() is just flushWindow, which looks harmless according to its man page.

Nevertheless, macOS people's opinion would be best to have. Kevin? Marc? (other competent people here?)


chw added on 2018-05-26 18:34:05:
Regarding Windows: no objections, since XSync is a no op.

Regarding MacOSX: testing (and expertise) needed, there's TkMacOSXFlushWindows() which does something (I don't know of due to MacOS noob-ness)

Regarding SDL+AGG+freetype (which irrelevant is anyway): no objections, it only increments the number of requests processed.

I'd say go for it but let's check out things on Macintoshen.

fvogel added on 2018-05-26 10:43:23:

Would the following be acceptable?

Index: generic/tkWindow.c
==================================================================
--- generic/tkWindow.c
+++ generic/tkWindow.c
@@ -2783,11 +2783,10 @@
        Tcl_Preserve(interp);
        Tk_DestroyWindow((Tk_Window) tsdPtr->mainWindowList->winPtr);
        Tcl_Release(interp);
     }

-#if !defined(_WIN32) && !defined(MAC_OSX_TK)
     /*
      * Let error handlers catch up before actual close of displays.
      * Must be done before tsdPtr->displayList is cleared, otherwise
      * ErrorProc() in tkError.c cannot associate the pending X errors
      * to the remaining error handlers.
@@ -2795,11 +2794,10 @@

     for (dispPtr = tsdPtr->displayList; dispPtr != NULL;
            dispPtr = dispPtr->nextPtr) {
        XSync(dispPtr->display, False);
     }
-#endif

     /*
      * Iterate destroying the displays until no more displays remain. It is
      * possible for displays to get recreated during exit by any code that
      * calls GetScreen, so we must destroy these new displays as well as the

Any side effect you can foresee perhaps?


chw added on 2018-05-26 07:48:24:
It is not Linux specific but may hit all X11+Xft environments.
Otherwise, I agree to have this in a platform specific module.
But see jenglish's comments, the entire tear down of the display
connections needs somewhat be redesigned. And this might go even
further: are there similar (asynchronously reported) X error events
on any other platform, or is this X11 specific? If X11 specific,
the whole tkError.c module shouldn't be in the generic directory.

fvogel added on 2018-05-26 07:14:52:

Thank you for your proposal. I have reopened the present ticket and the bugfix branch as well, where I have committed your code, see [0e402fa7].

About the proposed fix: I have read the comment you have added, however is there really no way to add this Linux-specific code elsewhere than in a generic file? It always disturbs me when we need to #ifdef(platform) in generic/somefile.c


chw added on 2018-05-26 05:22:01:
While trying TDK's inspector (as found in the TDK github repo) on various Linuxen
this X Error can be observed again when the window close button is pressed on the
inspector main window. My proposed solution is in this snippet:

---snip---
Index: generic/tkWindow.c
==================================================================
--- generic/tkWindow.c
+++ generic/tkWindow.c
@@ -2782,10 +2782,24 @@
        interp = tsdPtr->mainWindowList->interp;
        Tcl_Preserve(interp);
        Tk_DestroyWindow((Tk_Window) tsdPtr->mainWindowList->winPtr);
        Tcl_Release(interp);
     }
+
+#if !defined(_WIN32) && !defined(MAC_OSX_TK)
+    /*
+     * Let error handlers catch up before actual close of displays.
+     * Must be done before tsdPtr->displayList is cleared, otherwise
+     * ErrorProc() in tkError.c cannot associate the pending X errors
+     * to the remaining error handlers.
+     */
+
+    for (dispPtr = tsdPtr->displayList; dispPtr != NULL;
+           dispPtr = dispPtr->nextPtr) {
+       XSync(dispPtr->display, False);
+    }
+#endif
 
     /*
      * Iterate destroying the displays until no more displays remain. It is
      * possible for displays to get recreated during exit by any code that
      * calls GetScreen, so we must destroy these new displays as well as the
---snip---

fvogel added on 2018-02-10 17:33:25:
Merged to core-8-6-branch and trunk.

fvogel added on 2018-01-28 19:23:57:
Could we agree on making a proactive move by merging the fix from Christian, despite perhaps not being the ultimate fix, considering that the originally reported issue would then be solved?

chw added on 2018-01-20 07:28:27:
Joe, thanks for your analysis. I tried the following experiment:

* removed my latest proposed change
* commented out the cleanup loop of the error handler list in TkCloseDisplay()
* added the cleanup loop of the error handler list to TkpCloseDisplay()
* moved the XSync() call in TkpCloseDisplay() before that cleanup

thus the last step before deleting the file handler of the display connection and XCloseDisplay() is the cleanup of the error handler list.

Bummer, I get the RenderBadPicture error again.

Although we finally should reorder things as you pointed out, the shakiness is IMO still the non-deterministic garbage collection which cleans up each ten Tk_DeleteErrorHandler() calls regardless of the lag between requests sent and known processed.

My proposed change however enforces an XSync() when the error handler to be deleted is known to have that lag and thus can still catch the X errors right before deletion.

My test environment is a modified tkinspect, the test case is simply closing the tkinspect window, which exits. You may find this on http://www.androwish.org

jenglish added on 2018-01-20 04:41:01:
It looks like this patch will suppressthe problem: calling XSync()
more often will reduce the chances that pending error replies remain
in the queue at process shutdown time.

The underlying bug however is still present.  Please see TkCloseDisplay()
in generic/tkWindow.c.  It clears the error handler list *before*
calling TkpCloseDisplay() (which in turn calls XSync() and XCloseDisplay()).

So if any earlier requests made within the scope of a Tk_CreateErrorHandler() / 
Tk_DeleteErrorHandler() block have generated an error which has not yet
been processed, XSync() receives the error and calls Tk's error handler,
but ErrorProc fails to handle it correctly because TkCloseDisplay has
already cleared the error handler list.

Reordering the cleanup sequence in TkCloseDisplay is probably the right fix.

I am not able to submit patches to fossil at this time, sorry.

fvogel added on 2018-01-19 20:56:07:
Looks like a very good idea, now committed in the bugfix branch, see [563b7081].

Joe, what do you think?

chw added on 2018-01-18 22:13:33:
After reading tkError.c more carefully I understand now jenglish's explanation.
There's a problem of the non-deterministic garbage collection of deleted error
handlers which might cause the sporadic X errors. Thus I propose to pay the
price for a round trip with the X server right before garbage collection, when
the error handler is known to be off of the last known processed request.
See the "syncwhendelerr.patch" which does a minimum of additional XSync() calls.
The added bonus is that we don't need to review all those Tk_CreateErrorHandler()
and Tk_DeleteErrorHandler() call sites, i.e. my first two patches aren't needed
any further.

chw added on 2018-01-17 23:03:22:
To be more precise, let's take XFreeColors() used by TkpFreeColor() enclosed in Tk_(Create|Delete)ErrorHandler(). The implementation in 2001 X11R6 is

  int
  XFreeColors(dpy, cmap, pixels, npixels, planes)
  register Display *dpy;
  Colormap cmap;
  unsigned long *pixels; /* LISTofCARD32 */
  int npixels;
  unsigned long planes; /* CARD32 */
  {
    register xFreeColorsReq *req;
    register long nbytes;

    LockDisplay(dpy);
    GetReq(FreeColors, req);
    req->cmap = cmap;
    req->planeMask = planes;

    /* on the VAX, each pixel is a 32-bit (unsigned) integer */
    req->length += npixels;

    nbytes = npixels << 2;              /* watch out for macros... */
    Data32 (dpy, (long *) pixels, nbytes);
    UnlockDisplay(dpy);
    SyncHandle();
    return 1;
  }

It makes and queues a protocol request, and finallydoes SyncHandle().
Normally, SyncHandle() is a noop except should we debug using
XSynchronize(..., True);

My (possibly wrong) analysis in this case is: Tk_(Create|Delete)ErrorHandler()
is some kind of placebo except the wish is run with -sync which turns the
SyncHandle() stuff on.

chw added on 2018-01-17 22:40:11:
> The patch is not necessary. Under normal circumstances Tk does the right thing
> inside Tk_CreateErrorHandler() ... Tk_DeleteErrorHandler() blocks, there is
> no need to call XSync(), and in fact doing so is likely to hurt performance.

This is my argument, what "normal circumstances" are. Normal in my understanding
is to be sure (in the ensured sense) that there's a round trip with the X server,
i.e. stuff including X errors flow back to the client *before* Tk_DeleteErrorHandler().

jenglish added on 2018-01-17 21:58:32:
> Also, could Joe English please review and provide his opinion on the patch? 

The patch is not necessary. Under normal circumstances Tk does the right thing
inside Tk_CreateErrorHandler() ... Tk_DeleteErrorHandler() blocks, there is
 no need to call XSync(), and in fact doing so is likely to hurt performance. 
(Also: my mistake, chw is correct, this logic is part of Tk, not Xlib.)

The cause of the problem is a bug in Tk's shutdown sequence.
From 2007-10-28 comment, the root cause:

> is because DeleteWindowsExitProc sets tsdPtr->displayList to NULL
> before calling TkCloseDisplay() on each member of the list.

The correct fix probably involves nulling out tsdPtr->displayList *after* 
calling TkCloseDisplay() instead of before; IIRC the only reason I didn't do 
that in 2007 is that the implications of making that change were unclear 
(Tk's shutdown sequence is rather complicated.)

chw added on 2018-01-17 10:27:57:
Maybe in the first place it is sufficient to add a remark in bold face font to the man page of Tk_CreateErrorHandler(), e.g.

When the code section enclosed by Tk_CreateErrorHandler() and Tk_DeleteErrorHandler() is known not to handle X replies/events/errors but sending X requests only, a call to XSync(display, False) should be added right before Tk_DeleteErrorHandler() in order to catch up the potential X errors caused by these request.

fvogel added on 2018-01-17 07:33:00:

I understand what you've done in this patch, thanks you for this! You looked at each Tk_DeleteErrorHandler() call, and checked if an XSync() is needed just before it or not, given the kind of access that is made to the X server between Tk_CreateErrorHandler() and Tk_DeleteErrorHandler().

This patch is now in branch bug-1821174fff, see [c1131971].

This looks OK to me, however I'm wondering: Shouldn't we add XSync() before each and every call to Tk_DeleteErrorHandler(), just for the sake of additional safety? Will we always think at this problem when Tk will evolve, and not forget later to add XSync() at places where it could become newly required? So shouldn't we, instead of spreading XSync() throughout the code, add it directly in Tk_DeleteErrorHandler() - that is, in a Unix-version of it, wrapping the real Tk_DeleteErrorHandler()? Or does this added XSync() bring performance issues or other undesirable effects that we would like to avoid when XSync is not absolutely needed? What do you think?

Also, could Joe English please review and provide his opinion on the patch? Thanks!


chw added on 2018-01-17 06:28:44:
Should my theory be correct, the attached "addxsync.patch" against core-8-6-branch might be useful. It has an XSync() before Tk_DeleteErrorHandler() at all places which I believe are request only, i.e. without XGetSomething() or similar involving reading the display connection for replies, events, and errors.

chw added on 2018-01-16 22:57:15:
jenglish is right regarding the logic in the Tk lib which associates the request numbers to an active Tk error handler. But Xlib (here's a snippet from 2001 X11R6) has no memory at all:

XErrorHandler
XSetErrorHandler(XErrorHandler handler)
{
    int (*oldhandler)(Display *dpy, XErrorEvent *event);

    _XLockMutex(_Xglobal_lock);
    oldhandler = _XErrorFunction;

    if (!oldhandler)
        oldhandler = _XDefaultError;

    if (handler != NULL) {
        _XErrorFunction = handler;
    }
    else {
        _XErrorFunction = _XDefaultError;
    }
    _XUnlockMutex(_Xglobal_lock);

    return (XErrorHandler) oldhandler;
}

chw added on 2018-01-16 22:53:15:
My point is about timing. If there's no read (events, replies, errors) from the display connection *before* the error handler is deleted the X error comes too late and is handled differently.

fvogel added on 2018-01-16 20:05:49:
OK so there is something I don't understand. If Joe is correct, why is Christian's change fixing the reported issue?

jenglish added on 2018-01-16 17:50:53:
> Given that XRequestSomething() does (like its name shall imply)
> no round trip with the X server but simply sends a request to the
> X server the error handler seems rather useless except if an
> XSync() had been placed right before removal of the error handler.

XLib remembers the error handler that was in effect whenever a request is made,
and calls it if that request generates an error response.

So given:

   Tk_CreateErrorHandler(... somehandler ...)
   XRequestSomething()
   Tk_DeleteErrorHandler()
   ...

'somehandler' will be called if XRequestSomething() generates an error,
even after Tk_DeleteErrorHandler() has been called.

chw added on 2018-01-15 17:50:06:
Francois, thanks for taking it into consideration. Now for the inconvenient consequences (I wish it were convenient inconsequences): There are some places in Tk where this pattern can be found:

  Tk_CreateErrorHandler()
  XRequestSomething()
  Tk_DeleteErrorHandler()

Given that XRequestSomething() does (like its name shall imply) no round trip with the X server but simply sends a request to the X server the error handler seems rather useless except if an XSync() had been placed right before removal of the error handler.

And yes, to test and to prove will be a nightmare of epic dimension.

fvogel added on 2018-01-14 14:58:35:

The fix looks OK to me given the analysis made ten years ago (!) by Joe English. I have put it in branch bug-1821174fff.

Thanks Christian!


chw added on 2018-01-13 18:22:30:
While playing with tkinspect using a remote X display through an ssh tunnel, I've observed this exact problem with core-8-6-branch, too.

The following patch seems to fix it:

------snip------
Index: unix/tkUnixRFont.c
==================================================================
--- unix/tkUnixRFont.c
+++ unix/tkUnixRFont.c
@@ -386,10 +386,20 @@
        XUnloadFont(fontPtr->display, fontPtr->font.fid);
     }
     if (fontPtr->fontset) {
        FcFontSetDestroy(fontPtr->fontset);
     }
+
+    /*
+     * Synchronize with X server before dropping the error handler.
+     * This seems to catch sporadic RenderBadPicture errors on tear
+     * down of an application.
+     *
+     * See bugs [1821174fffffffffffff] and [1938774fffffffffffff].
+     */
+    XSync(fontPtr->display, False);
+
     Tk_DeleteErrorHandler(handler);
 }
 

 TkFont *
 TkpGetNativeFont(

------snip------

jlec added on 2012-03-02 22:43:15:
Is there a fix now. I have a program which suffers from this and blocks the xft inclusion into tk.

jenglish added on 2008-04-12 03:52:08:
Logged In: YES 
user_id=68433
Originator: NO

> Is it understood why the following [...]

Yes: in that session, no font rendering occurs.

The RenderBadPicture error occurs when a Tk_Font is destroyed and (inhale) the Drawable associated with the Picture inside the XftDraw * belonging to the Tk_Font (exhale) has already been destroyed, thus invalidating the Picture.  The Tk_Font caches the last Drawable it was used on, to amortize the XftDrawChange() cost across multiple consecutive calls to Tk_DrawChars() with the same font and Drawable.

I suspect that the error is only generated if the Drawable is a Window, not when it's a Pixmap, since the error only appears to be replicable if there is a [menu] widget present.  (Most widgets draw into a background Pixmap, [menu] widgets draw directly onto their Window).

In any event, it's a problem with the shutdown order.  Removing [send] does not *cause* the problem; rather, it's that the presence of [send] changes the shutdown sequence so that the problems are masked.

dgp added on 2008-04-11 23:25:51:
Logged In: YES 
user_id=80530
Originator: NO


Is it understood why the following
session in a plain interactive
wish does not suffer the same problem?

$ wish
% rename send {}
% exit
$

jenglish added on 2007-11-02 04:59:56:
Logged In: YES 
user_id=68433
Originator: NO

Priority: 8; Severity: 1.

This is a real bug, and ought to be fixed before 8.5.0, but it's not severe: the worst end-user consequence is a confusing/annoying error message at program shutdown in (probably-) rare circumstances.

jenglish added on 2007-10-28 04:53:04:
Logged In: YES 
user_id=68433
Originator: NO

Origin of the RenderBadPicture error: XftDrawDestroy() from FinishedWithFont() from TkpDeleteFont(), called when the last reference to the font is removed (... in this case: Tk_FreeConfigOptions, called from a Destroy handler from Tk_DestroyWindow).   At this point the Picture ID in fontPtr->ftDraw will refer to the drawable specified in the most recent call to XftDrawChange(); which in turn could be a Window that's already been destroyed.  XRENDER docs are sketchy on this point -- it's not clear if destroying a Drawable invalidates Pictures that refer to the drawable, but it's a reasonable assumption.

Now the FinishedWithFont() procedure body is wrapped in a Tk_CreateErrorHandler() / Tk_DeleteErrorHandler() pair, so this error *should* be caught and trapped.  More printf-tracing indicates that this is the case in the non-error condition.  Stack trace for the error condition shows:

ErrorProc
_XError
_XReply
XSync
TkpCloseDisplay
TkCloseDisplay
DeleteWindowsExitProc

and single-stepping through ErrorProc (generic/tkError.c) indicates that the call to TkGetDisplay() returned NULL.  The reason it returns NULL is because DeleteWindowsExitProc sets tsdPtr->displayList to NULL before calling TkCloseDisplay() on each member of the list.

jenglish added on 2007-10-28 03:45:25:
Logged In: YES 
user_id=68433
Originator: NO

Re: what does [rename send {}] have to do with this:

When the ::send command is destroyed DeleteProc() in unix/tkUnixSend.c is called which (among other things) calls XSync().  The ::send command is normally destroyed as part of Tk's shutdown sequence; but if the application deletes it early, this won't happen at shutdown time.

Exact sequence of events still unclear.

jenglish added on 2007-10-28 01:59:13:
Logged In: YES 
user_id=68433
Originator: NO

| Tentative hypothesis: [...]

Nope, that does not appear to be the case; DisplayMenu is called at the right time.  Only other difference I can think of between menus and other widgets is that most widgets render into an offscreen pixmap, while menus render directly to their Window (#791527).

And what does [rename send {}] have to do with this?

jenglish added on 2007-10-28 00:51:27:
Logged In: YES 
user_id=68433
Originator: NO

Symptoms do not occur under "--disable-xft".

Problem seems to be limited to menus: replacing [. configure -menu $mb] with [pack [label .l -text "..."]] or some other widget, the symptom does not appear.

Tracing with printf()s indicates that MeasureChars (unixRFont.c) is being called sometime between the [destroy .] and the X error.  Tentative hypothesis: the menu widget is processing an <Expose> binding during or after application shutdown?

[email protected] added on 2007-10-27 23:57:54:
Logged In: NO 

Replicated with current CVS HEAD.

Shorter script that replicates the problem:

# ---- cut here ----
rename send {}
set mb [menu .menubar]
$mb add cascade -label "File" -menu [menu $mb.file]
$mb.file add command -label Exit -command [list destroy .]
. configure -menu $mb
tkwait visibility .
destroy .
# ---- end ----

Attachments: