Tk Source Code

View Ticket
Login
Ticket UUID: 92e614e612b94686a7252a30b39d7c5f1cbfa812
Title: OS X - Redraw artifacts (ghosts) with text widget
Type: Bug Version: core-8-6-branch, revised_text
Submitter: fvogel Created on: 2017-02-09 21:14:15
Subsystem: 66. Aqua Window Operations Assigned To: nobody
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2017-11-07 20:58:43
Resolution: Fixed Closed By: fvogel
    Closed on: 2017-11-07 20:58:43
Description:

On OS X only, there may still be redrawing artifacts (aka "ghosts") in the display of the text widget.

To reproduce:

package require Tk
pack [text .t -width 15]
.t insert end "Line 1\n\n"
.t window create end -window [button .b -text "Button"]
update

.t insert 2.0 [string repeat {asdf } 15]

User Comments: fvogel added on 2017-11-07 20:58:43:
Merged to core-8-6-branch and trunk. Many thanks to Marc Culler!

fvogel added on 2017-11-01 19:57:39:

Further work was done (better fix) by Marc Culler in branch OSX_redraw_artifacts.

I have checked that in the current state of that branch, i.e. at [2174f54d], the issue reported in the present ticket still does not happen (checked on OS X of course, but also on Win and Linux), which means that this further work did not inadvertently break the original fix.

I will merge the OSX_redraw_artifacts branch soon.


marc_culler (claiming to be Marc Culler) added on 2017-10-17 18:32:13:
Here is a small correction to my long post, pointed out by Francois:

I should have said "The drawRgn is the intersection of the visRgn with the
window's *clipping* rectangle."

The clipping rectangle is usually equal to the window's bounding rectangle but
a smaller clipping rectangle can be assigned by calling TkpClipDrawableToRect.

marc_culler (claiming to be Marc Culler) added on 2017-10-15 18:39:14:
I am adding this long comment to describe what was changed in the patch that I
submitted and to explain the mechanism behind the problem of "ghost windows"
which has plagued the macOS port of Tk for so long. (Spoiler alert: it is not
the event loop.)

First, let me mention that the drawing artifact referenced in this ticket was
fixed by adding two lines to TkScrollWindow in tkMacOSXDraw.c.  This also fixes
the scrolling issues with the peer text widgets, where after a fast scroll some
buttons would be displayed in the wrong place or, more precisely, in two places,
although one of them might be outside the window.  The patch also changes a
couple of lines in XUnmapWindow in tkMacOSXSubwindows.c to fix a different
drawing problem that has eluded me for years.  Before the patch you could see
this problem in the items.tcl demo by doing this: first scroll the Canvas to
the bottom right corner, so there is a button, a text widget and a scale
visible; then scroll very quickly to the left, say by clicking near the left
end of the scrollbar.  You would then see the three widgets still on the screen,
superimposed over what was supposed to be there after the scroll.

Now on to the the cause of the "ghost windows".  I really do not think it is
helpful to say, as gcramer does here, that "This is the well known problem with
the event loop issues under OS X."  It may be well known, but saying that it is
a problem with the event loop is misleading and in particular can lead
developers to look for the problem in the wrong place, as I have done many
times.  I will try to describe the real mechanism behind this problem.

Part of Daniel Steffen's original design of the Mac port of Tk was that the
TkWindowPrivate struct maintains three HIShapeRef regions, named visRgn,
aboveVisRgn and drawRgn.  These regions are used as clipping masks whenever
drawing into an NSView.  The visRgn is the bounding box of the window with a
rectangle removed for each subwindow and for each sibling window at a higher
z-level.  The drawRgn is the intersection of the visRgn with the window's
bounding rectangle.  The aboveVisRgn is the intersection of the window's
bounding rectangle with the bounding rectangle of the parent window.  Much of
the code in tkMacOSXSubindows.c is devoted to rebuilding these clipping regions
whenever something changes in the layout of the windows.  This turns out to be
a tricky thing to do and it is extremely prone to errors which can be difficult
to trace.

I assume that the purpose for using these clipping regions was that if they are
correctly maintained then it allows windows to be drawn in any order.  You do
not have to draw them in the order of the window hierarchy.  Each window can
draw its entire rectangle through its own mask and never have to worry about
drawing in the wrong place.  I can only guess about how the need for this arose,
but here is my guess.  Another part of the design was that Tcl uses the
NSApplication object in an unusual way.  It never calls the [NSApplication run]
method, but instead it turns the NSApplication object into an "event source".
As the Apple window manager adds events to the NSApplication event queue, Tcl
extracts those events from the NSApplication event queue, converts them to Tcl
events and adds them to the Tcl event queue.  But Apple does not have expose
events.  Their replacement for an expose event is to have the window manager
call the [NSView drawRect] method in any situation where an expose event for
that NSView would be generated in X11.  The [NSView drawRect] method is a no-op
which is expected to be overridden by any application.  In the case of Tcl,
the replacement [NSView drawRect] method creates Tcl expose events which are
added to the Tcl queue.  It is possible that because the [NSView drawRect]
can be called by the Apple window manager at unpredictable times, one cannot
count on the expose events being generated in any specific order.  So that is
my guess for why the clipping regions are used.  Thus it might be fair to say
that the *reason* for the clipping regions had something to do with the event
loop.  But it is the clipping region implementation, and not the event loop
implementation, that leads to the "ghost windows".  And if you waste your time
trying to control the timing of events or the process of transferring events
from the apple queue to the Tcl queue you will not fix the problem, although
you might change how the problem presents itself.

To see how the 'ghost windows' can arise, think about what happens if the
clipping regions are not maintained correctly.  A window might have a rectangle
missing from its clipping region because that rectangle is the bounding
rectangle for a subwindow, say a button.  The parent should not draw in the
missing rectangle since doing so would trash the button.  The button is
responsible for drawing there. Now imagine that the button gets moved, say by a
scroll, but the missing rectangle in the parent's clipping region does not get
moved correctly, or it gets moved later on, after the parent has redrawn itself.
The parent would still not be allowed to draw in the old rectangle, so the user
would continue to see the image of the button in its old location, as well as
another image in the new location.  This is a prototypical example of a "ghost
window".  Anytime you see a "ghost window", you should suspect problems with the
updates to the clipping region visRgn.  It is natural to look for timing issues,
race conditions, or other "event loop problems".  But in fact, the whole design
of the code is to make those timing issues irrelevant.  As long as the clipping
regions are correctly maintained the timing does not matter.  And if they are
not correctly maintained then you will see "ghost windows".

Incidentally, at the time when Kevin and I first modified Steffen's code to
remove calls to private Apple API functions, a button subwindow was implemented
as a wrapper for an NSButton.  I think this exacerbated the unpredictability of
the calls to drawRect, as the Apple window manager would call the drawRect
method for the button (which is not a no-op and actually draws the button) as
well as for the parent.  In our efforts to address this problem we redesigned
the buttons so that they would be drawn directly by Tk instead of by the Apple
window manager.  This simplification seemed to make the problem of maintaining
the clipping regions more manageable.  But it should not have been necessary if
the clipping region code had been correct.  And it may be necessary to go back
to using NSButton if Apple decides to remove the libraries which we used to draw
buttons directly.

The error in TkUnmapWindow was that when a subwindow was unmapped (which
happens the instant that it is scrolled outside of its parent window) its
bounding rectangle was not being immediately added back into the parent's
clipping mask.  The fix for this, however, was very subtle due to the
complicated logic of the code which maintains the clipping masks.  (Calls were
being made to functions that update the clipping masks, but those functions
were being confused by the fact that some of the subwindows had been unmapped.)
The fix to TkScrollWindow was easier - it just needed to invalidate clipping
masks in the usual way at the end of the scroll.

fvogel added on 2017-10-15 13:56:17:
Also, I have changed the "subsystem" field since this was no "well known problem with the event loop issues under OS X", contrary to what was stated in the comments below.

fvogel added on 2017-10-15 13:53:08:

Confirmed fixed (tested in OSX_redraw_artifacts)


fvogel added on 2017-10-15 12:19:58:

Marc Culler has now provided an additional patch that fixes this issue.

See branch OSX_redraw_artifacts, already merged to core-8-6-branch, trunk, and revised_text.


fvogel added on 2017-10-13 20:31:43:

Marc Culler has provided a patch concerning redraw artifacts, that got committed to branch bug-fab5fed65e as well as core-8-6-branch and trunk.

However, as exposed by Marc himself, his patch, whereas it should fix [fab5fed65e] and [40a9abb9db], is not supposed to fix the present ticket.


fvogel added on 2017-08-19 08:43:14:
Thanks for the comments and snapshots. I change the category of this ticket accordingly. I find especially interesting the observation that with 8.5.9 there are no issues.

Regarding your comment:
> reported the problems (and ignored by the TCT as usual)

You have been explained already several times and in details that the TCT does not ignore anything. It's just that the TCT is a very small team with limited time, and regarding Tk in particular there are even less members of this TCT that maintain Tk. So I kindly ask you to stop propagating such grossly wrong statements.

gcramer added on 2017-08-19 07:53:23:

This is the well known problem with the event loop issues under OS X. The revised text widget has not changed this behavior, but it's clear that you can find different test cases for the revised version and the legacy version.

I'm referencing a comment from Steven Atkinson, the developer from Scid vs PC, he has some years experience with the text widget under Mac, and has sometimes reported the problems (and ignored by the TCT as usual), and he has not the privileges to post to the bug reports with his account:

These are just null bug reports. All artifacts have multiple rendering issues on OS X with Cocoa. It is why it is unusable for me. When the processor is busy, artifacts are more common, but the CPULoad<->renderingBugsNumber is not directly related. Bottom line is the event loop is badly broken. Attached (pic 1) is the latest artifacts i noticed (Main board is not properly displayed).
The peer widgets issue is related. Whole widgets inside SCID's paned windows are not drawn. Pic 2 shows a new comment editor window which should be drawn perfectly, but is totaly missing, simliar to one of the bug reports, but has many included widgets, including 10 buttons and a few frames, all of which are not shown.
Pic 3 shows how the comment editor should look, but it also shows a totally broken PGN window.The top half has not be redrawn since switching panes from the gamelist.
There does seem to be different rendering anomalies with the revised text widget installed, but IMO this is just a consequence of it running faster, and event loop flow changing. None of these abberations are with Carbon wish 8.5.9.
For example, with revised tktext installed, the buttons on the engine window render much better.

I have attached his screenshots.

Finally, there is nothing to do from the text widget point of view. It should be clear that the Mac interface (macosx/) needs some rework.


fvogel added on 2017-03-05 19:14:03:
This issue is present with the revised text widget as well.

fvogel added on 2017-02-09 21:42:28:

Code flow is different on Win (for instance) and on OS X when executing the line that triggers the bug:

  .t insert 2.0 [string repeat {asdf } 15]

Instrumenting the code one can see that:

On Win:

    % .t insert 2.0 [string repeat {asdf } 15]

TextChanged - scheduling DisplayText() ENTERING DisplayText() TkScrollWindow - x = 2, y = 34, w = 120, h = 26, dx = 0, dy = 64 DisplayText() - REDRAW_PENDING flag is CLEARED now LEAVING DisplayText() TextEventProc - Expose event - Now calling TkTextRedrawRegion x = 2, y = 34, w = 47, h = 26 TkTextRedrawRegion - Entering function TkTextRedrawRegion - scheduling DisplayText() TkTextRedrawRegion - Leaving function ENTERING DisplayText() DisplayText() - REDRAW_PENDING flag is CLEARED now LEAVING DisplayText()

On OS X:

    % .t insert 2.0 [string repeat {asdf } 15]

TextChanged - scheduling DisplayText() ENTERING DisplayText() TkScrollWindow - x = 4, y = 34, w = 105, h = 22, dx = 0, dy = 60 TextEventProc - Expose event - Now calling TkTextRedrawRegion x = 4, y = 34, w = 105, h = 22 TkTextRedrawRegion - Entering function TkTextRedrawRegion - Leaving function DisplayText() - REDRAW_PENDING flag is CLEARED now LEAVING DisplayText()

Observations:

On OS X the TextEventProc() is called from within TkScrollWindow(), to process the Expose event immediately. This call to TextEventProc() has the effect of calling TkTextRedrawRegion() while the REDRAW_PENDING flag has not yet been cleared in DisplayText() (this clearing only happens after the call to TkScrollWindow()). This prevents from scheduling the second DisplayText() that can be observed on Windows.

More digging shows that on OS X the Expose event generated by GenerateUpdates() in tkMacOSXWindowEvent.c is NOT queued (contrary to what the comments there says), but is dealt with immediately through a call to Tk_HandleEvent(). This was changed by commit [35a5490f]. However reverting back:

  Tk_HandleEvent(&event);
by:
  Tk_QueueWindowEvent(&event, TCL_QUEUE_TAIL);
does not appear to fix the redraw issue, even if the code execution flow then becomes the same as on Windows.


Attachments: