Ticket UUID: | 92e614e612b94686a7252a30b39d7c5f1cbfa812 | |||
Title: | OS X - Redraw artifacts (ghosts) with text widget | |||
Type: | Bug | Version: | core-8-6-branch, revised_text | |
Submitter: | fvogel | Created on: | 2017-02-09 21:14:15 | |
Subsystem: | 66. Aqua Window Operations | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Closed | Last Modified: | 2017-11-07 20:58:43 | |
Resolution: | Fixed | Closed By: | fvogel | |
Closed on: | 2017-11-07 20:58:43 | |||
Description: |
On OS X only, there may still be redrawing artifacts (aka "ghosts") in the display of the text widget. To reproduce: package require Tk pack [text .t -width 15] .t insert end "Line 1\n\n" .t window create end -window [button .b -text "Button"] update | |||
User Comments: |
fvogel added on 2017-11-07 20:58:43:
Merged to core-8-6-branch and trunk. Many thanks to Marc Culler! fvogel added on 2017-11-01 19:57:39: Further work was done (better fix) by Marc Culler in branch OSX_redraw_artifacts. I have checked that in the current state of that branch, i.e. at [2174f54d], the issue reported in the present ticket still does not happen (checked on OS X of course, but also on Win and Linux), which means that this further work did not inadvertently break the original fix. I will merge the OSX_redraw_artifacts branch soon. marc_culler (claiming to be Marc Culler) added on 2017-10-17 18:32:13: Here is a small correction to my long post, pointed out by Francois: I should have said "The drawRgn is the intersection of the visRgn with the window's *clipping* rectangle." The clipping rectangle is usually equal to the window's bounding rectangle but a smaller clipping rectangle can be assigned by calling TkpClipDrawableToRect. marc_culler (claiming to be Marc Culler) added on 2017-10-15 18:39:14: I am adding this long comment to describe what was changed in the patch that I submitted and to explain the mechanism behind the problem of "ghost windows" which has plagued the macOS port of Tk for so long. (Spoiler alert: it is not the event loop.) First, let me mention that the drawing artifact referenced in this ticket was fixed by adding two lines to TkScrollWindow in tkMacOSXDraw.c. This also fixes the scrolling issues with the peer text widgets, where after a fast scroll some buttons would be displayed in the wrong place or, more precisely, in two places, although one of them might be outside the window. The patch also changes a couple of lines in XUnmapWindow in tkMacOSXSubwindows.c to fix a different drawing problem that has eluded me for years. Before the patch you could see this problem in the items.tcl demo by doing this: first scroll the Canvas to the bottom right corner, so there is a button, a text widget and a scale visible; then scroll very quickly to the left, say by clicking near the left end of the scrollbar. You would then see the three widgets still on the screen, superimposed over what was supposed to be there after the scroll. Now on to the the cause of the "ghost windows". I really do not think it is helpful to say, as gcramer does here, that "This is the well known problem with the event loop issues under OS X." It may be well known, but saying that it is a problem with the event loop is misleading and in particular can lead developers to look for the problem in the wrong place, as I have done many times. I will try to describe the real mechanism behind this problem. Part of Daniel Steffen's original design of the Mac port of Tk was that the TkWindowPrivate struct maintains three HIShapeRef regions, named visRgn, aboveVisRgn and drawRgn. These regions are used as clipping masks whenever drawing into an NSView. The visRgn is the bounding box of the window with a rectangle removed for each subwindow and for each sibling window at a higher z-level. The drawRgn is the intersection of the visRgn with the window's bounding rectangle. The aboveVisRgn is the intersection of the window's bounding rectangle with the bounding rectangle of the parent window. Much of the code in tkMacOSXSubindows.c is devoted to rebuilding these clipping regions whenever something changes in the layout of the windows. This turns out to be a tricky thing to do and it is extremely prone to errors which can be difficult to trace. I assume that the purpose for using these clipping regions was that if they are correctly maintained then it allows windows to be drawn in any order. You do not have to draw them in the order of the window hierarchy. Each window can draw its entire rectangle through its own mask and never have to worry about drawing in the wrong place. I can only guess about how the need for this arose, but here is my guess. Another part of the design was that Tcl uses the NSApplication object in an unusual way. It never calls the [NSApplication run] method, but instead it turns the NSApplication object into an "event source". As the Apple window manager adds events to the NSApplication event queue, Tcl extracts those events from the NSApplication event queue, converts them to Tcl events and adds them to the Tcl event queue. But Apple does not have expose events. Their replacement for an expose event is to have the window manager call the [NSView drawRect] method in any situation where an expose event for that NSView would be generated in X11. The [NSView drawRect] method is a no-op which is expected to be overridden by any application. In the case of Tcl, the replacement [NSView drawRect] method creates Tcl expose events which are added to the Tcl queue. It is possible that because the [NSView drawRect] can be called by the Apple window manager at unpredictable times, one cannot count on the expose events being generated in any specific order. So that is my guess for why the clipping regions are used. Thus it might be fair to say that the *reason* for the clipping regions had something to do with the event loop. But it is the clipping region implementation, and not the event loop implementation, that leads to the "ghost windows". And if you waste your time trying to control the timing of events or the process of transferring events from the apple queue to the Tcl queue you will not fix the problem, although you might change how the problem presents itself. To see how the 'ghost windows' can arise, think about what happens if the clipping regions are not maintained correctly. A window might have a rectangle missing from its clipping region because that rectangle is the bounding rectangle for a subwindow, say a button. The parent should not draw in the missing rectangle since doing so would trash the button. The button is responsible for drawing there. Now imagine that the button gets moved, say by a scroll, but the missing rectangle in the parent's clipping region does not get moved correctly, or it gets moved later on, after the parent has redrawn itself. The parent would still not be allowed to draw in the old rectangle, so the user would continue to see the image of the button in its old location, as well as another image in the new location. This is a prototypical example of a "ghost window". Anytime you see a "ghost window", you should suspect problems with the updates to the clipping region visRgn. It is natural to look for timing issues, race conditions, or other "event loop problems". But in fact, the whole design of the code is to make those timing issues irrelevant. As long as the clipping regions are correctly maintained the timing does not matter. And if they are not correctly maintained then you will see "ghost windows". Incidentally, at the time when Kevin and I first modified Steffen's code to remove calls to private Apple API functions, a button subwindow was implemented as a wrapper for an NSButton. I think this exacerbated the unpredictability of the calls to drawRect, as the Apple window manager would call the drawRect method for the button (which is not a no-op and actually draws the button) as well as for the parent. In our efforts to address this problem we redesigned the buttons so that they would be drawn directly by Tk instead of by the Apple window manager. This simplification seemed to make the problem of maintaining the clipping regions more manageable. But it should not have been necessary if the clipping region code had been correct. And it may be necessary to go back to using NSButton if Apple decides to remove the libraries which we used to draw buttons directly. The error in TkUnmapWindow was that when a subwindow was unmapped (which happens the instant that it is scrolled outside of its parent window) its bounding rectangle was not being immediately added back into the parent's clipping mask. The fix for this, however, was very subtle due to the complicated logic of the code which maintains the clipping masks. (Calls were being made to functions that update the clipping masks, but those functions were being confused by the fact that some of the subwindows had been unmapped.) The fix to TkScrollWindow was easier - it just needed to invalidate clipping masks in the usual way at the end of the scroll. fvogel added on 2017-10-15 13:56:17: Also, I have changed the "subsystem" field since this was no "well known problem with the event loop issues under OS X", contrary to what was stated in the comments below. fvogel added on 2017-10-15 13:53:08: Confirmed fixed (tested in OSX_redraw_artifacts) fvogel added on 2017-10-15 12:19:58: Marc Culler has now provided an additional patch that fixes this issue. See branch OSX_redraw_artifacts, already merged to core-8-6-branch, trunk, and revised_text. fvogel added on 2017-10-13 20:31:43: Marc Culler has provided a patch concerning redraw artifacts, that got committed to branch bug-fab5fed65e as well as core-8-6-branch and trunk. However, as exposed by Marc himself, his patch, whereas it should fix [fab5fed65e] and [40a9abb9db], is not supposed to fix the present ticket. fvogel added on 2017-08-19 08:43:14: Thanks for the comments and snapshots. I change the category of this ticket accordingly. I find especially interesting the observation that with 8.5.9 there are no issues. Regarding your comment: > reported the problems (and ignored by the TCT as usual) You have been explained already several times and in details that the TCT does not ignore anything. It's just that the TCT is a very small team with limited time, and regarding Tk in particular there are even less members of this TCT that maintain Tk. So I kindly ask you to stop propagating such grossly wrong statements. gcramer added on 2017-08-19 07:53:23:
This is the well known problem with the event loop issues under OS X. The revised text widget has not changed this behavior, but it's clear that you can find different test cases for the revised version and the legacy version. I'm referencing a comment from Steven Atkinson, the developer from Scid vs PC, he has some years experience with the text widget under Mac, and has sometimes reported the problems (and ignored by the TCT as usual), and he has not the privileges to post to the bug reports with his account: These are just null bug reports. All artifacts have multiple rendering issues on OS X with Cocoa. It is why it is unusable for me. When the processor is busy, artifacts are more common, but the CPULoad<->renderingBugsNumber is not directly related. Bottom line is the event loop is badly broken. Attached (pic 1) is the latest artifacts i noticed (Main board is not properly displayed). I have attached his screenshots. Finally, there is nothing to do from the text widget point of view. It should be clear that the Mac interface (macosx/) needs some rework. fvogel added on 2017-03-05 19:14:03: This issue is present with the revised text widget as well. fvogel added on 2017-02-09 21:42:28: Code flow is different on Win (for instance) and on OS X when executing the line that triggers the bug: .t insert 2.0 [string repeat {asdf } 15] Instrumenting the code one can see that: On Win: % .t insert 2.0 [string repeat {asdf } 15]TextChanged - scheduling DisplayText() ENTERING DisplayText() TkScrollWindow - x = 2, y = 34, w = 120, h = 26, dx = 0, dy = 64 DisplayText() - REDRAW_PENDING flag is CLEARED now LEAVING DisplayText() TextEventProc - Expose event - Now calling TkTextRedrawRegion x = 2, y = 34, w = 47, h = 26 TkTextRedrawRegion - Entering function TkTextRedrawRegion - scheduling DisplayText() TkTextRedrawRegion - Leaving function ENTERING DisplayText() DisplayText() - REDRAW_PENDING flag is CLEARED now LEAVING DisplayText() On OS X: % .t insert 2.0 [string repeat {asdf } 15]TextChanged - scheduling DisplayText() ENTERING DisplayText() TkScrollWindow - x = 4, y = 34, w = 105, h = 22, dx = 0, dy = 60 TextEventProc - Expose event - Now calling TkTextRedrawRegion x = 4, y = 34, w = 105, h = 22 TkTextRedrawRegion - Entering function TkTextRedrawRegion - Leaving function DisplayText() - REDRAW_PENDING flag is CLEARED now LEAVING DisplayText() Observations: On OS X the TextEventProc() is called from within TkScrollWindow(), to process the Expose event immediately. This call to TextEventProc() has the effect of calling TkTextRedrawRegion() while the REDRAW_PENDING flag has not yet been cleared in DisplayText() (this clearing only happens after the call to TkScrollWindow()). This prevents from scheduling the second DisplayText() that can be observed on Windows. More digging shows that on OS X the Expose event generated by GenerateUpdates() in tkMacOSXWindowEvent.c is NOT queued (contrary to what the comments there says), but is dealt with immediately through a call to Tk_HandleEvent(). This was changed by commit [35a5490f]. However reverting back: Tk_HandleEvent(&event);by: Tk_QueueWindowEvent(&event, TCL_QUEUE_TAIL);does not appear to fix the redraw issue, even if the code execution flow then becomes the same as on Windows. |
Attachments:
- Screen Shot 2017-08-19 at 7.54.01 AM.png [download] added by gcramer on 2017-08-19 07:55:04. [details]
- Screen Shot 2017-08-19 at 7.09.19 AM.png [download] added by gcramer on 2017-08-19 07:54:45. [details]
- Screen Shot 2017-08-18 at 11.44.51 PM.png [download] added by gcramer on 2017-08-19 07:54:26. [details]