CSS Anchor Positioning is a novel CSS specification module
that allows positioned elements to size and position themselves relative to one or more anchor elements anywhere on the web page.
In simpler terms, it is a new web platform API that simplifies advanced relative-positioning scenarios such as tooltips, menus, popups, etc.
To better understand the true power it brings, let’s consider a non-trivial layout presented in Figure 1:
In the past, creating a context menu with position: fixed and positioned relative to the button required doing positioning-related calculations manually.
The more complex the layout, the more involved those calculations become. For example, if the table in the above example were in a scrollable container,
the position of the context menu would have to be updated manually on every scroll event.
With CSS Anchor Positioning, the solution to the above problem becomes trivial and requires two parts:
The <button> element must be marked as an anchor element by adding anchor-name: --some-name.
The context menu element must position itself using the anchor() function: left: anchor(--some-name right); top: anchor(--some-name bottom).
The above is enough for the web engine to understand that the context menu element’s left and top must be positioned to the anchor element’s right and bottom.
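Expressed as CSS, a minimal sketch of those two parts could look as follows (the selectors are hypothetical; the anchor name matches the one above):

/* Part 1: mark the button as an anchor. */
button.row-menu-button {
  anchor-name: --some-name;
}

/* Part 2: position the context menu against the anchor. */
.context-menu {
  position: fixed;
  left: anchor(--some-name right);
  top: anchor(--some-name bottom);
}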
With that, the web engine can carry out the job under the hood, so the result is as in Figure 2:
As the above demonstrates, with just a few simple API pieces, it's now possible to address very complex scenarios in an elegant fashion from the web developer's perspective.
Moreover, CSS Anchor Positioning offers even more than that. There are numerous articles with great examples, such as
this MDN article,
this CSS-Tricks article,
or this Chrome blog post, but long story short:
both positioning and sizing elements relative to anchors are now very simple.
The first draft of the specification was published in early 2023,
which in the web engine world is not very long ago.
Therefore - as one can imagine - not all the major web engines support it yet. The first (and so far the only) web engine
to support CSS Anchor Positioning was Chromium (see the introductory blog post) -
hence the information on caniuse.com.
However, despite the information visible on the WPT results page,
the other web engines are currently implementing it (see the meta bug for Gecko and
the bug list
for WebKit). The lack of progress on the WPT results page is due to the feature not yet being enabled by default in those engines.
From the publicly visible commits, one can deduce that the work on CSS Anchor Positioning in WebKit was started by Apple in early 2024.
The implementation was initiated by adding the core part: support for anchor-name, position-anchor, and anchor(). Those two properties and one function are enough to start using the feature
in real-world scenarios as well as in more sophisticated WPT tests.
That work was finished by the end of Q3 2024, and then - in Q4 2024 - the effort intensified significantly. Parsing/computing support was added for numerous
properties and functions, and moreover, a lot of new functionality and bug fixes landed afterwards. One can expect some more things to land by the end of the year, even if there's
not much time left.
Overall, the implementation is in progress and far from done, but it can already be tested in many real-world scenarios.
This can be done using custom WebKit builds (across various OSes) or using Safari Technology Preview on Mac.
The precondition for testing is, however, that the runtime preference called CSSAnchorPositioning is enabled.
Since CSS Anchor Positioning in WebKit is still a work in progress, and since the demand for the set of features this module brings is high, I've been privileged to contribute
a little to the implementation myself. My work so far has been focused on the parts of the API that allow creating menu-like elements that become visible on demand.
The first challenge was to fix various problems related to toggling the visibility status, such as a crash and broken layout.
The obvious first step towards addressing those was to isolate minimal scenarios that reproduce them. In the process, I created some test cases and added them to WPT.
With tests in place, I imported them into WebKit's source tree and proceeded with the actual bug fixing.
The result was the fix for the crash, and the fix for the layout being broken.
With those in place, the visibility of menu-like elements can now be changed without any problems.
The second challenge was about the missing features that allow automatic alignment to the anchor. In a nutshell, to get alignment like that in Figure 3,
there are two possibilities (see the sketch after this list):
The position-area CSS property can be used: position-area: bottom center;.
The anchor-center value can be used with the self-alignment properties, e.g. justify-self: anchor-center;.
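A minimal sketch of both approaches (the .menu selector is hypothetical; position-anchor sets the default anchor):

/* Possibility 1: place the element in a region around its anchor. */
.menu {
  position: fixed;
  position-anchor: --some-name;
  position-area: bottom center;
}

/* Possibility 2: center-align the element with its anchor on the inline axis. */
.menu {
  position: fixed;
  position-anchor: --some-name;
  top: anchor(bottom);
  justify-self: anchor-center;
}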
At first, I wasn’t aware of anchor-center, and hence I started initial work towards supporting position-area.
Once I became aware of it, however, I switched my focus to implementing anchor-center and left position-area for Apple to continue - so as not to block them.
So far, both the initial and core parts of the anchor-center implementation have landed,
which means the basic support is in place.
Despite the anchor-center layout tests passing, I've already discovered some problems, and I anticipate more may appear once the testing intensifies.
To address the above, I’ll be focusing on adding extra WPT coverage along with fixing the problems one by one. The key is to
make sure that at the end of the day, all the unexpected problems are covered with WPT test cases. This way, other web engines
will also benefit.
With WebKit’s implementation of CSS Anchor Positioning in its current shape, the work can proceed very much in parallel. Assuming that Apple keeps
working on it at the same pace as they did for the past few months, I wouldn’t be surprised if CSS Anchor Positioning were pretty much
done by the end of 2025. If the implementation in Gecko doesn’t stall, I think one can also expect a lot of activity around testing in
WPT. With that, the quality of the implementations across the web engines should improve, and eventually (perhaps in 2026?) CSS Anchor Positioning
should reach full interoperability.
It’s been more than 2 years since the last time I wrote something here, and in that time a lot of things have happened. Among those, one of the main highlights was my move back to Igalia's WebKit team, but this time as part of Igalia's support infrastructure, helping with other types of tasks such as general coordination, team facilitation and project management, among other things.
On top of those things, I’ve also been presenting our work around WebKit in different venues, such as the Embedded Open Source Summit or the Embedded Recipes conference. Of course, that included presenting our work in the WebKit community as part of the WebKit Contributors Meeting, a small and technically focused event that happens every year, normally around the Bay Area (California). That’s often a pretty dense presentation where, over the course of 30-40 minutes, we go through all the main areas that we at Igalia contribute to in WebKit, trying to summarize our main contributions from the previous 12 months. This includes work not just from the WebKit team, but also from other ones such as our Web Platform, Compilers or Multimedia teams.
This is a long read, so maybe grab a cup of your favorite beverage first…
Igalia and WebKit
So first of all, what is the relationship between Igalia and the WebKit project?
In a nutshell, we are the lead developers and the maintainers of the two Linux-based WebKit ports, known as WebKitGTK and WPE. These ports share a common baseline (e.g. GLib, GStreamer, libsoup) and also some goals (e.g. performance, security), but other than that their purpose is different, with WebKitGTK being aimed at the Linux desktop, while WPE is mainly focused on embedded devices.
This means that, while WebKitGTK is the go-to solution to embed Web content in GTK applications (e.g. GNOME Web/Epiphany, Evolution) and therefore integrates well with that graphical toolkit, WPE does not even provide a graphical toolkit: its main goal is to run well on embedded devices that often don’t have a lot of memory or processing power, or even the usual I/O mechanisms that we are used to on desktop computers. This is why WPE is designed with flexibility in mind, with a backend-based architecture; why it aims to use as few resources as possible; and why it tries to depend on as few libraries as possible, so you can integrate it in virtually any kind of embedded Linux platform.
Besides that port-specific work, which is where our WebKit and Multimedia teams focus a lot of their effort, we also contribute at a different level to the port-agnostic parts of WebKit, mostly around Web standards (e.g. contributing to Web specifications and implementing them) and the JavaScript engine. This work is carried out by our Web Platform and Compilers teams, which tirelessly contribute to the different parts of WebCore and JavaScriptCore that affect not just the WebKitGTK and WPE ports, but also the rest of them to a bigger or smaller degree.
Last but not least, we also devote a considerable amount of our time to other topics such as accessibility, performance, bug fixing, QA... and also to make sure WebKit works well on 32-bit devices, which is an important thing for a lot of WPE users out there.
Who are our users?
At Igalia we distinguish 4 main types of users of the WebKitGTK and WPE ports of WebKit:
Port users: this category would include anyone that writes a product directly against the port’s API, that is, apps such as a desktop Web browser, or embedded systems that rely on a fullscreen Web view to render their Web-based content (e.g. digital signage systems).
Platform providers: in this category we would have developers that build frameworks with one of the Linux ports at its core, so that people relying on such frameworks can leverage the power of the Web without having to directly interface with the port’s API. RDK could be a good example of this use case, with WPE at the core of the so-called Thunder plugin (previously known as WPEFramework).
Web developers: of course, Web developers willing to develop and test their applications against our ports need to be considered here too, as they come with a different set of needs that need to be fulfilled, beyond rendering their Web content (e.g. using the Web Inspector).
End users: and finally, the end user is the last piece of the puzzle we need to pay attention to, as that’s what makes all this effort worth undertaking, even if most of them most likely don’t even know what WebKit is, which is perfectly fine :-)
We like to make this distinction between 4 possible types of users explicit because we think it’s important to understand the breadth of use cases and the diversity of potential users and customers we need to provide service for, which is behind our decisions and the way we prioritize our work.
Strategic goals
Our main goal is that our product, the WebKit web engine, is useful for more and more people in different situations. Because of this, it is important that the web platform is homogeneous and can be used reliably with all the engines available nowadays, which is why compatibility and interoperability are a must, and why we work with the standards bodies to help with the design and implementation of several Web specifications.
With WPE, it is very important to be able to run the engine on small embedded devices, and that requires good performance and efficiency on multiple hardware architectures, as well as great flexibility for specific hardware, which is why we gave WPE a backend-based architecture and reduced its dependencies to a minimum.
Then, it is also important that the QA infrastructure is good enough to keep the releases working and of good quality. This is why I regularly maintain, evolve and keep an eye on the EWS and post-commit bots that keep WebKitGTK and WPE building, running and passing the tens of thousands of tests that we check continuously to ensure we don’t regress (or that we catch issues soon enough when there’s a problem). Of course, it’s also important to keep doing security releases, making sure that we publish stable versions with fixes for the reported CVEs as soon as possible.
Finally, we also make sure that we keep evolving our tooling as much as possible (see for instance the release of the new SDK earlier this year), as well as improving the documentation for both ports.
Last, all this effort would not be possible if we didn’t also consider it a goal of ours to maintain an efficient collaboration with the rest of the WebKit community in different ways, from making sure we re-use and contribute as much code as possible to other ports, to communicating well in all the available forums (e.g. Slack, mailing list, the annual meeting).
Contributions to WebKit in numbers
Well, first of all the usual disclaimer: number of commits is for sure not the best possible metric, and therefore should be taken with a grain of salt. However, the point here is not to focus too much on the actual numbers but on the more general conclusions that can be extracted from them, and from that point of view I believe it’s interesting to take a look at this data at least once a year.
With that out of the way, it’s interesting to confirm that once again we are still the 2nd biggest contributor to WebKit after Apple, with ~13% of the commits landed in this past 12-month period. More specifically, we landed 2027 patches out of the 15617 ones that took place during the past year, only surpassed by Apple and their 12456 commits. The remaining 1134 patches were landed mostly by Sony, followed by RedHat and several other contributors.
Now, if we remove Apple from the picture, we can observe that this year our contributions represented ~64% of all the non-Apple commits, a figure that grew by about 11% compared to the past year. This confirms once again our commitment to WebKit, a project we started contributing to about 14 years ago, and where we have systematically been the 2nd top contributor for a while now.
Main areas of work
The 10 main areas we have contributed to in WebKit in the past 12 months are the following ones:
Web platform
Graphics
Multimedia
JavaScriptCore
New WPE API
WebKit on Android
Quality assurance
Security
Tooling
Documentation
In the next sections I’ll talk a bit about what we’ve done and what we’re planning to do next for each of them.
Web Platform
content-visibility:auto
This feature allows skipping the painting and rendering of off-screen sections, which is particularly useful to avoid the browser spending time rendering parts of large pages, as content outside of the viewport doesn’t get rendered until it becomes visible.
We completed the implementation and it’s now enabled by default.
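For illustration, a typical usage sketch (the 500px placeholder size is an assumption that an author would tune per page):

section {
  content-visibility: auto;
  /* Reserve an estimated size so the scrollbar stays stable while
     off-screen sections are skipped. */
  contain-intrinsic-size: auto 500px;
}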
Navigation API
This is a new API to manage browser navigation actions and examine history, which we started working on in the past cycle. There’s been a lot of work happening here and, while it’s not finished yet, the current plan is that Apple will continue working on that in the next months.
hasUAVisualTransition
This is an attribute of the NavigateEvent interface, which is meant to be true if the user agent has performed a visual transition before a navigation event. This is something we also finished implementing, and it is now enabled by default as well.
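As a sketch of how a page might consume it (updateTheDOM and runCustomTransition are hypothetical application functions):

navigation.addEventListener('navigate', (event) => {
  event.intercept({
    async handler() {
      await updateTheDOM(); // hypothetical: apply the new page state
      // Skip the custom transition if the UA already showed its own.
      if (!event.hasUAVisualTransition)
        runCustomTransition(); // hypothetical
    },
  });
});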
On top of that we also moved the X25519 feature to the “prepare to ship” stage.
Trusted Types
This work is related to reducing DOM-based XSS attacks. Here we finished the implementation and this is now pending to be enabled by default.
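A minimal sketch of how a page opts in (the policy name and the toy sanitizer are illustrative; real code would use a proper sanitizer):

// Enforced via the CSP header: require-trusted-types-for 'script'
const policy = trustedTypes.createPolicy('my-policy', {
  createHTML(input) {
    return input.replaceAll('<', '&lt;'); // toy sanitizer, illustration only
  },
});
element.innerHTML = policy.createHTML(untrustedInput); // OK: a TrustedHTML value
// element.innerHTML = untrustedInput; // throws once enforcement is on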
MathML
We continued working on the MathML specification, adding support for padding, border and margin, as well as increasing the WPT score by ~5%.
The plan for next year is to continue working on core features and improve the interaction with CSS.
Cross-root ARIA
Web components have accessibility-related issues with native Shadow DOM as you cannot reference elements with ARIA attributes across boundaries. We haven’t worked on this in this period, but the plan is to work in the next months on implementing the Reference Target proposal to solve those issues.
Canvas Formatted Text
Canvas doesn’t have a solution for adding formatted, multi-line text, so we would like to work on exploring and prototyping the Canvas Place Element proposal in WebKit, which allows better text in canvas and more extended features.
Graphics
Completed migration from Cairo to Skia for the Linux ports
The results in the end were pretty overwhelming, so we decided to give Skia a go, and we are happy to say that, as of today, the migration is complete: we covered all the use cases we had with Cairo, achieving feature parity, and we are now working on implementing new features and improvements built on top of Skia (e.g. GPU-based 2D rendering).
On top of that, Skia is now the default backend for WebKitGTK and WPE since 2.46.0, released on September 17th, so if you’re building a recent version of those ports you’ll already be using Skia as the 2D rendering backend. Note that Skia uses its GPU-based backend only on desktop environments; on embedded devices the situation is trickier, and for now the default is the CPU-based Skia backend, but we are actively working to narrow the gap and enable GPU-based rendering on embedded too.
Architecture changes with buffer sharing APIs (DMABuf)
We did a lot of work here, such as a big refactoring of the fencing system that controls access to the buffers, and continued work towards integrating with Apple’s DisplayLink infrastructure.
On top of that, we also enabled more efficient composition using damaging information, so that we don’t need to pass that much information to the compositor, which would slow the CPU down.
Enablement of the GPUProcess
On this front, we enabled by default the compilation of WebGL rendering using the GPU process, and we are currently working on performance review and on enabling it for other types of rendering.
New SVG engine (LBSE: Layer-Based SVG Engine)
If you are not familiar with this, the idea here is to reuse the graphics pipeline used for HTML and CSS rendering for SVG as well, instead of SVG having its own pipeline. This means, among other things, that SVG layers will be supported as first-class citizens in the engine, enabling HW-accelerated animations, as well as support for 3D transformations for individual SVG elements.
On this front, in this cycle we added support for the missing features in LBSE, namely:
Implemented support for gradients & patterns (applicable to both fill and stroke)
Implemented support for clipping & masking (for all shapes/text)
Implemented support for markers
Helped review implementation of SVG filters (done by Apple)
Besides all this, we also improved the performance of the new layer-based engine by reducing repaints and re-layouts as much as possible (further optimizations still possible), narrowing the performance gap with the current engine for MotionMark. While we are still not at the same level of performance as the current SVG engine, we are confident that there are several key places where, with the right funding, we should be able to improve the performance to at least match the current engine, and therefore be able to push the new engine through the finish line.
General overhaul of the graphics pipeline, touching different areas (WIP):
On top of everything else commented above, we also worked on a general refactor and simplification of the graphics pipeline. For instance, we have been working on the removal of the Nicosia layer now that we are not planning to have multiple rendering implementations, among other things.
Multimedia
DMABuf-based sink for HW-accelerated video
We merged the DMABuf-based sink for HW-accelerated video in the GL-based GStreamer sink.
WebCodecs backend
We completed the implementation of audio/video encoding and decoding, and this is now enabled by default in 2.46. As for the next steps, we plan to keep working on the integration of WebCodecs with WebGL and WebAudio.
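As a rough sketch of the WebCodecs decode path that is now enabled by default (the codec string, the canvas context and the source of encoded bytes are assumptions):

const decoder = new VideoDecoder({
  output(frame) {
    canvasContext.drawImage(frame, 0, 0); // e.g. paint each decoded frame
    frame.close(); // release the frame's resources promptly
  },
  error(e) { console.error(e); },
});
decoder.configure({ codec: 'vp8' });
decoder.decode(new EncodedVideoChunk({
  type: 'key',
  timestamp: 0,
  data: encodedBytes, // hypothetical: e.g. produced by a demuxer
}));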
GStreamer-based WebRTC backends
We continued working on GstWebRTC, bringing it to a point where it can be used in production in some specific use cases, and we will still be working on this in the next months.
Other
Besides the points above, we also added an optional text-to-speech backend based on libspiel to the development branch, and worked on general maintenance around the support for Media Source Extensions (MSE) and Encrypted Media Extensions (EME), which are crucial for the use case of WPE running on set-top boxes; this is a permanent task we will continue to work on in the next months.
JavaScriptCore
ARMv7/32-bit support:
A lot of work happened around 32-bit support in JavaScriptCore, especially around WebAssembly (WASM): we ported the WASM BBQJIT and ported/enabled concurrent JIT support, and we also completed 80% of the implementation for the OMG optimization level of WASM, which we plan to finish in the next months. If you are unfamiliar with what the OMG and BBQ optimization tiers in WASM are, I’d recommend taking a look at this article on webkit.org: “Assembling WebAssembly“.
We also contributed to JIT-less WASM, which is very useful for embedded systems that can’t support JIT due to security or memory-related constraints, and also did some work on the In-Place Interpreter (IPInt), which is a new version of the WASM Low-Level Interpreter (LLInt) that uses less memory and executes WASM bytecode directly, without translating it to LLInt bytecode first (and should therefore be faster to execute).
Last, we also contributed most of the implementation for the WASM GC, with the exception of some Kotlin tests.
As for the next few months, we plan to investigate and optimize heap/JIT memory usage in 32-bit, as well as to finish several other improvements on ARMv7 (e.g. IPInt).
New WPE API
The new WPE API aims at making it easier to use WPE on embedded devices, by removing the hassle of having to handle several libraries in tandem (i.e. WPEWebKit, libWPE and WPEBackend-FDO, all available from WPE’s releases page) and by providing a more modern API in general, better aimed at the most common use cases of WPE.
A lot of effort went into this during the year, including the fact that we finally upstreamed and shipped its initial implementation with WPE 2.44, back in the first half of the year. Now, while we recommend users give it a try and report feedback as much as possible, this new API is still not set in stone, with regular development still ongoing, so if you have the chance to try it out and share your experience, comments are welcome!
Besides shipping its initial implementation, we also added support for external platforms, so that other ones can be loaded beyond the Wayland, DRM and “headless” ones, which are the default platforms already included with WPE itself. This means for instance that a GTK4 platform, or another one for RDK could be easily used with WPE.
Then of course a lot of API additions were included in the new API in the latest months:
Screens management API: API to handle different screens, ask the display for the list of screens with their device scale factor, refresh rate, geometry…
Top level management API: This API allows a greater degree of control, for instance by allowing more than one WebView for the same top level, as well as allowing to retrieve properties such as size, scale or state (i.e. full screen, maximized…).
Maximized and minimized windows API: API to maximize/minimize a top level and monitor its state; mainly used by WebDriver.
Preferred DMA-BUF formats API: enables asking the platform (compositor or DRM) for the list of preferred formats and their intended use (scanout/rendering).
Input methods API: allows platforms to provide an implementation to handle input events (e.g. virtual keyboard, autocompletion, auto correction…).
Gestures API: API to handle gestures (e.g. tap, drag).
Buffer damaging: WebKit generates information about the areas of the buffer that actually changed and we pass that to DRM or the compositor to optimize painting.
Pointer lock API: allows the WebView to lock the pointer so that the movement of the pointing device (e.g. mouse) can be used for a different purpose (e.g. first-person shooters).
Last, we also added support for testing automation, and we can support WebDriver now in the new API.
With all this done so far, the plan now is to complete the new WPE API, with a focus on the Settings API and accessibility support, write API tests and documentation, and then also add an external platform to support GTK4. This is done on a best-effort basis, so there’s no specific release date.
WebKit on Android
This year was also a good year for WebKit on Android, also known as WPE Android, as this is a project that sits on top of WPE and its public API (instead of developing a fully-fledged WebKit port).
In case you’re not familiar with this, the idea here is to provide a WebKit-based alternative to the Chromium-based Web view on Android devices, in a way that leverages HW acceleration when possible and integrates natively (and nicely) with the several Android subsystems, and of course with Android’s native main loop. Note that this is an experimental project for now, so don’t expect production-ready quality quite yet, but hopefully something that can be used to start experimenting with selected use cases.
Anyway, as for the changes that happened in the past 12 months, here is a summary:
Updated WPE Android to WPE 2.46 and NDK 27 LTS
Added support for WebDriver and included WPT test suites
Added support for instrumentation tests, and integrated with the GitHub CI
Added support for the remote Web inspector, very useful for debugging
Enabled the Skia backend, bringing HW-accelerated 2D rendering to WebKit on Android
Implemented prompt delegates, allowing implementing things such as alert dialogs
Implemented WPEView client interfaces, allowing responding to things such as HTTP errors
Packaged a WPE-based Android WebView in its own library and published it in Maven Central. This is a massive improvement, as now apps can use WPE Android by simply referencing the library from their Gradle files, with no need to build everything on their own.
Other changes: enabled HTTP/2 support (via the migration to libsoup3), added support for the device scale factor, improved the virtual on-screen keyboard, general bug fixing…
On top of that, we published 3 different blog posts covering different topics, from a general intro to a more deep dive explanation of the internals, and showing some demos. You can check them out in Jani’s blog at https://blogs.igalia.com/jani
As for the future, we’ll focus on stabilization and regular maintenance for now, and then we’d like to work towards achieving production-ready quality for specific cases if possible.
Quality Assurance
On the QA front, we had a busy year but in general we could highlight the following topics.
Fixed a lot of API test failures in the bots that were limiting our test coverage.
Fixed lots of assertion-related crashes in the bots, which were slowing the bots down as well as causing other types of issues, such as bots exiting early due to too many failures.
Enabled assertions in the release bots, which will help prevent crashes in the future, as well as help make our debug bots healthier.
Moved all the WebKitGTK and WPE bots to building now with Skia instead of Cairo. This means that all the bots running tests are now using Skia, and there’s only one bot still using Cairo to make sure that the compilation is not broken, but that bot does not run tests.
Moved all the WebKitGTK bots to use GTK4 by default. As with the move to Skia, all the WebKit bots running tests now use GTK4 and the only one remaining building with GTK3 does not run tests, it only makes sure we don’t break the GTK3 compilation for now.
Working on moving all the bots to use the new SDK. This is still work in progress and will likely be completed during 2025, as it requires several changes in the infrastructure that will take some time.
General gardening and bot maintenance
In the next months, our main focus will be a revamp of the QA infrastructure to make sure that we can get all the bots (including the debug ones) to a healthier state, finish the migration of all the bots to the new SDK and, ideally, bring back the ready-to-use WPE images that we used to have available on wpewebkit.org.
Security
The current release cadence has been working well, so we continue issuing major releases every 6 months (March, September), and then minor and unstable development releases happening on-demand when needed.
Last, we also shortened the time before including security fixes in stable releases this year, and we have removed support for libsoup2 from WPE, as that library is no longer maintained.
Tooling & Documentation
On tooling, the main piece of news is that this year we released the initial version of the new SDK, which is built on top of OCI-based containers. This new SDK fixes the issues with the previously existing approaches based on JHBuild and Flatpak, where one of them was great for development but poor for testing and QA, and the other was great for testing and QA but not very convenient for development.
As for documentation, we didn’t do as much as we would have liked here, but we still landed a few contributions in docs.webkit.org, mostly related to WebKitGTK (e.g. Releases and Versioning, Security Updates, Multimedia). We plan to do more in this regard in the next months, mostly by writing/publishing more documentation and perhaps also some tutorials.
Final thoughts
This has been a fairly long blog post but, as you can see, it’s been quite a year for WebKit here at Igalia, with many exciting changes happening at several fronts, and so there was quite a lot of stuff to comment on here. This said, you can always check the slides of the presentation in the WebKit Contributors Meeting here if you prefer a more concise version of the same content.
In any case, what’s clear is that the next months are probably going to be quite interesting as well, with all the work that’s already going on in WebKit and its Linux ports, so it’s possible that 12 months from now I might be writing an equally long essay. We’ll see.
The <video> element implementation in WebKit does its job by using a multiplatform player that relies on a platform-specific implementation. In the specific case of glib platforms, which base their multimedia on GStreamer, that’s MediaPlayerPrivateGStreamer.
The player private can have 3 buffering modes:
On-disk buffering: this is the typical mode on desktop systems, but it is frequently disabled on purpose on embedded devices to avoid wearing out their flash storage. All the video content is downloaded to disk, and the buffering percentage refers to the total size of the video. A GstDownloadBuffer element is present in the pipeline in this case. Buffering level monitoring is done by polling the pipeline every second, using the fillTimerFired() method.
In-memory buffering: this is the typical mode on embedded systems, and on desktop systems in the case of streamed (live) content. The video is downloaded progressively, and only the part of it ahead of the current playback time is buffered. A GstQueue2 element is present in the pipeline in this case. Buffering level monitoring is done by listening to GST_MESSAGE_BUFFERING bus messages and using the buffering level stored in them. This is the case that motivated the refactoring described in this blog post: it’s what we actually wanted to correct on Broadcom platforms, and what motivated the addition of hysteresis, which works on all platforms.
Local files: files, MediaStream sources and other special origins of video don’t do buffering at all (no GstDownloadBuffer nor GstQueue2 element is present in the pipeline). They work like the on-disk buffering mode in the sense that fillTimerFired() is used, but the reported level is relative, much like in the streaming case. In the initial version of the refactoring I was unaware of this third case, and only realized it existed when tests triggered the assert that I had added to ensure that the on-disk buffering method was working in GST_BUFFERING_DOWNLOAD mode.
The current implementation (actually, its wpe-2.38 version) was showing some buffering problems on some Broadcom platforms when doing in-memory buffering. The buffering levels monitored by MediaPlayerPrivateGStreamer weren’t accurate because the Nexus multimedia subsystem used on Broadcom platforms was doing its own internal buffering. Data wasn’t being accumulated in the GstQueue2 element of playbin, because BrcmAudFilter/BrcmVidFilter was accepting all the buffers that the queue could provide. Because of that, the player private’s buffering logic was erratic, leading to many transitions between “buffer completely empty” and “buffer completely full”. This, in turn, caused many transitions between the HaveEnoughData, HaveFutureData and HaveCurrentData readyStates in the player, leading to frequent pauses and unpauses on Broadcom platforms.
So, one of the first things I tried to solve this issue was to ask the Nexus PlayPump (the subsystem in charge of internal buffering in Nexus) about its internal levels, and add that to the levels reported by GstQueue2. There’s also a GstMultiQueue in the pipeline that can hold a significant amount of buffers, so I also asked it for its level. Still, the buffering level instability was too high, so I added a moving-average implementation to try to smooth it.
All these tweaks only make sense on Broadcom platforms, so they were guarded by ifdefs in a first version of the patch. Later, I migrated those dirty ifdefs to the new quirks abstraction added by Phil. A challenge of this migration was that I needed to store some attributes that were considered part of MediaPlayerPrivateGStreamer before. They still had to be somehow linked to the player private but only accessible by the platform specific code of the quirks. A special HashMap attribute stores those quirks attributes in an opaque way, so that only the specific quirk they belong to knows how to interpret them (using downcasting). I tried to use move semantics when storing the data, but was bitten by object slicing when trying to move instances of the superclass. In the end, moving the responsibility of creating the unique_ptr that stored the concrete subclass to the caller did the trick.
Even with all those changes, undesirable swings in the buffering level kept happening, and a careful analysis of the causes showed that the buffering level was being monitored from different places (at different moments), and sometimes the level was regarded as “enough” and, the moment right after, as “insufficient”. This was because the buffering level threshold was a single value. That’s something that a hysteresis mechanism (with low and high watermarks) can solve: a logical level change to “full” only happens when the level goes above the high watermark, and a logical level change to “low” only when it goes below the low watermark.
For the threshold change detection to work, we need to know the previous buffering level. There’s a problem, though: the current code checked the levels from several scattered places, so only one of those places (the first one that detected the threshold crossing at a given moment) would properly react. The other places would miss the detection and operate improperly, because the “previous buffering level value” had been overwritten with the new one when the evaluation had been done before. To solve this, I centralized the detection in a single place “per cycle” (in updateBufferingStatus()), and then used the detection conclusions from updateStates().
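A minimal sketch of that idea, not WebKit’s actual code (the class name and watermark values are illustrative):

// Hysteresis with a single, centralized state transition per update cycle:
// updateBufferingStatus() would call update() once, and updateStates()
// would then only read the conclusions.
class BufferingHysteresis {
public:
    // Returns true when the logical "enough data" state flipped.
    bool update(double bufferingLevel)
    {
        bool wasEnough = m_enoughData;
        if (!m_enoughData && bufferingLevel >= s_highWatermark)
            m_enoughData = true; // only flip to "full" above the high watermark
        else if (m_enoughData && bufferingLevel <= s_lowWatermark)
            m_enoughData = false; // only flip to "low" below the low watermark
        return m_enoughData != wasEnough;
    }

    bool enoughData() const { return m_enoughData; }

private:
    static constexpr double s_highWatermark { 100.0 }; // illustrative value
    static constexpr double s_lowWatermark { 20.0 };   // illustrative value
    bool m_enoughData { false };
};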
So, with all this in mind, I refactored the buffering logic as https://commits.webkit.org/284072@main, so now WebKit GStreamer has a buffering code much more robust than before. The unstabilities observed in Broadcom devices were gone and I could, at last, close Issue 1309.
WebKitGTK and WPEWebKit recently released a new stable version 2.46. This version includes important changes in the graphics implementation.
Skia
The most important change in 2.46 is the introduction of Skia to replace Cairo as the 2D graphics renderer. Skia supports rendering using the GPU, which is now the default, but we also use it for CPU rendering using the same threaded rendering model we had with Cairo. The architecture hasn’t changed much for GPU rendering: we use the same tiled rendering approach, but buffers for dirty regions are rendered in the main thread as textures. The compositor waits for the textures to be ready using fences and copies them directly to the compositor texture. This was the simplest approach, and it already resulted in much better performance, especially on desktops with more powerful GPUs. On embedded systems, where GPUs are not so powerful, it’s still better in most cases to use the CPU with several rendering threads. It’s still too early to announce anything, but we are already experimenting with different models to improve performance even more and make better usage of the GPU on embedded devices.
Skia has received several GCC-specific optimizations lately, but it’s still more optimized when built with Clang. The optimizations are more noticeable in performance when using the CPU for rendering. For this reason, since version 2.46 we recommend building WebKit with Clang for the best performance. GCC is still supported, of course, and performance when built with GCC is quite good too.
HiDPI
Even though there aren’t specific changes about HiDPI in 2.46, users of high resolution screens using a device scale factor bigger than 1 will notice much better performance thanks to scaling being a lot faster on the GPU.
Accelerated canvas
The 2D canvas can be accelerated independently of whether the CPU or the GPU is used for painting layers. In 2.46 there’s a new setting, WebKitSettings:enable-2d-canvas-acceleration, to control 2D canvas acceleration. On some embedded devices, the combination of CPU rendering for layer tiles and GPU rendering for the canvas gives the best performance. The 2D canvas is normally rendered into an image buffer that is then painted in the layer as an image. We changed that for the accelerated case, so that the canvas is now rendered into a texture that is copied to a compositor texture to be directly composited, instead of being painted into the layer as an image. In 2.46 the offscreen canvas is enabled by default.
There are also cases where accelerating the canvas is not desired; for example, when the canvas is not big enough it’s faster to use the CPU, and the same goes when there are going to be many operations that “download” pixels from the GPU. Since this is not always easy to predict, in 2.46 we added support for the willReadFrequently canvas setting, so that when it is set by the application at canvas creation time, the canvas is always unaccelerated.
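For example, a page that reads pixels back often can hint this at context creation time (a minimal sketch):

// Hinting frequent readback keeps the canvas unaccelerated, avoiding a
// costly GPU->CPU "download" on every getImageData() call.
const ctx = canvas.getContext('2d', { willReadFrequently: true });
ctx.fillRect(0, 0, 100, 100);
const pixels = ctx.getImageData(0, 0, 100, 100);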
Filters
All the CSS filters are now implemented using Skia APIs, and accelerated when possible. The most noticeable change here is that sites using blur filters are no longer slow.
Color spaces
Skia brings native support for color spaces, which allows us to greatly simplify the color-space handling code in WebKit. WebKit uses color spaces in many scenarios, but especially for SVG and filters. For some filters, color spaces are necessary, as some operations are simpler to perform in linear sRGB. A good example of that is the feDiffuseLighting filter: it yielded wrong visual results for a very long time in the Cairo-based implementation, as Cairo doesn’t have support for color spaces. At some point, however, the Cairo-based WebKit implementation was fixed by converting pixels to linear sRGB in place before applying the filter and converting them back to sRGB in place afterwards. Such workarounds are not necessary anymore: with Skia, all pixel-level operations are handled in a color-space-transparent way as long as the proper color space information is provided. This not only makes the results of some filters correct, but also improves performance and opens new possibilities for acceleration.
Font rendering
Font rendering is probably the most noticeable visual change after the Skia switch, and it has received mixed feedback. Some people reported that several sites look much better, while others reported problems with kerning on other sites. In other cases it’s not really better or worse; it’s just that we were used to the way fonts were rendered before.
Damage tracking
WebKit already tracks the areas of the layers that have changed in order to paint only the dirty regions. This means that we only repaint the areas that changed, but the compositor still incorporates them into the whole frame, which is always composited and passed to the system compositor. In 2.46 there’s experimental code to track the damage regions and pass them to the system compositor in addition to the frame. Since this is experimental, it’s disabled by default, but it can be enabled with the runtime feature PropagateDamagingInformation. There’s also a UnifyDamagedRegions feature that can be used in combination with PropagateDamagingInformation to unify the damage regions into one before passing it to the system compositor. We still need to analyze the impact of damage tracking on performance before enabling it by default. We have also started an experiment to use the damage information in the WebKit compositor and avoid compositing the entire frame every time.
GPU info
Working on graphics can be really hard on Linux: there are too many variables that can produce different outputs for different users, such as the driver version, the kernel version, the system compositor, the EGL extensions available, etc. When something doesn’t work for some people but works for others, it’s key for us to gather as much information as possible about the graphics stack. In 2.46 we have added more useful information to webkit://gpu, like the DMA-BUF buffer format and modifier used (for the GTK port, and for WPE when using the new API). Very often the symptom is the same - nothing is rendered in the web view - even when the causes can be very different. In those cases it’s even harder to gather the info, because webkit://gpu doesn’t render anything either. In 2.46 it’s possible to load webkit://gpu/stdout to get the information as JSON directly on stdout.
Sysprof
Another common symptom for people having problems is that a particular website is slow to render, while for others it works fine. In these cases, in addition to the graphics stack information, we need to figure out where we are slower and why. This is very difficult to fix when you can’t reproduce the problem. We added initial support for profiling in 2.46 using sysprof. The code already has some marks so that when run under sysprof we get useful information about timings of several parts of the graphics pipeline.
Next
This is just the beginning, we are already working on changes that will allow us to make a better use of both the GPU and CPU for the best performance. We have also plans to do other changes in the graphics architecture to improve synchronization, latency and security. Now that we have adopted sysprof for profiling, we are also working on improvements and new tools.
Move semantics can be very useful for transferring ownership of resources, but like many other C++ features, it’s one more double-edged sword that can hurt you in new and interesting ways if you don’t read the small print.
For instance, if object moving involves super and subclasses, you have to keep an extra eye on what’s actually happening. Consider the following classes A and B, where the latter inherits from the former:
#include <stdio.h>
#include <utility>
#define PF printf("%s %p\n", __PRETTY_FUNCTION__, this)
class A {
public:
A() { PF; }
virtual ~A() { PF; }
A(A&& other)
{
PF;
std::swap(i, other.i);
}
int i = 0;
};
class B : public A {
public:
B() { PF; }
virtual ~B() { PF; }
B(B&& other)
{
PF;
std::swap(i, other.i);
std::swap(j, other.j);
}
int j = 0;
};
If your project is complex, it would be natural that your code involves abstractions, with part of the responsibility held by the superclass, and some other part by the subclass. Consider also that some of that code in the superclass involves move semantics, so a subclass object must be moved to become a superclass object, then perform some action, and then moved back to become the subclass again. That’s a really bad idea!
Consider this usage of the classes defined before:
int main(int, char* argv[]) {
printf("Creating B b1\n");
B b1;
b1.i = 1;
b1.j = 2;
printf("b1.i = %d\n", b1.i);
printf("b1.j = %d\n", b1.j);
printf("Moving (B)b1 to (A)a. Which move constructor will be used?\n");
A a(std::move(b1));
printf("a.i = %d\n", a.i);
// This may be reading memory beyond the object boundaries, which may not be
// obvious if you think that (A)a is sort of a (B)b1 in disguise, but it's not!
printf("(B)a.j = %d\n", reinterpret_cast<B&>(a).j);
printf("Moving (A)a to (B)b2. Which move constructor will be used?\n");
B b2(reinterpret_cast<B&&>(std::move(a)));
printf("b2.i = %d\n", b2.i);
printf("b2.j = %d\n", b2.j);
printf("^^^ Oops!! Somebody forgot to copy the j field when creating (A)a. Oh, wait... (A)a never had a j field in the first place\n");
printf("Destroying b2, a, b1\n");
return 0;
}
If you’ve read the code, those printfs will have already given you some hints about the harsh truth: if you move a subclass object to become a superclass object, you’re losing all the subclass specific data, because no matter if the original instance was one from a subclass, only the superclass move constructor will be used. And that’s bad, very bad. This problem is called object slicing. It’s specific to C++ and can also happen with copy constructors. See it with your own eyes:
Creating B b1
A::A() 0x7ffd544ca690
B::B() 0x7ffd544ca690
b1.i = 1
b1.j = 2
Moving (B)b1 to (A)a. Which move constructor will be used?
A::A(A&&) 0x7ffd544ca6a0
a.i = 1
(B)a.j = 0
Moving (A)a to (B)b2. Which move constructor will be used?
A::A() 0x7ffd544ca6b0
B::B(B&&) 0x7ffd544ca6b0
b2.i = 1
b2.j = 0
^^^ Oops!! Somebody forgot to copy the j field when creating (A)a. Oh, wait... (A)a never had a j field in the first place
Destroying b2, a, b1
virtual B::~B() 0x7ffd544ca6b0
virtual A::~A() 0x7ffd544ca6b0
virtual A::~A() 0x7ffd544ca6a0
virtual B::~B() 0x7ffd544ca690
virtual A::~A() 0x7ffd544ca690
Why can something that seems so obvious become such a problem, you may ask? Well, it depends on the context. It’s not unusual for the codebase of a long-lived project to have started out using raw pointers for everything, then switched to references as a way to get rid of null-pointer issues where possible, and finally switched to whole objects and copy/move semantics to get rid of pointer issues altogether (references are just pointers in disguise, after all, and there are ways to produce null and dangling references by mistake). But this last step, from references to copy/move semantics on whole objects, comes with the small object-slicing nuance explained in this post, and when the size of the project and all the other things to take into account steal your focus, it’s easy to forget about it.
So, please remember: never use move semantics that convert your precious subclass instance into a superclass instance thinking that the subclass data will survive. You may regret it and inadvertently create difficult-to-debug problems.
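If you do need to hand a subclass object around through a superclass-typed interface, one option (a sketch, not the only possible fix) is to move a smart pointer to the object instead of the object itself; the dynamic type survives intact. Reusing the A and B classes from above:

#include <memory>
#include <utility>

int main()
{
    // Ownership moves, but the object itself does not: no slicing can occur.
    std::unique_ptr<A> owner = std::make_unique<B>();
    std::unique_ptr<A> newOwner = std::move(owner);
    // The B-specific field j is preserved, and ~B() runs on destruction
    // because ~A() is virtual.
    return 0;
}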
We all think we’re smart enough to not be tricked by a phishing attempt, right? Unfortunately, I know for certain that I’m not, because I entered my GitHub password into a lookalike phishing website a year or two ago. Oops! Fortunately, I noticed right away, so I simply changed my unique, never-reused password and moved on. But if the attacker were smarter, I might have never noticed. (This particular attack website was relatively unsophisticated and proxied only an unauthenticated view of GitHub, a big clue that something was wrong. Update: I want to be clear that it would have been very easy for the attacker to simply redirect me to the real github.com after stealing my credentials, in which case I would not have noticed the attack and would not have known to change my password.)
You might think multifactor authentication is the best defense against phishing. Nope. Although multifactor authentication is a major security improvement over passwords alone, and the particular attack that tricked me did not attempt to subvert multifactor authentication, it’s unfortunately pretty easy for phishers to defeat most multifactor authentication if they wish to do so:
Multifactor authentication based on SMS or phone calls is insecure (because SIM swapping isn’t going away; determined attackers will steal your phone number if it’s an obstacle to them)
Multifactor authentication based on authenticator apps (using TOTP or HOTP) is much better in general, but still fails against phishing. When you paste your one-time access code into a phishing website, the phishing website can simply “proxy” the access code you kindly provided to them by submitting it to the real website. This only allows authenticating once, but once is usually enough.
Fortunately, there is a solution: passkeys. Based on FIDO2 and WebAuthn, passkeys resist phishing because the authentication process depends on the domain of the service that you’re actually connecting to. If you think you’re visiting https://example.com, but you’re actually visiting a copycat website with a Cyrillic а instead of Latin a, then no worries: the authentication will fail, and the frustrated attacker will have achieved nothing.
The most popular form of passkey is local biometric authentication running on your phone, but any hardware security key (e.g. YubiKey) is also a good bet.
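For the curious, a minimal sketch of what a passkey sign-in request looks like (the challenge and rpId come from the real server; the values here are illustrative):

const credential = await navigator.credentials.get({
  publicKey: {
    challenge: challengeFromServer, // random bytes issued by the server
    rpId: 'example.com',            // the credential is bound to this domain
    userVerification: 'preferred',
  },
});
// On a lookalike domain, the browser has no credential scoped to that rpId,
// so there is nothing for the phishing site to proxy.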
target.com Is More Secure than Your Bank!
I am not joking when I say that target.com is more secure than your bank (which is probably still relying on SMS or phone calls, and maybe even allows you to authenticate using easily-guessable security questions):
target.com is introducing passkeys!
Good job for supporting passkeys, Target.
It’s probably perfectly fine for Target to support passkeys alongside passwords indefinitely. Higher-security websites that want to resist phishing (e.g. your employer’s SSO service) should consider eventually allowing only passkeys.
No Passkeys in WebKitGTK
Unfortunately for GNOME users, WebKitGTK does not yet support WebAuthn, so passkeys will not work in GNOME Web (Epiphany). That’s my browser of choice, so I’ve never touched a passkey before and don’t actually know how well they work in practice. Maybe do as I say and not as I do? If you require high security, you will unfortunately need to use Firefox or Chrome instead, at least for the time being.
Why Was Michael Visiting a Fake github.com?
The fake github.com appeared higher than the real github.com in the DuckDuckGo search results for whatever I was looking for at the time. :(
In recent months I’ve been privileged to work on the transition from Cairo to Skia for 2D graphics rendering in WPE and GTK WebKit ports. Big
reworks like this are a great opportunity to explore all kinds of graphics-related APIs. One of the broader APIs in this area is the CanvasRenderingContext2D API from HTML
Canvas. It’s a fairly straightforward yet extensive API allowing one to perform all kinds of drawing operations on the canvas. The comprehensiveness, however, comes at the expense of
some complex situations that the web engine needs to handle under the hood. One such situation is the issue I was working on recently, regarding broken test cases
involving drawing shadows when using Skia in WebKit. What makes it complex is that multiple web engine layers are involved and some problems are still visible, but despite that I was eventually able to address the
broken test cases.
In the next few sections I’m going to introduce the parts of the API that are involved in the problems while in the sections closer to the end I will gradually showcase the problems and explore potential paths toward fixing the entire situation.
Drawing on Canvas2D with globalCompositeOperation
The Canvas2D API offers multiple methods for drawing various primitives such as rectangles, arcs, text etc. On top of that, it allows one to control compositing and clipping
using the globalCompositeOperation property. The idea is very simple: the user of the API can change the property to one of the predefined
compositing operations, and immediately after that, all new drawing operations will behave according to the rules that particular compositing operation specifies:
canvas2DContext.fillRect(...); // Draws rect on top of existing content (default).
canvas2DContext.globalCompositeOperation = 'destination-atop';
canvas2DContext.fillRect(...); // Draws rect according to 'destination-atop'.
There are many compositing operations, but I’ll be focusing mostly on the ones having source and destination in their names.
The source and destination terms refer to the new content to be drawn and the existing (already-drawn) content respectively.
The images below present some examples of compositing operations in action:
When drawing primitives using the Canvas2D API, one can use the shadow* properties
to enable drawing of shadows along with any content being drawn. The usage is very simple: one has to alter at least one property, e.g. shadowOffsetX, to make the shadow visible:
canvas2DContext.shadowColor = "#0f0";
canvas2DContext.shadowOffsetX = 10;
// From now on, any draw call will have a green shadow attached.
The above, combined with simple code to draw a circle, produces the following effect:
Things are getting interesting once one starts thinking about how globalCompositeOperation may affect the way shadows are drawn. When I thought about it for the first time, I imagined at least 3 possibilities:
Shadow and shadow origin are both treated as one entity (shadow always below the origin) and thus are drawn together.
Shadow and shadow origin are combined and then drawn as one entity.
Shadow and shadow origin are drawn separately - shadow first, then the content.
When I confronted the above with the drawing model and shadows specification,
it turned out the last guess was the correct one. The specification basically says that the shadow should be computed first, then composited within the clipping region over the current canvas content, and finally, the shadow origin should be composited
within the clipping region over the current canvas content (the original canvas content combined with shadow).
The above can be confirmed visually using a few examples (generated using the Chromium browser v126.0.6478.126):
The source-over operation shows the drawing order - destination first, shadow second, and shadow origin third.
The destination-over operation shows the reversed drawing order - destination first, shadow second (below destination), and shadow origin third (below destination and shadow).
The source-atop operation is more tricky, as it behaves like source-over but with clipping to the destination content - therefore the destination is drawn first, then clipping is set to the destination, then the shadow is drawn,
and finally the shadow origin is drawn.
The destination-atop operation is even more tricky, as it behaves like destination-over yet with the clipping region being different at each step. That difference can be seen in the image below, which presents the intermediate states of the canvas
after each drawing step:
The initial state shows the canvas after drawing the destination on it.
The after drawing shadow state shows the shadow drawn below the destination. In this case, the clipping is set to the new content (the shadow), and hence the part of the destination that is not “atop” the shadow is clipped out.
The after drawing shadow origin state shows the final state after drawing the shadow origin below the previous canvas content (the new destination), which at this point is “the shadow combined with the destination”. As in the previous step,
the clipping is set to the new content (the shadow origin), and hence any part of the new destination that is not “atop” the shadow origin is clipped out.
If one realizes that drawing shadows with globalCompositeOperation may be tricky in general, one must also consider that, when it comes to particular browser engines, things are even trickier, as virtually no graphics library provides an API that matches the Canvas2D API 1-to-1. This means that, depending on the graphics library used, the browser engine must implement more or fewer integration parts here and there. For example, one can imagine that some graphics library may not have native support for shadows - that would mean the browser engine has to prepare shadows itself, e.g. by drawing the shadow origin (no matter how complex) on an extra surface, changing its color, blurring it, etc., so that it can be composited as a whole once prepared.
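To illustrate the idea, here is a rough sketch of such a fallback expressed in terms of the Canvas2D API itself; a real engine would do this with its graphics library’s surfaces, and the function below is mine, not any engine’s actual code:
function drawWithEmulatedShadow(ctx, drawContent, color, offsetX, offsetY, blur) {
    // 1. Draw the shadow origin (no matter how complex) on an extra surface.
    const scratch = document.createElement("canvas");
    scratch.width = ctx.canvas.width;
    scratch.height = ctx.canvas.height;
    const sctx = scratch.getContext("2d");
    drawContent(sctx);
    // 2. Change the color: keep the alpha channel, replace every pixel's color.
    sctx.globalCompositeOperation = "source-in";
    sctx.fillStyle = color;
    sctx.fillRect(0, 0, scratch.width, scratch.height);
    // 3. Blur and composite the prepared shadow as a whole, then draw the origin.
    ctx.save();
    ctx.filter = "blur(" + blur + "px)";
    ctx.drawImage(scratch, offsetX, offsetY);
    ctx.restore();
    drawContent(ctx);
}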
Given the above, one would expect all these aspects to be tested and implemented really well. After all, whenever the subject matter becomes complicated, extra care is required. It turns out, however, that this is not necessarily the case when it comes to globalCompositeOperation and shadows. As for testing, there are very few tests (2d.shadow.composite*) in WPT (Web Platform Tests) covering the use cases described above, and the situation is not much better for internal web engine test suites. As for implementations, there’s a substantial amount of discrepancy.
To show exactly what the situation is, the examples from the section Shadows meet globalCompositeOperation can be used again, this time using browsers representing different web engines:
Chromium 126.0.6478.126
Firefox 128.0
Gnome Web (Epiphany) 45.0 (WebKit/Cairo)
WPE MiniBrowser build from WebKit@098c58dd13bf40fc81971361162e21d05cb1f74a (WebKit/Skia)
Safari 17.1 (WebKit/Core Graphics)
Servo release from 2024/07/04
Ladybird build from 2024/06/29
First of all, it’s evident that experimental browsers such as Servo and Ladybird are falling behind the competition - Servo doesn’t seem to support shadows at all, while Ladybird doesn’t support anything other than drawing a rect filled with color.
Second, the non-experimental browsers are pretty solid in covering most of the combinations presented above.
Finally, the most tricky combination above seems to be the one including destination-atop - in that case almost every mainstream browser renders different results:
Chromium is the only one rendering correctly.
Firefox and Epiphany are pretty close, but both suffer from a similar glitch where the red part is covered by the part of the destination that should already be clipped out.
WPE MiniBrowser and Safari are both rendering in correct order, but the clipping is wrong.
So far the discrepancies may not seem very dramatic, so it’s time to present more sophisticated examples, which are an extended version of the test case from the WebKit source tree:
Chromium 126.0.6478.126
Firefox 128.0
Gnome Web (Epiphany) 45.0 (WebKit/Cairo)
WPE MiniBrowser build from WebKit@098c58dd13bf40fc81971361162e21d05cb1f74a (WebKit/Skia)
Safari 17.1 (WebKit/Core Graphics)
Servo release from 2024/07/04
Ladybird build from 2024/06/29
Other than destination-out, xor, and a few simple operations presented before, all the operations presented above pose serious problems to the majority of browsers. The only browser that is correct in all cases (to the best of my understanding) is Chromium, which uses the Blink rendering engine, which in turn uses the Skia library. One may wonder if perhaps it’s Skia that’s responsible for Chromium’s success, but given the above results, where e.g. WPE MiniBrowser uses Skia as well, it’s evident that the problems lie above the particular graphics library.
Looking at the operations and browsers that render incorrectly, it’s clearly visible that even small problems - with either the ordering of draw calls or clipping - lead to spectacularly broken results. The pinnacle of misery is the source-out operation, which is the most variable one across browsers. One has to admit, however, that WPE MiniBrowser is slightly closer to being correct than the others.
Fixing the above problems is a long journey. After all, every single web engine has to be fixed in its own specific way. If the specification were the problem, it would be the obvious place to start. However, as mentioned in the section Shadows meet globalCompositeOperation, the specification is pretty clear on how drawing, shadows, and globalCompositeOperation come together. In that case, the next obvious place to start improving things is the WPT test suite.
What makes WPT outstanding is that it is the de facto standard cross-browser test suite for the web platform stack. The suite is developed as an open collaboration effort by developers from around the globe and hence is very broad in terms of specification coverage. Just as importantly, the test results are actively evaluated against the popular browser engines and published on wpt.fyi, putting some pressure on web engine developers to fix problems so that they keep up with the competition.
Given the above, extending the WPT test suite with test cases covering globalCompositeOperation combined with shadows is a reasonable first step towards unifying browser implementations. This can be done either by directly contributing tests to WPT, or by creating an issue. Personally, I’ve decided to file an issue first (WPT#46544) and to add tests once I have some time. I haven’t contributed to WPT yet, but I’m excited to work with it soon. Once I land my first pull request, I’ll start fixing WebKit, and I won’t hesitate to post some updates on this blog.
In this post I’ll try to document the journey starting from a WebKit issue and
ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.
I’ve been working on WebKit’s GStreamer backends for a while. Usually some new
feature needed on WebKit side would trigger work …
In the previous post I talked about the plans of the WebKit ports currently using Cairo to switch to Skia for 2D rendering. Apple ports don’t use Cairo, so they won’t be switching to Skia. I understand the post title was confusing; I’m sorry about that. The original post has been updated for clarity.
In recent years we have had an ongoing effort to improve graphics performance of the WebKitGTK and WPE ports. As a result of this we shipped features like threaded rendering, the DMA-BUF renderer, or proper vertical retrace synchronization (VSync). While these improvements have helped keep WebKit competitive, and even perform better than other engines in some scenarios, it has been clear for a while that we were reaching the limits of what can be achieved with a CPU-based 2D renderer.
There was an attempt at making Cairo support GPU rendering, which did not work particularly well due to the library being designed around stateful operation based upon the PostScript model—resulting in a convenient and familiar API, great output quality, but hard to retarget and with some particularly slow corner cases. Meanwhile, other web engines have moved more work to the GPU, including 2D rendering, where many operations are considerably faster.
We checked all the available 2D rendering libraries we could find, but none of them met all our requirements, so we decided to try writing our own library. At the beginning it worked really well, with impressive results in performance even compared to other GPU based alternatives. However, it proved challenging to find the right balance between performance and rendering quality, so we decided to try other alternatives before continuing with its development. Our next option had always been Skia. The main reason why we didn’t choose Skia from the beginning was that it didn’t provide a public library with API stability that distros can package and we can use like most of our dependencies. It still wasn’t what we wanted, but now we have more experience in WebKit maintaining third party dependencies inside the source tree like ANGLE and libwebrtc, so it was no longer a blocker either.
In December 2023 we made the decision to give Skia a try internally and see if it would be worth the effort of maintaining the project as a third-party module inside WebKit. In just one month we had implemented enough features to be able to run all MotionMark tests. The results on the desktop were quite impressive, doubling the global MotionMark score. We still had to do more tests on embedded devices, which are the actual target of WPE, but it was clear that, at least on the desktop, we got much better results even with this very initial implementation that was not yet optimized (we kept our current architecture, which is optimized for CPU rendering). We decided that Skia was the option, so we continued working on it and doing more tests on embedded devices. On the boards that we tried we also got better results than CPU rendering, but the difference was not so big, which means that with less powerful GPUs, and with our current architecture designed for CPU rendering, we were not that far from CPU rendering performance. That’s the reason why we managed to keep WPE competitive on embedded devices. Skia will not only bring performance improvements, though: it will also simplify the code and allow us to implement new features. So, we had enough data to make the final decision of going with Skia.
In February 2024 we reached a point in which our Skia internal branch was in an “upstreamable” state, so there was no reason to continue working privately. We met with several teams from Google, Sony, Apple and Red Hat to discuss with them about our intention to switch from Cairo to Skia, upstreaming what we had as soon as possible. We got really positive feedback from all of them, so we sent an email to the WebKit developers mailing list to make it public. And again we only got positive feedback, so we started to prepare the patches to import Skia into WebKit, add the CMake integration and the initial Skia implementation for the WPE port that already landed in main.
We will continue working on the Skia implementation in upstream WebKit, and we also have plans to change our architecture to better support the GPU rendering case in a more efficient way. We don’t have a deadline; it will be ready when we have implemented everything currently supported by Cairo, as we don’t plan to switch with regressions. We are focused on the WPE port for now, but at some point we will start working on GTK too, and other ports using Cairo will eventually start getting Skia support as well.
When the post has no description: field, this first paragraph is the excerpt, but we want it to be rendered using markdown.
The rest of the post content is not relevant for this particular case.
This is a short PSA post announcing the return of the GNOME Web Canary builds.
Read on for the crunchy details.
A couple years ago I was blogging about the GNOME Web Canary
flavor.
In summary this special build of GNOME Web provides a preview of the upcoming
version of …
When accelerated compositing support was added to WebKitGTK, there was only X11. Our first approach was quite simple: we sent the web view widget Xwindow ID to the web process to be used as the rendering target using GLX. This was very efficient, but soon we realized it broke the GTK rendering model, so it was not possible to use a web view inside a GtkOverlay, for example, to show status messages on top. The solution was to use a redirected Xcomposite window in the web process, and use its ID as the render target using GLX. The pixmap ID of the redirected Xcomposite window was sent to the UI process to be painted in the web view widget using a Cairo Xlib surface. Since the rendering happens in the web process, this approach required using Xdamage to monitor when the redirected Xcomposite window was updated in order to schedule a web view redraw.
Wayland support
To support accelerated compositing under Wayland we initially added a nested Wayland compositor running in the UI process. The web process connected to the nested Wayland compositor and created a surface to be used as the rendering target using EGL. The good thing about this approach compared to the X11 one is that we can create an EGLImage from Wayland buffers and use a GDK GL context to paint the contents in the web view. This is more efficient than X11 because we can use OpenGL in both the web and UI processes.
WPE, when using the fdo backend, uses the same approach of running a nested Wayland compositor, but in a more efficient way, using DMABUF instead of Wayland buffers when available. So, we decided to use libwpe in the GTK port only for rendering under Wayland, and eventually remove our Wayland compositor implementation.
Before the removal of the custom Wayland compositor we had all these possible combinations:
UI Process
X11: Cairo Xlib surface
Wayland: EGL
Web Process
X11: GLX using redirected Xwindow
Wayland (nested Wayland compositor): EGL using Wayland surface
Wayland (libwpe): EGL using libwpe to get the Wayland surface
To reduce the differences a bit, and to make it easier to support WebGL with ANGLE, we decided to change X11 to prefer EGL if possible, falling back to GLX only if EGL failed.
GTK4
GTK4 was released and we added support for it. The fact that GTK4 uses GL by default should make the rendering more efficient in accelerated compositing mode. This is definitely true under Wayland, because we are using a GL context already, so we just keep passing a texture to GTK to paint the contents in the web view. However, in the case of X11 we still have a Cairo Xlib surface that GTK paints into a Cairo image surface to be uploaded to the GPU. With GTK4 we now have two more combinations on the UI process side, four in total: X11 + GTK3, X11 + GTK4, Wayland + GTK3 and Wayland + GTK4.
Reducing all the combinations to (almost) one: DMABUF
All these combinations to support the different platforms made it quite difficult to maintain: every time we get a bug report about something not working in accelerated compositing mode we have to figure out the combination actually used by the reporter. GTK3 or GTK4? X11 or Wayland? Using EGL or GLX? Custom Wayland compositor or libwpe? Driver? Version? Etc.
We are already using DMABUF in WebKit for different things like WebGL and media rendering, so we thought that we could also use it for sharing the rendered buffer between the web and UI processes. That would be a more efficient solution but it would also drastically reduce the amount of combinations to maintain. The web process always uses the surfaceless platform, so it doesn’t matter if it’s under Wayland or X11. Then we create a surfaceless context as the render target and use EGL and GBM APIs to export the contents as a DMABUF buffer. The UI process imports the DMABUF buffer using EGL and GBM too, to be passed to GTK as a texture that is painted in the web view.
This theoretically reduces all the previous combinations to just one (note that we removed GLX support entirely, making EGL a requirement for accelerated compositing), but there’s a problem under X11: GTK3 doesn’t support EGL on X11, and GTK4 defaults to EGL but falls back to GLX if it doesn’t find an EGL config that perfectly matches the screen visual. In my system it never finds that EGL config because Mesa doesn’t expose any 32-bit depth config. So, in the case of GTK3 we have to manually download the buffer to the CPU and paint normally using Cairo, but in the case of GTK4 + GLX, GTK uploads the buffer again to be painted using GLX. I don’t think it’s possible to force GTK to use EGL from the API, but at least you can use GDK_DEBUG=gl-egl.
WebKitGTK 2.41.1
WebKitGTK 2.41.1 is the first unstable release of this cycle and already includes the DMABUF support that is used by default. We encourage everybody to try it out and provide feedback or report any issue. Please, export the contents of webkit://gpu and attach it to the bug report when reporting any problem related to graphics. To check if the issue is a regression of the DMABUF implementation you can use WEBKIT_DISABLE_DMABUF_RENDERER=1 to use the WPE renderer or X11 instead. This environment variable and the WPE render/X11 code will be eventually removed if DMABUF works fine.
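For example, to compare both paths (using Epiphany here just as an illustration; any WebKitGTK application works the same way):
# Run with the DMABUF renderer disabled to compare against the old path.
WEBKIT_DISABLE_DMABUF_RENDERER=1 epiphany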
WPE
If this approach works fine we plan to use something similar for the WPE port and get rid of the nested Wayland compositor there too.
With the release of WebKitGTK 2.40.0, WebKitGTK now finally provides a stable API and ABI for GTK 4 applications. The following API versions are provided:
webkit2gtk-4.0: this API version uses GTK 3 and libsoup 2. It is obsolete and users should immediately port to webkit2gtk-4.1. To get this with WebKitGTK 2.40, build with -DPORT=GTK -DUSE_SOUP2=ON.
webkit2gtk-4.1: this API version uses GTK 3 and libsoup 3. It contains no other changes from webkit2gtk-4.0 besides the libsoup version. With WebKitGTK 2.40, this is the default API version that you get when you build with -DPORT=GTK. (In 2.42, this might require a different flag, e.g. -DUSE_GTK3=ON, which does not exist yet.)
webkitgtk-6.0: this API version uses GTK 4 and libsoup 3. To get this with WebKitGTK 2.40, build with -DPORT=GTK -DUSE_GTK4=ON. (In 2.42, this might become the default API version.)
WebKitGTK 2.38 had a different GTK 4 API version, webkit2gtk-5.0. This was an unstable/development API version and it is gone in 2.40, so applications using it will break. Fortunately, that should be very few applications. If your operating system ships GNOME 42, or any older version, or the new GNOME 44, then no applications use webkit2gtk-5.0 and you have no extra work to do. But for operating systems that ship GNOME 43, webkit2gtk-5.0 is used by gnome-builder, gnome-initial-setup, and evolution-data-server:
For evolution-data-server 3.46, use this patch which applies on evolution-data-server 3.46.4.
For gnome-initial-setup 43, use this patch which applies on gnome-initial-setup 43.2. (Update: for your convenience, this patch will be included in gnome-initial-setup 43.3.)
For gnome-builder 43, all required changes are present in version 43.7.
Remember, patching is only needed for GNOME 43. Other versions of GNOME will have no problems with WebKitGTK 2.40.
There is no proper online documentation yet, but in the meantime you can view the markdown source for the migration guide to help you with porting your applications. Although the API is now stable and it is close to feature parity with the GTK 3 version, there are some problems to be aware of:
No support for accessibility. The GTK and WebKitGTK developers are collaborating on a plan to fix this, and I hope this will be ready for WebKitGTK 2.42.
Various smaller regressions here and there. The GTK 4 version is almost as stable as GTK 3, and now is a good time to start using it. As with any interesting new software, there is still some work to do.
Big thanks to everyone who helped make this possible.
Some time ago we at Igalia embarked on the journey to ship
a GStreamer-powered WebRTC backend. This is
a long journey, it is not over, but we made some progress …
Leverage agile frameworks to provide a robust synopsis for high level overviews. Iterative approaches to corporate strategy foster collaborative thinking to further the overall value proposition. Organically grow the holistic world view of disruptive innovation via workplace diversity and empowerment.
Bring to the table win-win survival strategies to ensure proactive domination. At the end of the day, going forward, a new normal that has evolved from generation X is on the runway heading towards a streamlined cloud solution. User generated content in real-time will have multiple touchpoints for offshoring.
Capitalize on low hanging fruit to identify a ballpark value added activity to beta test. Override the digital divide with additional clickthroughs from DevOps. Nanotechnology immersion along the information highway will close the loop on focusing solely on the bottom line.
// this is a command
function myCommand() {
let counter = 0;
counter++;
}
// Test with a line break above this line.
console.log('Test');
Today, WebKit in Linux operating systems is much more secure than it used to be. The problems that I previously discussed in this old, formerly-popular blog post are nowadays a thing of the past. Most major Linux operating systems now update WebKitGTK and WPE WebKit on a regular basis to ensure known vulnerabilities are fixed. (Not all Linux operating systems include WPE WebKit. It’s basically WebKitGTK without the dependency on GTK, and is the best choice if you want to use WebKit on embedded devices.) All major operating systems have removed older, insecure versions of WebKitGTK (“WebKit 1”) that were previously a major security problem for Linux users. And today WebKitGTK and WPE WebKit both provide a webkit_web_context_set_sandbox_enabled() API which, if enabled, employs Linux namespaces to prevent a compromised web content process from accessing your personal data, similar to Flatpak’s sandbox. (If you are a developer and your application does not already enable the sandbox, you should fix that!)
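For illustration, a minimal sketch of enabling the sandbox with the webkit2gtk-4.1 (GTK 3) API; the key point is that the call must happen before the first web process is spawned:
/* Build sketch: gcc main.c $(pkg-config --cflags --libs webkit2gtk-4.1) */
#include <webkit2/webkit2.h>

int main(int argc, char **argv)
{
    gtk_init(&argc, &argv);
    /* Must be called before any web view spawns a web process. */
    webkit_web_context_set_sandbox_enabled(webkit_web_context_get_default(), TRUE);
    GtkWidget *window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    GtkWidget *view = webkit_web_view_new();
    gtk_container_add(GTK_CONTAINER(window), view);
    webkit_web_view_load_uri(WEBKIT_WEB_VIEW(view), "https://example.org/");
    gtk_widget_show_all(window);
    gtk_main();
    return 0;
}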
Unfortunately, QtWebKit has not benefited from these improvements. QtWebKit was removed from the upstream WebKit codebase back in 2013. Its current status in Fedora is, unfortunately, representative of other major Linux operating systems. Fedora currently contains two versions of QtWebKit:
The qtwebkit package contains upstream QtWebKit 2.3.4 from 2014. I believe this is used by Qt 4 applications. For avoidance of doubt, you should not use applications that depend on a web engine that has not been updated in eight years.
The newer qt5-qtwebkit contains Konstantin Tokarev’s fork of QtWebKit, which is de facto the new upstream and without a doubt the best version of QtWebKit available currently. Although it has received occasional updates, most recently 5.212.0-alpha4 from March 2020, it’s still based on WebKitGTK 2.12 from 2016, and the release notes bluntly state that it’s not very safe to use. Looking at WebKitGTK security advisories beginning with WSA-2016-0006, I manually counted 507 CVEs that have been fixed in WebKitGTK 2.14.0 or newer.
These CVEs are mostly (but not exclusively) remote code execution vulnerabilities. Many of those CVEs no doubt correspond to bugs that were introduced more recently than 2.12, but the exact number is not important: what’s important is that it’s a lot, far too many for backporting security fixes to be practical. Since qt5-qtwebkit is two years newer than qtwebkit, the qtwebkit package is no doubt in even worse shape. And because QtWebKit does not have any web process sandbox, any remote code execution is game over: an attacker that exploits QtWebKit gains full access to your user account on your computer, and can steal or destroy all your files, read all your passwords out of your password manager, and do anything else that your user account can do with your computer. In contrast, with WebKitGTK or WPE WebKit’s web process sandbox enabled, attackers only get access to content that’s mounted within the sandbox, which is a much more limited environment without access to your home directory or session bus.
In short, it’s long past time for Linux operating systems to remove QtWebKit and everything that depends on it. Do not feed untrusted data into QtWebKit. Don’t give it any HTML that you didn’t write yourself, and certainly don’t give it anything that contains injected data. Uninstall it and whatever applications depend on it.
Update: I forgot to mention what to do if you are a developer and your application still uses QtWebKit. You should ensure it uses the most recent release of QtWebEngine for Qt 6. Do not use old versions of Qt 6, and do not use QtWebEngine for Qt 5.
These articles are an interesting read not only if you're working on
WebKit, but also if you are curious on how a modern browser engine
works and some of the moving parts beneath the surface. So go check them out!
On a related note, the WebKit team is always on the lookout for talent
to join us. Experience with WebKit or browsers is not necessarily a
must, as we know from experience that anyone with a strong C/C++
background and enough curiosity will be able to ramp up and start
contributing soon enough. If these articles spark your curiosity,
feel free to reach out to me to find out more
or to
apply directly!
This article begins a series of technical writeups on the
architecture of WPE, and we hope to publish during the rest of the
year further articles breaking down different components of WebKit,
including graphics and other subsystems, that will surely be
of great help for those interested in getting more familiar
with WebKit and its internals.
Today I am happy to unveil GNOME Web Canary which aims to provide bleeding edge,
most likely very unstable builds of Epiphany, depending on daily builds of the
WebKitGTK development version. Read on to know more about this.
Until recently the GNOME Web browser was available for end-users in two …
This is the last post of the series showing interesting debugging tools; I hope you have found it useful. Don’t miss the custom scripts at the bottom to process GStreamer logs, help you highlight the interesting parts, and find the root cause of difficult bugs. Here are also the previous posts of the series:
This is useful to know why a particular package isn’t found and what the default values for PKG_CONFIG_PATH are when it’s not defined. For example:
Adding directory '/usr/local/lib/x86_64-linux-gnu/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/local/lib/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/local/share/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/lib/x86_64-linux-gnu/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/lib/pkgconfig' from PKG_CONFIG_PATH
Adding directory '/usr/share/pkgconfig' from PKG_CONFIG_PATH
If we have tuned PKG_CONFIG_PATH, maybe we also want to add the default paths. For example:
SYSROOT=~/sysroot-x86-64
export PKG_CONFIG_PATH=${SYSROOT}/usr/local/lib/pkgconfig:${SYSROOT}/usr/lib/pkgconfig
# Add also the standard pkg-config paths to find libraries in the system
export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/usr/local/lib/x86_64-linux-gnu/pkgconfig:\
/usr/local/lib/pkgconfig:/usr/local/share/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig:\
/usr/lib/pkgconfig:/usr/share/pkgconfig
# This tells pkg-config where the "system" pkg-config dir is. This is useful when cross-compiling for other
# architecture, to avoid pkg-config using the system .pc files and mixing host and target libraries
export PKG_CONFIG_LIBDIR=${SYSROOT}/usr/lib
# This could have been used for cross compiling:
#export PKG_CONFIG_SYSROOT_DIR=${SYSROOT}
Man in the middle proxy for WebKit
Sometimes it’s useful to use our own modified/unminified files with a 3rd party service we don’t control. Mitmproxy can be used as a man-in-the-middle proxy, but I haven’t tried it personally yet. What I have tried (with WPE) is this:
Add an /etc/hosts entry to point the host serving the files we want to change to an IP address controlled by us.
Configure a web server to provide the files in the expected path.
💡 Pro tip: If you have to debug minified/obfuscated JavaScript code and don’t have a deobfuscated version to use in a man-in-the-middle fashion, use http://www.jsnice.org/ to deobfuscate it and get meaningful variable names.
Bandwidth control for a dependent device
If your computer has a “shared internet connection” enabled in Network Manager and provides access to a dependent device, you can control the bandwidth offered to that device. This is useful to trigger quality changes on adaptive streaming videos from services outside your control.
This can be done using tc, the Traffic Control tool from the Linux kernel. You can use this script to automate the process (edit it to suit your needs).
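As a starting point, a hedged sketch using tc’s token bucket filter (the interface name and rates are placeholders):
# Cap egress bandwidth on the shared interface to 1 Mbit/s.
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms
# Restore the default queueing discipline when done.
tc qdisc del dev eth0 root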
Useful scripts to process GStreamer logs
I use these scripts in my daily job to look for strange patterns in GStreamer logs that help me to find the cause of the bugs I’m debugging:
h: Highlights each expression in the command line in a different color.
mgrep: Greps (only) for the lines with the expressions in the command line and highlights each expression in a different color.
filter-time: Gets a subset of the log lines between a start and (optionally) an end GStreamer log timestamp.
highlight-threads: Highlights each thread in a GStreamer log with a different color. That way it’s easier to follow a thread with the naked eye.
remove-ansi-colors: Removes the color codes from a colored GStreamer log.
aha: ANSI-HTML-Adapter converts plain text with color codes to HTML, so you can share your GStreamer logs from a web server (eg: for bug discussion). Available in most distros.
gstbuffer-leak-analyzer: Analyzes a GStreamer log and shows unbalances in the creation/destruction of GstBuffer and GstMemory objects.
In this new post series, I’ll show you how both existing and ad-hoc tools can be helpful to find the root cause of some problems. Here are also the older posts of this series in case you find them useful:
Use strace to know which config/library files are used by a program
If you’re going crazy because you’re sure the program should be using some config file yet it seems to ignore it, just use strace to check which config files, libraries, or other kinds of files the program is actually using. Use whatever grep rules you need to refine the search:
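For example, something along these lines (the program name and grep pattern are placeholders):
# Log every file-related syscall and keep only config-looking paths.
strace -f -e trace=open,openat,access ./myprogram 2>&1 | grep -E '\.conf|\.ini|/etc/'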
First, try to strace -e trace=signal -p 1234 the killed process.
If that doesn’t work (eg: because it’s being killed with the uncatchable SIGKILL signal), then you can resort to modifying the kernel source code (signal.c) to log the calls to kill():
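Something along these lines (a rough sketch; the exact layout of kernel/signal.c varies between kernel versions):
/* At the top of SYSCALL_DEFINE2(kill, pid_t, pid, int, sig) in kernel/signal.c: */
printk(KERN_INFO "kill: process %d (%s) sends signal %d to pid %d\n",
       task_pid_nr(current), current->comm, sig, pid);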
If you ever find yourself with little time in front of a stubborn build system and, no matter what you try, you can’t get the right flags to the compiler, think about putting something (a wrapper) between the build system and the compiler. Example for g++:
#!/bin/bash
main() {
# Build up arg[] array with all options to be passed
# to subcommand.
i=0
for opt in "$@"; do
case "$opt" in
-O2) ;; # Removes this option
*)
arg[i]="$opt" # Keeps the others
i=$((i+1))
;;
esac
done
EXTRA_FLAGS="-O0" # Adds extra option
echo "g++ ${EXTRA_FLAGS} ${arg[@]}" # >> /tmp/build.log # Logs the command
/usr/bin/ccache g++ ${EXTRA_FLAGS} "${arg[@]}" # Runs the command
}
main "$@"
Make sure that the wrappers appear earlier than the real commands in your PATH.
The make wrapper can also call remake instead. Remake is fully compatible with make but has features to help debugging compilation and makefile errors.
The source code shown below must be placed in the .h where the class to be debugged is defined. It’s written in a way that doesn’t require rebuilding RefCounted.h, so it saves a lot of build time. It logs all refs, unrefs, and adoptPtrs, so that any anomaly in the refcounting can be traced and investigated later. To use it, just make your class inherit from LoggedRefCounted instead of RefCounted.
Example output:
void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
^^^ Two adopts, this is not good.
void WTF::LoggedRefCounted<T>::ref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
void WTF::LoggedRefCounted<T>::ref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount ... --> 2
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 2 --> ...
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount ... --> 1
void WTF::adopted(WTF::LoggedRefCounted<T>*) [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
void WTF::LoggedRefCounted<T>::deref() [with T = WebCore::MediaSourceClientGStreamerMSE]: this=0x673c07a4, refCount 1 --> ...
^^^ Two recursive derefs, not good either.
It only works #if ENABLE(DEVELOPER_MODE), so you might want to remove those ifdefs if you’re building in Release mode.
Log tracers
In big pipelines (e.g. playbin) it can be very hard to find what element is replying to a query or handling an event. Even using gdb can be extremely tedious due to the very high level of recursion. My coworker Alicia commented that using log tracers is more helpful in this case.
GST_TRACERS=log enables additional GST_TRACE() calls all across GStreamer. The following example logs entries and exits into the query function.
GST_TRACERS=log GST_DEBUG='query:TRACE'
The names of the logging categories are somewhat inconsistent:
log (the log tracer itself)
GST_BUFFER
GST_BUFFER_LIST
GST_EVENT
GST_MESSAGE
GST_STATES
GST_PADS
GST_ELEMENT_PADS
GST_ELEMENT_FACTORY
query
bin
The log tracer code is in subprojects/gstreamer/plugins/tracers/gstlog.c.
The thread id is generated by Linux and can take values higher than 1-9, just like PIDs. This thread number is useful to know which function calls are issued by the same thread, avoiding confusion between threads.
And use it like this in all the functions you want to trace:
void somefunction() {
MYTRACER();
// Some other code...
}
The constructor will log when the execution flow enters into the function and the destructor will log when the flow exits.
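The original helper isn’t reproduced in this post, but a minimal sketch of such a tracer could look like this (using the Linux thread id discussed above):
#include <cstdio>
#include <sys/syscall.h>
#include <unistd.h>

class ScopeTracer {
public:
    explicit ScopeTracer(const char* function)
        : m_function(function)
    {
        // Logged on function entry, prefixed with the Linux thread id.
        printf("[%ld] --> %s\n", (long)syscall(SYS_gettid), m_function);
    }
    ~ScopeTracer()
    {
        // Logged on function exit.
        printf("[%ld] <-- %s\n", (long)syscall(SYS_gettid), m_function);
    }

private:
    const char* m_function;
};

#define MYTRACER() ScopeTracer myTracer(__PRETTY_FUNCTION__)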
Setting breakpoints from C
In the C code, just call raise(SIGINT) (simulate CTRL+C, normally the program would finish).
And then, in a previously attached gdb, after breaking and debugging all you needed, just continue the execution by ignoring the signal or plainly continuing:
(gdb) signal 0
(gdb) continue
There’s a way to do the same but attaching gdb after the raise. Use raise(SIGSTOP) instead (simulate CTRL+Z). Then attach gdb, locate the thread calling raise and switch to it:
(gdb) thread apply all bt
[now search for "raise" in the terminal log]
Thread 36 (Thread 1977.2033): #1 0x74f5b3f2 in raise () from /home/enrique/buildroot/output2/staging/lib/libpthread.so.0
(gdb) thread 36
Now, from a terminal, send a continuation signal: kill -SIGCONT 1977. Finally instruct gdb to single-step only the current thread (IMPORTANT!) and run some steps until all the raises have been processed:
(gdb) set scheduler-locking on
(gdb) next // Repeat several times...
Know the name of a GStreamer function stored in a pointer at runtime
Just use this macro:
GST_DEBUG_FUNCPTR_NAME(func)
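For example (the pad variable is illustrative):
/* Prints something like "chain function: gst_queue_chain". */
GST_INFO("chain function: %s", GST_DEBUG_FUNCPTR_NAME(GST_PAD_CHAINFUNC(pad)));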
Detecting memory leaks in WebKit
RefCountedLeakCounter is a tool class that can help to debug reference leaks by printing this kind of message when WebKit exits:
This is the continuation of the GStreamer WebKit debugging tricks post series. In the next three posts, I’ll focus on what we can get by doing some little changes to the source code for debugging purposes (known as “instrumenting”), but before, you might want to check the previous posts of the series:
Know all the env vars read by a program by using LD_PRELOAD to intercept libc calls
// File getenv.c
// To compile: gcc -shared -Wall -fPIC -o getenv.so getenv.c -ldl
// To use: export LD_PRELOAD="./getenv.so", then run any program you want
// See http://www.catonmat.net/blog/simple-ld-preload-tutorial-part-2/
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
// This function will take the place of the original getenv() in libc
char *getenv(const char *name) {
printf("Calling getenv(\"%s\")\n", name);
char *(*original_getenv)(const char*);
original_getenv = dlsym(RTLD_NEXT, "getenv");
return (*original_getenv)(name);
}
See the breakpoints with command example to know how to get the same using gdb. Check also Zan’s libpine for more features.
Track lifetime of GObjects by LD_PRELOADing gobject-list
The gobject-list project, written by Thibault Saunier, is a simple LD_PRELOAD library for tracking the lifetime of GObjects. When loaded into an application, it prints a list of living GObjects on exiting the application (unless the application crashes), and also prints reference count data when it changes. SIGUSR1 or SIGUSR2 can be sent to the application to trigger printing of more information.
Overriding the behaviour of a debugging macro
The usual debugging macros aren’t printing messages? Redefine them to do what you want (a reconstruction sketch follows the list):
The first arguments (channel, msg) are captured independently
The remaining args are captured in __VA_ARGS__
do while(false) is a trick to avoid {braces} and make the code block work when used in if/then/else one-liners
#channel expands LOG(MyChannel,....) as printf("%s: ", "MyChannel"). It’s called “stringification”.
## __VA_ARGS__ expands the variable argument list as a comma-separated list of items, but if the list is empty, it eats the comma after “msg”, preventing syntax errors
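Putting the bullets above together, a reconstruction sketch could look like this:
#include <cstdio>

#define LOG(channel, msg, ...) \
    do { \
        printf("%s: " msg "\n", #channel, ## __VA_ARGS__); \
    } while (false)

// LOG(MyChannel, "loaded %d items", count) prints "MyChannel: loaded 42 items".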
Print the compile-time type of an expression
Use typeid(<expression>).name(). Filter the output through c++filt -t:
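A quick example:
// typeid.cpp; compile with g++ and run: ./a.out | c++filt -t
#include <cstdio>
#include <typeinfo>

int main()
{
    float time = 1.0f;
    printf("%s\n", typeid(time * 2.0).name()); // Prints "d"; c++filt -t shows "double".
    return 0;
}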
Abusing the compiler to know all the places where a function is called
If you want to know all the places from where the GstClockTime toGstClockTime(float time) function is called, you can convert it to a template function and use static_assert on a wrong datatype like this (in the .h):
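The original .h snippet isn’t shown here, but a reconstruction sketch of the idea could look like this (the function body is a placeholder):
#include <gst/gst.h>
#include <type_traits>

template <typename T>
GstClockTime toGstClockTime(T time)
{
    // Fails the build with the message below for every call site where T is float.
    static_assert(std::is_integral<T>::value, "Don't call toGstClockTime(float)!");
    return static_cast<GstClockTime>(time); // Placeholder; keep the original conversion here.
}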
Note that T=float is different from an integer type (is_integral). It has nothing to do with the float time parameter declaration.
You will get compile-time errors like this on every place the function is used:
WebKitMediaSourceGStreamer.cpp:474:87: required from here
GStreamerUtilities.h:84:43: error: static assertion failed: Don't call toGstClockTime(float)!
Use pragma message to print values at compile time
Sometimes it is useful to know if a particular define is enabled:
#include <limits.h>
#define _STR(x) #x
#define STR(x) _STR(x)
#pragma message "Int max is " STR(INT_MAX)
#ifdef WHATEVER
#pragma message "Compilation goes by here"
#else
#pragma message "Compilation goes by there"
#endif
...
The code above would generate this output:
test.c:6:9: note: #pragma message: Int max is 0x7fffffff
#pragma message "Int max is " STR(INT_MAX)
^~~~~~~
test.c:11:9: note: #pragma message: Compilation goes by there
#pragma message "Compilation goes by there"
^~~~~~~
This post is a continuation of a series of blog posts about the most interesting debugging tricks I’ve found while working on GStreamer WebKit on embedded devices. These are the other posts of the series published so far:
Sometimes the symbol names aren’t printed in the stack memdump. You can do this trick to iterate the stack and print the symbols found there (take with a grain of salt!):
(gdb) set $i = 0
(gdb) p/a *((void**)($sp + 4*$i++))
[Press ENTER multiple times to repeat the command]
$46 = 0xb6f9fb17 <_dl_lookup_symbol_x+250>
$58 = 0xb40a9001 <g_log_writer_standard_streams+128>
$142 = 0xb40a877b <g_return_if_fail_warning+22>
$154 = 0xb65a93d5 <WebCore::MediaPlayerPrivateGStreamer::changePipelineState(GstState)+180>
$164 = 0xb65ab4e5 <WebCore::MediaPlayerPrivateGStreamer::playbackPosition() const+420>
...
Many times it’s just a matter of gdb not having loaded the unstripped version of the library. /proc/<PID>/smaps and info proc mappings can help to locate the library providing the missing symbol. Then we can load it by hand.
For instance, for this backtrace:
#0 0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6
#1 0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0
#2 0x6cfd0d60 in ?? ()
In a shell, we examine smaps and find out that the unknown piece of code comes from libgstomx:
Now we load the unstripped .so in gdb and we’re able to see the new symbol afterwards:
(gdb) add-symbol-file /home/enrique/buildroot-wpe/output/build/gst-omx-custom/omx/.libs/libgstomx.so 0x6cfc1000
(gdb) bt
#0 0x740ad3fc in syscall () from /home/enrique/buildroot-wpe/output/staging/lib/libc.so.6
#1 0x74375c44 in g_cond_wait () from /home/enrique/buildroot-wpe/output/staging/usr/lib/libglib-2.0.so.0
#2 0x6cfd0d60 in gst_omx_video_dec_loop (self=0x6e0c8130) at gstomxvideodec.c:1311
#3 0x6e0c8130 in ?? ()
Useful script to prepare the add-symbol-file:
cat /proc/715/smaps | grep '[.]so' | sed -e 's/-[0-9a-f]*//' | { while read ADDR _ _ _ _ LIB; do echo "add-symbol-file $LIB 0x$ADDR"; done; }
The “figuring out corrupt ARM stacktraces” post has some additional info about how to use addr2line to translate memory addresses to function names on systems with a hostile debugging environment.
Debugging a binary without debug symbols
There are times when there’s just no way to get debug symbols working, or when we’re simply debugging a release version of the software. In those cases, we must directly debug the assembly code. The gdb text user interface (TUI) can be used to examine the disassembled code and the CPU registers. It can be enabled with these commands:
layout asm
layout regs
set print asm-demangle
Some useful keybindings in this mode:
Arrows: scroll the disassembly window
CTRL+p/n: Navigate history (previously done with up/down arrows)
CTRL+b/f: Go backward/forward one character (previously left/right arrows)
CTRL+d: Delete character (previously “Del” key)
CTRL+a/e: Go to the start/end of the line
This screenshot shows how we can infer that an empty RefPtr is causing a crash in some WebKit code.
Wake up an unresponsive gdb on ARM
Sometimes, when you continue (‘c’) execution on ARM, there’s no way to stop it again unless a breakpoint is hit. But there’s a trick to retake control: just send a harmless signal to the process.
kill -SIGCONT 1234
Know which GStreamer thread id matches with each gdb thread
Sometimes you need to match threads in the GStreamer logs with threads in a running gdb session. The simplest way is to ask GThread for each gdb thread:
(gdb) set output-radix 16
(gdb) thread apply all call g_thread_self()
This will print a list of gdb threads and GThread*. We only need to find the one we’re looking for.
Generate a pipeline dump from gdb
If we have a pointer to the pipeline object, we can call the function that dumps the pipeline:
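A sketch of such a call, assuming GST_DEBUG_DUMP_DOT_DIR was set when the process started (if gdb can’t resolve the enum, its numeric value 0xf can be passed instead):
(gdb) call gst_debug_bin_to_dot_file_with_ts((GstBin*)pipeline, GST_DEBUG_GRAPH_SHOW_ALL, "pipeline-dump")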
I’ve been developing and debugging desktop and mobile applications on embedded devices over the last decade or so. For most of this period I’ve been focused on the multimedia side of the WebKit ports using GStreamer, an area that is a mix of C (glib, GObject and GStreamer) and C++ (WebKit).
Over these years I’ve had to work on ARM embedded devices (mobile phones, set-top boxes, Raspberry Pi using buildroot) where most of the environment aids and tools we take for granted on a regular x86 Linux desktop just aren’t available. In these situations you have to be imaginative and find your own way to get the work done and debug the issues you find along the way.
I’ve been writing down the most interesting tricks I’ve found in this journey and I’m sharing them with you in a series of 7 blog posts, one per week. Most of them aren’t mine, and the ones I learnt at the beginning of my career may even seem a bit naive, but I find them worth sharing anyway. I hope you find them as useful as I do.
Breakpoints with command
You can break on a place, run some command and continue execution. Useful to get logs:
break getenv
command
# This disables scroll continue messages
# and suppresses output
silent
set pagination off
p (char*)$r0
continue
end
break grl-xml-factory.c:2720 if (data != 0)
command
call grl_source_get_id(data->source)
# $ is the last value in the history, the result of
# the previous call
call grl_media_set_source (send_item->media, $)
call grl_media_serialize_extended (send_item->media,
GRL_MEDIA_SERIALIZE_FULL)
continue
end
Nearly five years ago, when I was in grad school, I stumbled across the paper Collaboration in the open-source arena: The WebKit case when trying to figure out what I would do for a course project in network theory (i.e. graph theory, not computer networking; I’ll use the words “graph” and “network” interchangeably). The paper evaluates collaboration networks, which are graphs where collaborators are represented by nodes and relationships between collaborators are represented by edges. Our professor had used collaboration networks as examples during lecture, so it seemed at least mildly relevant to our class, and I wound up writing a critique on this paper for the class project. In this paper, the authors construct collaboration networks for WebKit by examining the project’s changelog files to define relationships between developers. They perform “community detection” to visually group developers who work closely together into separate clusters in the graphs. Then, the authors use those graphs to arrive at various conclusions about WebKit (e.g. “[e]ven if Samsung and Apple are involved in expensive patent wars in the courts and stopped collaborating on hardware components, their contributions remained strong and central within the WebKit open source project,” regarding the period from 2008 to 2013).
At the time, I contacted the authors to let them know about some serious problems I found with their work. Then I left the paper sitting in a short-term to-do pile on my desk, where it has been sitting since Obama was president, waiting for me to finally write this blog post. Unfortunately, nearly five years later, the authors’ email addresses no longer work, which is not very surprising after so long — since I’m no longer a student, the email I originally used to contact them doesn’t work anymore either — so I was unable to contact them again to let them know that I was finally going to publish this blog post. Anyway, suffice to say that the conclusions of the paper were all correct; however, the networks used to arrive at those conclusions suffered from three different mistakes, each of which was, on its own, serious enough to invalidate the entire work.
So if the analysis of the networks was bogus, how did the authors arrive at correct conclusions anyway? The answer is confirmation bias. The study was performed by visually looking at networks and then coming to non-rigorous conclusions about the networks, and by researching the WebKit community to learn what is going on with the major companies involved in the project. The authors arrived at correct conclusions because they did a good job at the latter, then saw what they wanted to see in the graphs.
I don’t want to be too harsh on the authors of this paper, though, because they decided to publish their raw data and methodology on the internet. They even published the python scripts they used to convert WebKit changelogs into collaboration graphs. Had they not done so, there is no way I would have noticed the third (and most important) mistake that I’ll discuss below, and I wouldn’t have been able to confirm my suspicions about the second mistake. You would not be reading this right now, and likely nobody would ever have realized the problems with the paper. The authors of most scientific papers are not nearly so transparent: many researchers today consider their source code and raw data to be either proprietary secrets to be guarded, or simply not important enough to merit publication. The authors of this paper deserve to be commended, not penalized, for their openness. Mistakes are normal in research papers, and open data is by far the best way for us to be able to detect mistakes when they happen.
A collaboration network from the paper. The paper reports that this network represents collaboration between September 2008 (when Google began contributing to WebKit) and February 2011 (the departure of Nokia from the project). Because the authors posted their data online, I noticed that this was a mistake in the paper: the graph actually represents the period between February 2011 and July 2012. The paper’s analysis of this graph is therefore questionable, but note this was only a minor mistake compared to the three major mistakes that impact this network. Note the suspiciously-high number of unaffiliated (“Other”) contributors in a corporate-dominated project.
The rest of this blog post is a simplified version of my original school paper from 2016. I’ve removed maybe half the original content, including some flowery academic language and unnecessary references to class material (e.g. “community detection was performed using fast modularity maximization to generate an alternate visualization of the network,” good for high scores on class papers, not so good for blog posts). But rewriting everything to be informal takes a long time, and I want to finish this today so it’s not still on my desk five more years from now, so the rest of this blog post is still going to be much more formal than normal. Oh well. Tone shift now!
We (“we” means “I”) examine various methodological issues discovered by analyzing the paper. The first section discusses the effects on the collaboration network of choosing a poor definition of collaboration. The second section discusses a major source of error in detecting the company affiliation of many contributors. The third section describes a serious mistake in the data collection process. Each of these issues is quite severe, and any one alone calls into question the validity of the entire study. It must be noted that such issues are not necessarily unique to this paper, and must be kept in mind for all future studies that utilize collaboration networks.
Mistake #1: Poorly-defined Collaboration
The precise definition used to model collaboration has tremendous impact on the usefulness of the resultant collaboration network. Many collaboration networks are built using definitions of collaboration that are self-evidently useful, where there is little doubt that edges in the network represent real-world collaboration. The paper adopts an approach to building collaboration networks where developers are represented by nodes, and an edge exists between two nodes if the corresponding developers modified the same source code file during the time period under consideration for the construction of the network. However, it is not clear that this definition of collaboration is actually useful. Consider that it is a regular occurrence for developers who do not know each other and may have never communicated to modify the same files. Consider also that modifying a common file does not necessarily reflect any shared interest in a particular portion of the software project. For instance, a file might be modified when making an interface change in another file, or when fixing a build error occurring on a particular platform. Such occurrences are, in fact, extremely common in the WebKit project. Additionally, consider that there exist particular source code files that are unusually central to the project, and must be modified more frequently than other files. It is highly likely that almost all developers will at one point or another make some change in such a file, and therefore be connected via a collaboration edge to all other developers who have ever modified that file. (My original critique shows a screenshot of the revision history of WebPageProxy.cpp, to demonstrate that the developers modifying this file were working on unrelated projects.)
It is true, as assumed by the paper, that particular developers work on different portions of the WebKit source code, and collaborate more with particular other developers. For instance, developers who work for the same company typically, though not always, collaborate most with other developers from that same company. However, the paper’s naive definition of collaboration should ensure that most developers will be considered to have collaborated equally with most other developers, regardless of the actual degree of collaboration. For instance, consider developers A and B who regularly collaborate on a particular source file. Now, developer C, who works on a platform that does not use this file and would not ordinarily need to modify it, makes a change to some cross-platform interface in another file that requires updating this file. Developer C is now considered to have collaborated with developers A and B on this file! Clearly, this is not a desirable result, as developers A and B have collaborated far more on the development of the file. Moreover, consider that an edge exists between two developers in the collaboration network if they have ever both modified any file anywhere in WebKit during the time period under review; then we can expect to form a network that is almost complete (a “full” graph where edges exists between most nodes). It is evident that some method of weighting collaboration between different contributors would be desirable, as the unweighted collaboration network does not seem useful.
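To make the modeled definition concrete, here is a minimal sketch (with hypothetical data, not the paper’s actual scripts) of how such a network is built:
from itertools import combinations
import networkx as nx

# Hypothetical input: file -> set of developers who modified it in the period.
files = {
    "WebPageProxy.cpp": {"alice", "bob", "carol"},
    "PlatformGtk.cmake": {"alice", "dave"},
}

graph = nx.Graph()
for developers in files.values():
    # The paper's definition: an unweighted edge between every pair of
    # developers that modified the same file, however unrelated their work.
    graph.add_edges_from(combinations(sorted(developers), 2))

print(graph.edges())  # carol is now "collaborating" with both alice and bob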
One might argue that the networks presented in the paper clearly show developers exist in subcommunities on the peripheries of the network, that the network is clearly not complete, and that therefore this definition of collaboration sufficed, at least to some extent. However, this is only due to another methodological error in the study. Mistake #3, discussed later, explains how the study managed to produce collaboration networks with noticeable subcommunities despite these issues.
We note that the authors chose this same definition of collaboration in their more recent work on OpenStack, so there exist multiple studies using this same flawed definition of collaboration. We speculate that this definition of collaboration is unlikely to be more suitable for OpenStack or for other software projects than it is for WebKit. The software engineering research community must explore alternative models of collaboration when undertaking future studies of software development collaboration networks in order to more accurately reflect collaboration.
Mistake #2: Misdetected Contributor Affiliation
One difficulty when building collaboration networks is the need to correctly match each contributor with the correct company affiliation. Although many free software projects are dominated by unaffiliated contributors, others, like WebKit, are primarily developed by paid contributors. Looking at the number of times a particular email domain appears in WebKit changelog entries made during 2015, most contributors commit using corporate emails, but many developers commit to WebKit using personal email accounts, such as Gmail accounts; additionally, many developers use generic webkit.org email aliases, which were previously available to active WebKit contributors. These developers may or may not be affiliated with companies that contribute to the project. Use of personal email addresses is a source of inaccuracy when constructing collaboration networks, as it results in an undercount of corporate contributions. We can expect this issue has led to serious inaccuracies in the reported collaboration networks.
This substantial source of error is neither mentioned nor accounted for; all contributors using such email accounts were therefore miscategorized as unaffiliated. However, the authors clearly recognized this issue, as it has been accounted for in their more recent work covering OpenStack by cross-referencing email addresses from git revision history with a database containing corporate affiliations maintained by the OpenStack Foundation. Unfortunately, no such effort was made for the WebKit data set.
The WebKit project was previously dominated by contributors with chromium.org email domains. This domain is equivalent to webkit.org in that it can be used by contributors to the Chromium project regardless of corporate affiliation; however, most contributors with Chromium emails are actually Google employees. The high use of Chromium emails by Google employees appears to have led to a dramatic — by roughly an entire order of magnitude — undercount of Google’s contributors to the WebKit project, as only contributors with google.com emails were considered to be Google employees. The vast majority of Google employees used chromium.org emails, and so were counted as unaffiliated developers. This explains the extraordinarily high number of unaffiliated developers in the networks presented by the paper, despite the fact that WebKit development is, in reality, dominated by corporate contributors.
Mistake #3: Missing Most Changelog Data
The paper incorrectly claims to have gathered its data from both WebKit’s Subversion revision history and from its changelog files. We must draw a distinction between changelog entries and Subversion revision history. Changelog entries are inserted into changelog files that are committed into the Subversion repository; they are completely separate from the Subversion history. Each subproject within the WebKit project has its own set of changelog files used to record changes under the corresponding directory.
In fact, the paper processed only the changelog files. This was actually a good choice, as WebKit’s changelog files are much more accurate than the Subversion history, for two reasons. Firstly, it is easy for a contributor to change the email address entered into a changelog file, e.g. after a change in company affiliation. However, it is difficult to change the email address used to commit to Subversion, as this requires requesting a new Subversion account from the Subversion administrator; accordingly, contributors are more likely to use older email addresses, lacking accurate company affiliation, in Subversion revisions than in changelog files. Secondly, many Subversion revisions are not directly committed by contributors, but rather are actually committed by the commit queue bot, which runs various tests before committing the revision. Subversion revisions are also, more rarely, committed by a completely different contributor than the patch author. In both cases, the proper contributor’s name will appear in only the changelog file, and not the Subversion data. Some developers are dramatically more likely to use the commit queue than others. Various other reviews of WebKit contribution history that examine data from Subversion history rather than from changelog files are flawed for this reason. Fortunately, by relying on changelog files rather than Subversion metadata, the authors avoid this problem.
Unfortunately, a serious error was made in processing the changelog data. WebKit has many different sets of changelog files, stored in various project subdirectories (JavaScriptCore, WebCore, WebKit, etc.), as well as toplevel changelogs stored in the root directory of the project. Regrettably, the authors were unaware of the changelogs in subdirectories, and based their analysis only on the toplevel changelogs, which contain only changes that occurred in subdirectories that lack their own changelog files. In practice, this inadvertently restricted the scope of the analysis to a very small minority of changes, primarily to build system files, manual tests, and the WebKit website. That is, the reported collaboration networks do not reflect collaboration on any actual source code files. All source code files are contained in subdirectories with their own changelog files, and therefore no source code files were actually considered in the analysis of collaboration on source code changes.
We speculate that the analysis’s focus on build system files likely exaggerates the effects of clustering in the network, as different companies used different build systems and thus were less likely to edit the build systems used by other companies, and that an analysis based on the correct data would display less of a clustering effect. Certainly, there would be dramatically more edges in the already-dense networks, because an edge exists between two developers if there exists any one file in WebKit that both developers have modified. Omitting all of the source code files from the analysis therefore dramatically reduces the likelihood of edges existing between nodes in the network.
Conclusion
We found that the original study was impacted by an unsuitable definition of collaboration used to build the collaboration networks, severe errors in counting contributor affiliation (including the classification of most Google employees as unaffiliated developers), and the omission of almost all the required data from the analysis, including all data on modifications to source code files. The authors constructed and studied essentially meaningless networks. Nevertheless, the authors were able to derive many accurate conclusions about the WebKit project from their inaccurate collaboration networks. Such conclusions illustrate the dangers of seeking to find particular meanings or explanations through visual inspection of collaboration networks. Researchers must work forwards from the collaboration networks to arrive at their conclusions, rather than backwards by attempting to match the networks to conclusions gained from prior knowledge.
Original Report
Wow, OK, you actually read this far? Since this blog post criticizes an academic paper, and since this blog post does not include various tables and examples that support my arguments, I’ve attached my original analysis in full. It is a boring, student-quality grad school project written with the objective of scoring the highest-possible grade in a class rather than for clarity, and you probably don’t want to look at it unless you are investigating the paper in detail. (If you download that, note that I no longer work for Igalia, and the paper was not authorized by Igalia either; I used my company email to disclose my affiliation and maybe impress my professor a little.) Phew, now I can finally remove this from my desk!
Over the past few months the WebKit development team has been working on
modernizing support for the WebAudio specification. This post highlights some
of the changes that were recently merged, focusing on the GStreamer ports.
My fellow WebKit colleague, Chris Dumez, has been very active lately, updating
the WebAudio implementation …
In this line of work, we all stumble at least once upon a problem
that turns out to be extremely elusive and very tricky to narrow down
and solve. If we're lucky, we might have everything at our
disposal to diagnose the problem but sometimes that's not the
case – and in embedded development it's often not the
case. Add to the mix proprietary drivers, lack of debugging symbols, a
bug that's very hard to reproduce under a controlled environment,
and weeks in partial confinement due to a pandemic and what you have
is better described as a very long lucid nightmare. Thankfully,
even the worst of nightmares end when morning comes, even if sometimes
morning might be several days away. And when the fix to the problem is
in an unimaginable place, the story is definitely one worth
telling.
The problem
It all started with one
of Igalia's customers deploying
a WPE WebKit-based browser in
their embedded devices. Their CI infrastructure had detected a problem
caused when the browser was tasked with creating a new webview (in
layman's terms, you can imagine that to be the same as opening a new tab
in your browser). Occasionally, this view would never load, causing
ongoing tests to fail. For some reason, the test failure had a
reproducibility of ~75% in the CI environment, but during manual
testing it would occur with less than 1% probability. For reasons
that are beyond the scope of this post, the CI infrastructure was not
reachable in a way that would allow access to running
processes in order to diagnose the problem more easily. So with only
logs at hand and less than a 1/100 chance of reproducing the bug
myself, I set out to debug this problem locally.
Diagnosis
The first thing that became evident was that, whenever this bug would
occur, the WebKit feature known as web extension (an
application-specific loadable module that is used to allow the program
to have access to the internals of a web page, as well as to enable
customizable communication with the process where the page contents
are loaded – the web process) wouldn't work. The browser would be
forever waiting for the web extension to load, and since that wouldn't
happen, the expected page wouldn't load. The first place to look into,
then, is the web process, to try to understand what is preventing
the web extension from loading. Enter here our good friend GDB, with
less than spectacular results thanks to stripped libraries.
#0 0x7500ab9c in poll () from target:/lib/libc.so.6
#1 0x73c08c0c in ?? () from target:/usr/lib/libEGL.so.1
#2 0x73c08d2c in ?? () from target:/usr/lib/libEGL.so.1
#3 0x73c08e0c in ?? () from target:/usr/lib/libEGL.so.1
#4 0x73bold6a8 in ?? () from target:/usr/lib/libEGL.so.1
#5 0x75f84208 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#6 0x75fa0b7e in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#7 0x7561eda2 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#8 0x755a176a in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#9 0x753cd842 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#10 0x75451660 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#11 0x75452882 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#12 0x75452fa8 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#13 0x76b1de62 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#14 0x76b5a970 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#15 0x74bee44c in g_main_context_dispatch () from target:/usr/lib/libglib-2.0.so.0
#16 0x74bee808 in ?? () from target:/usr/lib/libglib-2.0.so.0
#17 0x74beeba8 in g_main_loop_run () from target:/usr/lib/libglib-2.0.so.0
#18 0x76b5b11c in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#19 0x75622338 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#20 0x74f59b58 in __libc_start_main () from target:/lib/libc.so.6
#21 0x0045d8d0 in _start ()
From all threads in the web process, after much tinkering around, it
slowly became clear that one of the places to look into was
that poll()
call. I will spare you the details related to what other threads were
doing; suffice it to say that whenever the browser would hit the bug,
there was a similar stacktrace in one thread, going
through libEGL to a
call to poll() on top of the stack, that would never
return. Unfortunately, a stripped EGL driver coming from a proprietary
graphics vendor was a bit of a showstopper, as was the inability to
have proper debugging symbols running inside the device (did you know
that a non-stripped WebKit library binary with debugging symbols can
easily get GDB and your device out of memory?). The best one could do
to improve that was to use the
gcore
feature in GDB, and extract a core from the device for post-mortem
analysis. But for some reason, such a stacktrace wouldn't give
anything interesting below the poll() call to understand
what's being polled here. Did I say this was tricky?
What polls?
Because WebKit is a multiprocess web engine, having system calls
that signal, read, and write in sockets communicating with other
processes is an everyday thing. Not knowing what a poll()
call is doing and who it's trying to listen to is not
very good. Because the call is happening under the EGL library, one
can presume that it's graphics-related, but there are still
different possibilities, so trying to find out what this polling is
about is a good idea.
A trick I learned while debugging this is that, in the absence of
debugging symbols that would give a straightforward look into
variables and parameters, one can examine the CPU registers and try to
figure out from them what the parameters to function calls are. Let's
do that with poll(). First, its signature.
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
Now, let's examine the registers.
(gdb) f 0
#0 0x7500ab9c in poll () from target:/lib/libc.so.6
(gdb) info registers
r0 0x7ea55e58 2124766808
r1 0x1 1
r2 0x64 100
r3 0x0 0
r4 0x0 0
Registers r0, r1, and r2
contain poll()'s three
parameters. Because r1 is 1, we know that there is only
one file descriptor being polled. fds is a pointer to an
array with one element then. Where is that first element? Well, right
there, in the memory pointed to directly by
r0. What does struct pollfd look like?
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events */
short revents; /* returned events */
};
What we are interested in here is the contents of fd,
the file descriptor that is being polled. Memory alignment is again on
our side: we don't need any pointer arithmetic here. We can
dereference the address in the register r0 directly and find out what the
value of fd is.
(gdb) print *0x7ea55e58
$3 = 8
So we now know that the EGL library is polling the file descriptor
with an identifier of 8. But where is this file descriptor coming
from? What is on the other end? The /proc file system can
be helpful here.
# pidof WPEWebProcess
1944 1196
# ls -lh /proc/1944/fd/8
lrwx------ 1 x x 64 Oct 22 13:59 /proc/1944/fd/8 -> socket:[32166]
So we have a socket. What else can we find out about it? Turns out,
not much without
the unix_diag
kernel module, which was not available in our device. But we are
slowly getting closer. Time to call another good friend.
Where GDB fails, printf() triumphs
Something I have learned from many years working with a project as
large as WebKit is that debugging symbols can be very difficult to
work with. To begin with, it takes ages to build WebKit with them.
When cross-compiling, it's even worse. And then, very often the
target device doesn't even have enough memory to load the symbols
when debugging. So they can be pretty useless. That's when just
using fprintf()
and logging useful information can simplify things. Since we know that
it's at some point during initialization of the web process that
we end up stuck, and we also know that we're polling a file
descriptor, let's find some early calls in the code of the web
process and add some
fprintf() calls with a bit of information, especially in
those that might have something to do with EGL. What can we find out
now?
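The added calls were along these lines (a hypothetical sketch, not the actual WebKit code; hostFD stands for whatever variable held the file descriptor at that point):

fprintf(stderr, "Initializing WebProcess platform.\n");
fprintf(stderr, "Initializing PlatformDisplayLibWPE (hostFD: %d).\n", hostFD);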
Oct 19 10:13:27.700335 WPEWebProcess[92]: Starting
Oct 19 10:13:27.720575 WPEWebProcess[92]: Initializing WebProcess platform.
Oct 19 10:13:27.727850 WPEWebProcess[92]: wpe_loader_init() done.
Oct 19 10:13:27.729054 WPEWebProcess[92]: Initializing PlatformDisplayLibWPE (hostFD: 8).
Oct 19 10:13:27.730166 WPEWebProcess[92]: egl backend created.
Oct 19 10:13:27.741556 WPEWebProcess[92]: got native display.
Oct 19 10:13:27.742565 WPEWebProcess[92]: initializeEGLDisplay() starting.
Two interesting findings from the fprintf()-powered
logging here: first, it seems that file descriptor 8 is one known to
libwpe
(the general-purpose library that powers the WPE WebKit port). Second,
that the last EGL API call right before the web process hangs
on poll() is a call
to eglInitialize(). fprintf(),
thanks for your service.
Number 8
We now know that the file descriptor 8 is coming from WPE and is
not internal to the EGL library. libwpe gets this file descriptor from
the UI process,
as one
of the many creation parameters that are passed via IPC to the
nascent process in order to initialize it. Turns out that this file
descriptor in particular, the so-called host client file descriptor,
is the one that the freedesktop backend of libWPE, from here onwards
WPEBackend-fdo,
creates when a new client is set to connect to its Wayland display. In
a nutshell, in the presence of a new client, a Wayland display is supposed
to create a pair of connected sockets, create a new client on the
display side, give it one of the file descriptors, and pass the other
one to the client process. Because this will be useful later on,
let's see how that is
currently
implemented in WPEBackend-fdo.
int pair[2];
if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, pair) < 0)
    return -1;
int clientFd = dup(pair[1]);
close(pair[1]);
wl_client_create(m_display, pair[0]);
The file descriptor we are tracking down is the client file
descriptor, clientFd. So we now know what's going on in this socket:
Wayland-specific communication. Let's enable Wayland debugging next,
by running all relevant processes with WAYLAND_DEBUG=1. We'll get back
to that code fragment later on.
A Heisenbug is a Heisenbug is a Heisenbug
Turns out that enabling Wayland debugging output for a few
processes is enough to alter the state of the system in such a way
that the bug does not happen at all when doing manual
testing. Thankfully the CI's reproducibility is much higher, so
after waiting overnight for the CI to continuously run until it hit
the bug, we have logs. What do the logs say?
WPEWebProcess[41]: initializeEGLDisplay() starting.
-> wl_display@1.get_registry(new id wl_registry@2)
-> wl_display@1.sync(new id wl_callback@3)
So the EGL library is trying to fetch the Wayland
registry and it's doing a wl_display_sync() call
afterwards, which will block until the server responds. That's
where the blocking poll() call comes from. So, it turns
out, the problem is not necessarily on this end of the Wayland socket,
but perhaps on the other side, that is, in the so-called UI process
(the main browser process). Why is the Wayland display not
replying?
The loop
Something that is worth mentioning before we move on is how the
WPEBackend-fdo Wayland display integrates with the system. This
display is a nested display, with each web view a client, while it is
itself a client of the system's Wayland display. This can be a bit
confusing if you're not very familiar with how Wayland works, but
fortunately there is
good
documentation about Wayland elsewhere.
The way that the Wayland display in the UI process of a WPEWebKit
browser is integrated with the rest of the program, when it uses
WPEBackend-fdo, is through the
GLib
main event loop. Wayland itself has an event loop implementation
for servers, but for a GLib-powered application it can be useful to
use GLib's and integrate Wayland's event processing with the different
stages of the GLib main loop. That is precisely how WPEBackend-fdo is
handling its clients' events. As discussed earlier, when a new client
is created, a pair of connected sockets is created and one end is
given to Wayland to control communication with the
client. GSourceFunc
functions are used to integrate Wayland with the application main
loop. In these functions, we make sure that whenever there are pending
messages to be sent to clients, those are sent, and whenever any of
the client sockets has pending data to be read, Wayland reads from
them and dispatches the events that might be necessary in response
to the incoming data. And here is where things start getting really
strange, because after doing a bit of
fprintf()-powered debugging inside the Wayland GSourceFuncs functions,
it became clear that the Wayland events from the clients were never
dispatched, because the dispatch() GSourceFunc was not being called,
as if there was nothing coming from any Wayland client. But how is
that possible, if we already know that the web process client is
actually trying to get the Wayland registry?
To move forward, one needs to understand how the GLib main loop
works, in particular, with Unix file descriptor sources. A very brief
summary of this is that, during an iteration of the main loop, GLib
will poll file descriptors to see if there are any interesting events
to be reported back to their respective sources, in which case the
sources will decide whether to trigger the dispatch()
phase. A simple source might decide in its dispatch()
method to directly read or write from/to the file descriptor; a
Wayland display source (as in our case), will
call wl_event_loop_dispatch() to do this for us.
However, if the source doesn't find any interesting events, or if
the source decides that it doesn't want to handle them,
the dispatch() invocation will not happen. More on the
GLib main event loop in
its API
documentation.
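To make this flow easier to picture, here is a minimal sketch of a file descriptor source (my own illustration, not GLib or WebKit code; MyFdSource is a made-up name):

#include <glib.h>

typedef struct {
    GSource source;  /* base struct must come first */
    GPollFD pfd;     /* fd, requested events, and returned revents */
} MyFdSource;

static gboolean my_fd_source_check (GSource *base)
{
    MyFdSource *self = (MyFdSource *) base;
    /* Ask to be dispatched only if poll() reported something for our fd. */
    return self->pfd.revents != 0;
}

static gboolean my_fd_source_dispatch (GSource *base, GSourceFunc callback, gpointer user_data)
{
    MyFdSource *self = (MyFdSource *) base;
    /* Read/handle self->pfd.fd here; a Wayland display source would
     * call wl_event_loop_dispatch() at this point instead. */
    return G_SOURCE_CONTINUE;
}

static GSourceFuncs my_fd_source_funcs = {
    NULL, my_fd_source_check, my_fd_source_dispatch, NULL,
};

/* Setup: create the source and tell the main loop to poll our fd:
 *   MyFdSource *self = (MyFdSource *) g_source_new (&my_fd_source_funcs, sizeof (MyFdSource));
 *   self->pfd.fd = fd; self->pfd.events = G_IO_IN;
 *   g_source_add_poll ((GSource *) self, &self->pfd);
 *   g_source_attach ((GSource *) self, NULL);
 */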
So it seems that for some reason the dispatch() method is not being
called. Does that mean that there are no interesting events to read
from? Let's find out.
System call tracing
Here we resort to another helpful
tool, strace. With strace
we can try to figure out what is happening when the main loop polls
file descriptors. The strace output is huge (because it
takes easily over a hundred attempts to reproduce this), but we know
already some of the calls that involve file descriptors from the code
we looked at above, when the client is created. So we can use those
calls as a starting point when searching through the several MBs of
logs. Fast-forward to the relevant logs.
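Roughly, the relevant lines looked like this (an illustrative reconstruction, not the actual capture; exact arguments may differ):

socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [128, 130]) = 0
epoll_ctl(34, EPOLL_CTL_ADD, 130, {EPOLLIN, {u32=130, u64=130}}) = 0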
What we see there is, first, WPEBackend-fdo creating a new socket
pair (128, 130) and then, when file descriptor 130 is passed to
wl_client_create() to
create a new client, Wayland adds that file descriptor to its
epoll() instance
for monitoring clients, which is referred to by file descriptor 34. This way, whenever there are
events in file descriptor 130, we will hear about them in file descriptor 34.
So what we would expect to see next is that, after the web process
is spawned, when a Wayland client is created using the passed file
descriptor and the EGL driver requests the Wayland registry from the
display, there should be a POLLIN event coming in file
descriptor 34 and, if the dispatch() call for the source
was called,
an epoll_wait()
call on it, as that is
what wl_event_loop_dispatch()
would do when called from the source's dispatch()
method. But what do we have instead?
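The two calls in question looked roughly like this (again an illustrative reconstruction; the exact fd set and flags may differ):

poll([{fd=30, events=POLLIN}, …, {fd=34, events=POLLIN}], 4, -1) = 1 ([{fd=34, revents=POLLIN}])
recvmsg(30, …, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)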
strace can be a bit cryptic, so let's explain
those two function calls. The first one is a poll in a series of file
descriptors (including 30 and 34) for POLLIN events. The
return value of that call tells us that there is a POLLIN
event in file descriptor 34 (the Wayland display epoll()
instance for clients). But unintuitively, the call right after is
trying to read a message from socket 30 instead, which we know
doesn't have any pending data at the moment, and consequently
returns an error value with an errno
of EAGAIN (Resource temporarily unavailable).
Why is the GLib main loop triggering a read from 30 instead of 34?
And who is 30?
We can answer the latter question first. Breaking on a running UI
process instance at the right time shows who is reading from
the file descriptor 30:
#1 0x70ae1394 in wl_os_recvmsg_cloexec (sockfd=30, msg=msg@entry=0x700fea54, flags=flags@entry=64)
#2 0x70adf644 in wl_connection_read (connection=0x6f70b7e8)
#3 0x70ade70c in read_events (display=0x6f709c90)
#4 wl_display_read_events (display=0x6f709c90)
#5 0x70277d98 in pwl_source_check (source=0x6f71cb80)
#6 0x743f2140 in g_main_context_check (context=context@entry=0x2111978, max_priority=<optimized out>, fds=fds@entry=0x6165f718, n_fds=n_fds@entry=4)
#7 0x743f277c in g_main_context_iterate (context=0x2111978, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
#8 0x743f2ba8 in g_main_loop_run (loop=0x20ece40)
#9 0x00537b38 in ?? ()
So it's also Wayland, but on a different level. This
is the Wayland client source (remember that the browser is also a
Wayland client?), which is installed
by cog (a thin browser
layer on top of WPE WebKit that makes writing browsers easier)
to process, among others, input events coming from the parent Wayland
display. Looking
at the cog code, we can see that the
wl_display_read_events()
call happens only if GLib reports that there is
a G_IO_IN
(POLLIN) event in its file descriptor, but we already
know that this is not the case, as per the strace
output. So at this point we know that there are two things here that
are not right:
An FD source with a G_IO_IN condition is not being dispatched.
An FD source without a G_IO_IN condition is being dispatched.
Someone here is not telling the truth, and as a result the main loop
is dispatching the wrong sources.
The loop (part II)
It is at this point that it would be a good idea to look at what
exactly the GLib main loop is doing internally in each of its stages
and how it tracks the sources and file descriptors that are polled and
that need to be processed. Fortunately, debugging symbols for GLib are
very small, so debugging this step by step inside the device is rather
easy.
Let's look at how the main loop decides which sources
to dispatch, since for some reason it's dispatching the wrong ones.
Dispatching happens in
the g_main_dispatch()
method. This method goes over a list of pending source dispatches and
after a few checks and setting the stage, the dispatch method for the
source gets called. How is a source set as having a pending dispatch?
This happens in
g_main_context_check(),
where the main loop checks the results of the polling done in this
iteration and runs the check() method for sources that
are not ready yet so that they can decide whether they are ready to be
dispatched or not. Breaking into the Wayland display source, I know
that
the check()
method is called. How does this method decide to be dispatched or
not?
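Paraphrasing from memory what the WPEBackend-fdo source sets up (a sketch, not the verbatim code; ClientSource and its pfd member are assumed names):

static GSourceFuncs s_sourceFuncs = {
    nullptr, // prepare
    // check: ask to be dispatched only if poll() filled in revents for our fd
    [](GSource* base) -> gboolean
    {
        auto& source = *reinterpret_cast<ClientSource*>(base);
        return !!source.pfd.revents;
    },
    nullptr, // dispatch (elided here)
    nullptr, // finalize
};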
In this lambda function we're returning TRUE or
FALSE, depending on whether the revents
field in
the GPollFD
structure has been filled during the polling stage of this iteration
of the loop. A return value of TRUE indicates to the main
loop that we want our source to be dispatched. From
the strace output, we know that there is a
POLLIN (or G_IO_IN) condition, but we also know that the main loop is
not dispatching it. So let's look at what's in this GPollFD structure.
For this, let's go back to g_main_context_check() and inspect the array
of GPollFD structures that it received when called. What do we find?
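Something along these lines (the values here are illustrative; the original post showed the actual GDB output):

(gdb) print *fds@n_fds
$2 = {{fd = 30, events = 1, revents = 0}, …, {fd = 34, events = 1, revents = 1}, …}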
That's the result of the poll() call! So far so good. Now the method
is supposed to update the polling records it keeps and uses when
calling each source's check() function. What do these records
hold?
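Unlike the freshly polled array, the records held stale values – illustrative output once more:

(gdb) print *pollrec->fd      (walking the poll_records list, record by record)
{fd = 19, events = 1, revents = 0}
{fd = 30, events = 1, revents = 1}
{fd = 34, events = 1, revents = 0}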
We're not interested in the first record quite yet, but clearly
there's something odd here. The polling records are showing a
different value in the revents fields for both 30 and 34. Are these
records updated correctly? Let's look at the algorithm that is doing
this update, because it will be relevant later on.
pollrec = context->poll_records;
i = 0;
while (pollrec && i < n_fds)
  {
    while (pollrec && pollrec->fd->fd == fds[i].fd)
      {
        if (pollrec->priority <= max_priority)
          {
            pollrec->fd->revents =
              fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
          }
        pollrec = pollrec->next;
      }
    i++;
  }
In simple words, what this algorithm is doing is to traverse
simultaneously the polling records and the GPollFD array,
updating the polling records' revents with the results of
polling. From
reading how
the pollrec linked list is built internally, it's
possible to see that it's purposely sorted by increasing file
descriptor identifier value. So the first item in the list will have
the record for the lowest file descriptor identifier, and so on. The
GPollFD array is also built in this way, allowing for a
nice optimization: if more than one polling record – that is, more
than one polling source – needs to poll the same file descriptor,
this can be done at once. This is why this otherwise O(n^2) nested
loop can actually be reduced to linear time.
One thing stands out here though: the linked list is only advanced
when we find a match. Does this mean that we always have a match
between polling records and the file descriptors that have just been
polled? To answer that question we need to check how the array of
GPollFD structures is
filled. This
is done in g_main_context_query(), as we hinted
before. I'll spare you the details, and just focus on what seems
relevant here: when is a poll record not used to fill
a GPollFD?
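The relevant filter in g_main_context_query() looks roughly like this (paraphrased from memory; note that in GLib a numerically greater priority value means a lower priority):

n_poll = 0;
for (pollrec = context->poll_records; pollrec; pollrec = pollrec->next)
  {
    if (pollrec->priority > max_priority)
      continue;   /* lower priority than the cut-off: this fd is not polled at all */

    /* ...otherwise pollrec->fd is copied (or merged) into fds[n_poll++]... */
  }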
Interesting! If a polling record belongs to a source whose priority
is lower than the maximum priority that the current iteration is
going to process, the polling record is skipped. Why is this?
In simple terms, this happens because each iteration of the main
loop finds out the highest priority between the sources that are ready
in the prepare() stage, before polling, and then only
those file descriptor sources with at least such a priority are
polled. The idea behind this is to make sure that high-priority
sources are processed first, and that no file descriptor sources with
lower priority are polled in vain, as they shouldn't be
dispatched in the current iteration.
GDB tells me that the maximum priority in this iteration is
-60. From an earlier GDB output, we also know that there's a
source for file descriptor 19 with a priority of 0.
Since 19 is lower than 30 and 34, we know that this record comes
before theirs in the linked list (and as it happens, it's the
first one in the list too). But we know that, because its priority is
0, it is too low for its descriptor to be added to the file descriptor
array to be polled. Let's look at the loop again.
pollrec = context->poll_records;
i = 0;
while (pollrec && i < n_fds)
  {
    while (pollrec && pollrec->fd->fd == fds[i].fd)
      {
        if (pollrec->priority <= max_priority)
          {
            pollrec->fd->revents =
              fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
          }
        pollrec = pollrec->next;
      }
    i++;
  }
The first polling record was skipped during the update of
the GPollFD array, so the condition pollrec
&& pollrec->fd->fd == fds[i].fd is never going to
be satisfied, because 19 is not in the array. The
innermost while() is not entered, and as such
the pollrec list pointer never moves forward to the next
record. So no polling record is updated here, even if we have
updated revent information from the polling results.
What happens next should be easy to see. The check()
methods for all polled sources are called with
outdated revents. In the case of the source
for file descriptor 30, we wrongly tell it there's a
G_IO_IN condition, so it asks the main loop to
dispatch it, triggering a wl_connection_read() call on a
socket with no incoming data. For the source with file descriptor 34,
we tell it that there's no incoming data and
its dispatch() method is not invoked, even when on the
other side of the socket we have a client waiting for data to come and
blocking in the meantime. This explains what we see in
the strace output above. If the source with file
descriptor 19 continues to be ready and with its priority unchanged,
then this situation repeats in every further iteration of the main
loop, leading to a hang in the web process that is forever waiting
for the UI process to read its socket pipe.
The bug – explained
I have been using GLib for a very long time, and I have only fixed
a couple of minor bugs in it over the years. Very few actually,
which is why it was very difficult for me to come to accept that I
had found a bug in one of the most reliable and complex parts of the
library. Impostor syndrome is a thing and it really gets in the way.
But in a nutshell, the bug in the GLib main loop is that the very
clever linear update of the polling records is missing something very important:
it should skip to the first matching polling record before attempting
to update its revents. Without this, in the presence of a
file descriptor source with the lowest file descriptor identifier and
also a lower priority than the cut-off priority in the current main
loop iteration, revents in the polling records are not
updated and therefore the wrong sources can be dispatched. The
simplest patch to avoid this would look as follows.
i = 0;
while (pollrec && i < n_fds)
  {
+   while (pollrec && pollrec->fd->fd != fds[i].fd)
+     pollrec = pollrec->next;
+
    while (pollrec && pollrec->fd->fd == fds[i].fd)
      {
        if (pollrec->priority <= max_priority)
Once we find the first matching record, let's update all consecutive
records that also match and need an update, then let's skip to the
next record, rinse and repeat. With this two-line patch, the web
process was finally unlocked, the EGL display initialized properly,
the web extension and the web page were loaded, CI tests started
passing again, and this exhausted developer could finally put his mind
to rest.
A complete
patch, including improvements to the code comments around this
fascinating part of GLib and also a minimal test case reproducing the
bug have already been reviewed by the GLib maintainers and merged to
both stable and development branches. I expect that at
least some GLib sources will start being called in a
different (but correct) order from now on, so keep an eye on your
GLib sources. :-)
Standing on the shoulders of giants
At this point I should acknowledge that without the support from my
colleagues in the WebKit team in Igalia, getting to the bottom of this
problem would have probably been much harder and perhaps my sanity
would have been at stake. I want to
thank Adrián
and Žan for
their input on Wayland, debugging techniques, and for allowing me to
bounce ideas and findings back and forth as I went deeper into this
rabbit hole, helping me to step out of dead-ends, reminding me to use
tools out of my everyday box, and ultimately, to be brave enough to
doubt GLib's correctness, something that much more often than not I
take for granted.
Thanks also to Philip
and Sebastian for their
feedback and prompt code review!
It’s that time of year again: a new GNOME release, and with it, a new Epiphany. The pace of Epiphany development has increased significantly over the last few years thanks to an increase in the number of active contributors. Most notably, Jan-Michael Brummer has solved dozens of bugs and landed many new enhancements, Alexander Mikhaylenko has polished numerous rough edges throughout the browser, and Andrei Lisita has landed several significant improvements to various Epiphany dialogs. That doesn’t count the work that Igalia is doing to maintain WebKitGTK, the WPE graphics stack, and libsoup, all of which is essential to delivering quality Epiphany releases, nor the work of the GNOME localization teams to translate it to your native language. Even if Epiphany itself is only the topmost layer of this technology stack, having more developers working on Epiphany itself allows us to deliver increased polish throughout the user interface layer, and I’m pretty happy with the result. Let’s take a look at what’s new.
Intelligent Tracking Prevention
Intelligent Tracking Prevention (ITP) is the headline feature of this release. Safari has had ITP for several years now, so if you’re familiar with how ITP works to prevent cross-site tracking on macOS or iOS, then you already know what to expect here. If you’re more familiar with Firefox’s Enhanced Tracking Protection, or Chrome’s nothing (crickets: chirp, chirp!), then WebKit’s ITP is a little different from what you’re used to. ITP relies on heuristics that apply the same to all domains, so there are no blocklists of naughty domains that should be targeted for content blocking like you see in Firefox. Instead, a set of innovative restrictions is applied globally to all web content, and a separate set of stricter restrictions is applied to domains classified as “prevalent” based on your browsing history. Domains are classified as prevalent if ITP decides the domain is capable of tracking your browsing across the web, or non-prevalent otherwise. (The public-friendly terminology for this is “Classification as Having Cross-Site Tracking Capabilities,” but that is a mouthful, so I’ll stick with “prevalent.” It makes sense: domains that are common across many websites can track you across many websites, and domains that are not common cannot.)
ITP is enabled by default in Epiphany 3.38, as it has been for several years now in Safari, because otherwise only a small minority of users would turn it on. ITP protections are designed to be effective without breaking too many websites, so it’s fairly safe to enable by default. (You may encounter a few broken websites that have not been updated to use the Storage Access API to store third-party cookies. If so, you can choose to turn off ITP in the preferences dialog.)
For a detailed discussion covering ITP’s tracking mitigations, see Tracking Prevention in WebKit. I’m not an expert myself, but the short version is this: full third-party cookie blocking across all websites (to store a third-party cookie, websites must use the Storage Access API to prompt the user for permission); cookie-blocking latch mode (“once a request is blocked from using cookies, all redirects of that request are also blocked from using cookies”); downgraded third-party referrers (“all third-party referrers are downgraded to their origins by default”) to avoid exposing the path component of the URL in the referrer; blocked third-party HSTS (“HSTS […] can only be set by the first-party website […]”) to stop abuse by tracker scripts; detection of cross-site tracking via link decoration and 24-hour expiration time for all cookies created by JavaScript on the landing page when detected; a 7-day expiration time for all other cookies created by JavaScript (yes, this applies to first-party cookies); and a 7-day extendable lifetime for all other script-writable storage, extended whenever the user interacts with the website (necessary because tracking companies began using first-party scripts to evade the above restrictions). Additionally, for prevalent domains only, domains engaging in bounce tracking may have cookies forced to SameSite=strict, and Verified Partitioned Cache is enabled (cached resources are re-downloaded after seven days and deleted if they fail certain privacy tests). Whew!
WebKit has many additional privacy protections not tied to the ITP setting and therefore not discussed here — did you know that cached resources are partitioned based on the first-party domain? — and there’s more that’s not very well documented which I don’t understand and haven’t mentioned (tracker collusion!), but that should give you the general idea of how sophisticated this is relative to, say, Chrome (chirp!). Thanks to John Wilander from Apple for his work developing and maintaining ITP, and to Carlos Garcia for getting it working on Linux. If you’re interested in the full history of how ITP has evolved over the years to respond to the changing threat landscape (e.g. tracking prevention tracking), see John’s WebKit blog posts. You might also be interested in WebKit’s Tracking Prevention Policy, which I believe is the strictest anti-tracking stance of any major web engine. TL;DR: “we treat circumvention of shipping anti-tracking measures with the same seriousness as exploitation of security vulnerabilities. If a party attempts to circumvent our tracking prevention methods, we may add additional restrictions without prior notice.” No exceptions.
Updated Website Data Preferences
As part of the work on ITP, you’ll notice that Epiphany’s cookie storage preferences have changed a bit. Since ITP enforces full third-party cookie blocking, it no longer makes sense to have a separate cookie storage preference for that, so I replaced the old tri-state cookie storage setting (always accept cookies, block third-party cookies, block all cookies) with two switches: one to toggle ITP, and one to toggle all website data storage.
Previously, it was only possible to block cookies, but this new setting will additionally block localStorage and IndexedDB, web features that allow websites to store arbitrary data in your browser, similar to cookies. It doesn’t really make much sense to block cookies but allow other types of data storage, so the new preferences should better enforce the user’s intent behind disabling cookies. (This preference does not yet block media keys, service workers, or legacy offline web application cache, but it probably should.) I don’t really recommend disabling website data storage, since it will cause compatibility issues on many websites, but this option is there for those who want it. Disabling ITP is also not something I want to recommend, but it might be necessary to access certain broken websites that have not yet been updated to use the Storage Access API.
Accordingly, Andrei has removed the old cookies dialog and moved cookie management into the Clear Personal Data dialog, which is a better place because anyone clearing cookies for a particular website is likely to also want to clear other personal data. (If you want to delete a website’s cookies, then you probably don’t want to leave its SQL databases intact, right?) He had to remove the ability to clear data from a particular point in time, because WebKit doesn’t support this operation for cookies, but that function is probably rarely used and I think the benefit of the change should outweigh the cost. (We could bring it back in the future if somebody wants to try implementing that feature in WebKit, but I suspect not many users will notice.) Treating cookies as separate and different from other forms of website data storage no longer makes sense in 2020, and it’s good to have finally moved on from that antiquated practice.
New HTML Theme
Carlos Garcia has added a new Adwaita-based HTML theme to WebKitGTK 2.30, and removed support for rendering HTML elements using the GTK theme (except for scrollbars). Trying to use the GTK theme to render web content was fragile and caused many web compatibility problems that nobody ever managed to solve. The GTK developers were never very fond of us doing this in the first place, and the foreign drawing API required to do so has been removed from GTK 4, so this was also good preparation for getting WebKitGTK ready for GTK 4. Carlos’s new theme is similar to Adwaita, but gradients have been toned down or removed in order to give a flatter, neutral look that should blend in nicely with all pages while still feeling modern.
This should be a fairly minor style change for Adwaita users, but a very large change for anyone using custom themes. I don’t expect everyone will be happy, but please trust that this will at least result in better web compatibility and fewer tricky theme-related bug reports.
Left: Adwaita GTK theme controls rendered by WebKitGTK 2.28. Right: hardcoded Adwaita-based HTML theme with toned down gradients.
Although scrollbars will still use the GTK theme as of WebKitGTK 2.30, that will no longer be possible to do in GTK 4, so themed scrollbars are almost certain to be removed in the future. That will be a noticeable disappointment in every app that uses WebKitGTK, but I don’t see any likely solution to this.
Media Permissions
Jan-Michael added new API in WebKitGTK 2.30 to allow muting individual browser tabs, and hooked it up in Epiphany. This is good when you want to silence just one annoying tab without silencing everything.
Meanwhile, Charlie Turner added WebKitGTK API for managing autoplay policies. Videos with sound are now blocked from autoplaying by default, while videos with no sound are still allowed. Charlie hooked this up to Epiphany’s existing permission manager popover, so you can change the behavior for websites you care about without affecting other websites.
Configure your preferred media autoplay policy for a website near you today!
Improved Dialogs
In addition to his work on the Clear Data dialog, Andrei has also implemented many improvements and squashed bugs throughout each view of the preferences dialog, the passwords dialog, and the history dialog, and refactored the code to be much more maintainable. Head over to his blog to learn more about his accomplishments. (Thanks to Google for sponsoring Andrei’s work via Google Summer of Code, and to Alexander for help mentoring.)
Additionally, Adrien Plazas has ported the preferences dialog to use HdyPreferencesWindow, bringing a pretty major design change to the view switcher:
Left: Epiphany 3.36 preferences dialog. Right: Epiphany 3.38. Note the download settings are present in the left screenshot but missing from the right screenshot because the right window is using flatpak, and the download settings are unavailable in flatpak.
User Scripts
User scripts (like Greasemonkey) allow you to run custom JavaScript on websites. WebKit has long offered user script functionality alongside user CSS, but previous versions of Epiphany only exposed user CSS. Jan-Michael has added the ability to configure a user script as well. To enable, visit the Appearance tab in the preferences dialog (a somewhat odd place, but it really needs to be located next to user CSS due to the tight relationship there). Besides allowing you to do, well, basically anything, this also significantly enhances the usability of user CSS, since now you can apply certain styles only to particular websites. The UI is a little primitive — your script (like your CSS) has to be one file that will be run on every website, so don’t try to design a complex codebase using your user script — but you can use conditional statements to limit execution to specific websites as you please, so it should work fairly well for anyone who has need of it. I fully expect 99.9% of users will never touch user scripts or user styles, but it’s nice for power users to have these features available if needed.
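For example, a user script along these lines (a hypothetical snippet, not shipped code) would limit its effect to a single site:

// Runs on every page; the conditional restricts the effect to one site.
if (window.location.hostname.endsWith("example.com")) {
    document.querySelectorAll("aside.sidebar").forEach((el) => el.remove());
}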
HTTP Authentication Password Storage
Jan-Michael and Carlos Garcia have worked to ensure HTTP authentication passwords are now stored in Epiphany’s password manager rather than by WebKit, so they can now be viewed and deleted from Epiphany, which required some new WebKitGTK API to do properly. Unfortunately, WebKitGTK saves network passwords using the default network secret schema, meaning its passwords (saved by older versions of Epiphany) are all mixed in with passwords stored by other applications: we have no way to know which application owns those passwords, so we don’t have any way to know which passwords were stored by WebKit and which can be safely managed by Epiphany going forward. Accordingly, all previously-stored HTTP authentication passwords are no longer accessible; you’ll have to use seahorse to look them up manually if you need to recover them. HTTP authentication is not very commonly used nowadays except for internal corporate domains, so hopefully this one-time migration snafu will not be a major inconvenience to most users.
New Tab Animation
Jan-Michael has added a new animation when you open a new tab. If the newly-created tab is not visible in the tab bar, then the right arrow will flash to indicate success, letting you know that you actually managed to open the page. Opening tabs out of view happens too often currently, but at least it’s a nice improvement over not knowing whether you actually managed to open the tab or not. This will be improved further next year, because Alexander is working on a completely new tab widget to replace GtkNotebook.
New View Source Theme
Jim Mason changed view source mode to use a highlight.js theme designed to mimic Firefox’s syntax highlighting, and added dark mode support.
I added a new WebKitGTK 2.30 API to expose the paste as plaintext editor command, which was previously internal but fully-functional. I’ve hooked it up in Epiphany’s context menu as “Paste Text Only.” This is nice when you want to discard markup when pasting into a rich text editor (such as the WordPress editor I’m using to write this post).
Jan-Michael has implemented support for reordering pinned tabs. You can now drag to reorder pinned tabs any way you please, subject to the constraint that all pinned tabs stay left of all unpinned tabs.
Jan-Michael added a new import/export menu, and the bookmarks import/export features have moved there. He also added a new feature to import passwords from Chrome. Meanwhile, ignapk added support for importing bookmarks from HTML (compatible with Firefox).
Jan-Michael added a new preference to web apps to allow running them in the background. When enabled, closing the window will only hide the window: everything will continue running. This is useful for mail apps, music players, and similar applications.
Continuing Jan-Michael’s list of accomplishments, he removed Epiphany’s previous hidden setting to set a mobile user agent header after discovering that it did not work properly, and replaced it by adding support in WebKitGTK 2.30 for automatically setting a mobile user agent header depending on the chassis type detected by logind. This results in a major user experience improvement when using Epiphany as a mobile browser. Beware: this functionality currently does not work in flatpak because it requires the creation of a new desktop portal.
Stephan Verbücheln has landed multiple fixes to improve display of favicons on hidpi displays.
Zach Harbort fixed a rounding error that caused the zoom level to display oddly when changing zoom levels.
Vanadiae landed some improvements to the search engine configuration dialog (with more to come) and helped investigate a crash that occurs when using the “Set as Wallpaper” function under Flatpak. The crash is pretty tricky, so we wound up disabling that function under Flatpak for now. He also updated screenshots throughout the user help.
Sabri Ünal continued his effort to document and standardize keyboard shortcuts throughout GNOME, adding a few missing shortcuts to the keyboard shortcuts dialog.
Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media on the web. This way, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can provide their contents with a reasonable amount of confidence that it will be very complicated for people to “save” their assets without their permission. Why do I use the word “serious” in the title? In WebKit there is already support for Clear Key, which is the W3C EME reference implementation, but EME supports more encryption systems, even proprietary ones (I have my opinion about this; you can ask me privately). No service provider (that I know of) supports Clear Key; they usually rely on Widevine, PlayReady or some other.
Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following the latest W3C proposal. We implemented that downstream (at Web Platform for Embedded) as well, using Thunder, which includes as a plugin a fork of what was the Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top boxes using it currently.
The delta between downstream and the upstream GStreamer-based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided to reverse the situation.
Our first step was done by my colleague Charlie Turner, who made Clear Key work upstream again while adapting some changes the Apple folks had made in the meantime. It was amazing to see Clear Key tests passing again, and his work with the CDMProxy related classes was awesome. After having Clear Key working, I had to adapt them a bit to accommodate Thunder. To explain a bit about the WebKit EME architecture, I must say that there are two layers. The first is the cross-platform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession) to handle the platform management, message exchange, etc., which would be the second layer. Apple’s playback system is fully integrated with their DRM system so they don’t need anything else. We do, because we need to integrate our own decryptors to defer to Thunder for decryption, so in the GStreamer based ports we also need the CDMProxy related classes, which would be CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with the key management, which is abstracted into KeyHandle and KeyStore.
Once the abstraction is there (let’s remember that the abstraction works both for Clear Key and Thunder), the Thunder implementation is quite simple, just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files, but considering the Thunder classes plus the GStreamer common decryptor code, cloc says it is just 1198 lines of platform code. I think that is pretty low for what it does. Apart from that, obviously, there are 5760 lines of cross-platform code.
To build and run all this you need to do several things (the whole sequence is sketched after this list):
Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old-fashioned JHBuild build and force it to build the Thunder dependencies. All dependencies are in JHBuild; even Widevine is referenced, but to download it you need the proper credentials, as it is closed source.
Pass --thunder when calling build-webkit.sh.
Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass the parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional, but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because of WebM, which does not specify a key system; without the higher rank, the Clear Key decryptor can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic.
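Putting the steps together, the sequence looks roughly like this (a sketch assuming the usual Tools/Scripts entry points; adjust paths and options to your checkout):

export WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes"
Tools/Scripts/update-webkitgtk-libs
Tools/Scripts/build-webkit --gtk --thunder
WEBKIT_GST_EME_RANK_PRIORITY="Thunder" Tools/Scripts/run-minibrowser --gtk \
    --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow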
As you might have guessed from a closer look at the GStreamer JHBuild moduleset, only Widevine is currently supported. To support more, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.
When I coded this, all YouTube TV tests for Widevine were green on the desktop. At the moment of writing this post they aren’t, because of some problem with the Widevine installation that will be sorted out quickly, I hope.
Graphics overlays are everywhere nowadays in the live video broadcasting
industry. In this post I introduce a new demo relying on GStreamer and
WPEWebKit to deliver low-latency web-augmented video broadcasts.
Readers of this blog might remember a few posts about WPEWebKit and a
GStreamer element we at Igalia worked on …
After the latest migration of WebKitGTK test bots to use the new SDK based on Flatpak, the old development environment based on jhbuild became deprecated. It can still be used with export WEBKIT_JHBUILD=1, but support for this way of working will gradually fade out.
My main goal was to have a comfortable IDE that follows standard GUI conventions (that is, neither emacs nor vim) and has code indexing features that (more or less) work with the WebKit codebase. Qt Creator was providing all that to me in the old chroot environment thanks to some configuration tricks by Alicia, so it should be good for the new one.
The WebKit source code can be downloaded as always using git:
git clone git.webkit.org/WebKit.git
It’s useful to add WebKit/Tools/Scripts and WebKit/Tools/gtk to your PATH, as well as any other custom tools you may have. You can customize your $HOME/.bashrc for that, but I prefer to have an env.sh environment script to be sourced from the current shell when I want to enter into my development environment (by running webkit). If you’re going to use it too, remember to adjust to your needs the paths used there.
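For illustration, such an env.sh could be as simple as this (a hypothetical sketch; adjust the paths to your own checkout):

# env.sh — sourced when entering the WebKit development environment
export WEBKIT_SRC="$HOME/work/webkit/WebKit"
export PATH="$WEBKIT_SRC/Tools/Scripts:$WEBKIT_SRC/Tools/gtk:$PATH"
cd "$WEBKIT_SRC"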
Even if you have a pretty recent distro, it’s still interesting to have the latest Flatpak tools. Add Alex Larsson’s PPA to your apt sources:
sudo add-apt-repository ppa:alexlarsson/flatpak
In order to ensure that your distro has all the packages that webkit requires and to install the WebKit SDK, you have to run these commands (I omit the full path). Downloading the Flatpak modules will take a while, but at least you won’t need to build everything from scratch. You will need to do this again from time to time, every time the WebKit base dependencies change:
install-dependencies
update-webkitgtk-libs
Now just build WebKit and check that MiniBrowser works:
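With Tools/Scripts in your PATH, the usual invocations would be something like this (a sketch; your build type may differ):

build-webkit --gtk --release
run-minibrowser --gtk --release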
This build process should have generated a WebKit/WebKitBuild/GTK/Release/compile_commands.json file with the right parameters and paths used to build each compilation unit in the project. This file can be leveraged by Qt Creator to get the right include paths and build flags after some preprocessing to translate the paths that make sense from inside Flatpak to paths that make sense from the perspective of your main distro. I wrote compile_commands.sh to take care of those transformations. It can be run manually or automatically when calling go full-rebuild or go update.
With all the needed pieces in place, it’s time to import the project into Qt Creator. To do that, click File → Open File or Project, and then select the compile_commands.json file that compile_commands.sh should have generated in the WebKit main directory.
Now make sure that Qt Creator has the right plugins enabled in Help → About Plugins…. Specifically: GenericProjectManager, ClangCodeModel, ClassView, CppEditor, CppTools, ClangTools, TextEditor and LanguageClient (more on that later).
With this setup, after a brief initial indexing time, you will have support for features like Switch header/source (F4), Follow symbol under cursor (F2), shading of disabled if-endif blocks, auto variable type resolving and code outline. There are some oddities of compile_commands.json based projects, though. There are no compilation units in that file for header files, so indexing features for them only work sometimes. For instance, you can switch from a method implementation in the cpp file to its declaration in the header file, but not the opposite. Also, you won’t see all the source files under the Projects view, only the compilation units, which are often just a bunch of UnifiedSource-*.cpp files. That’s why I prefer to use the File System view.
Additional features like Open Type Hierarchy (Ctrl+Shift+T) and Find References to Symbol Under Cursor (Ctrl+Shift+U) are only available when a Language Client for Language Server Protocol is configured. Fortunately, the new WebKit SDK comes with the ccls C/C++/Objective-C language server included. To configure it, open Tools → Options… → Language Client and add a new item configured to launch that bundled ccls executable.
Some “LanguageClient ccls: Unexpectedly finished. Restarting in 5 seconds.” errors will appear in the General Messages panel after configuring the language client and every time you launch Qt Creator. It’s just ccls taking its time to index the whole source code. It’s “normal”, don’t worry about it. Things will get stable and start to work after some minutes.
Due to the way the Locator file indexer works in Qt Creator, it can become confused, run out of memory and die if it finds cycles in the project file tree. This is common when using Flatpak and running the MiniBrowser or the tests, since /proc and other large filesystems are accessible from inside WebKit/WebKitBuild. To avoid that, open Tools → Options… → Environment → Locator and set Refresh interval to 0 min.
I also prefer to call my own custom build and run scripts (go and runtest.sh) instead of letting Qt Creator build the project with the default builders and mess everything up. To do that, from the Projects mode (Ctrl+5), click on Build & Run → Desktop → Build and edit the build configuration to be like this:
Build directory: /home/enrique/work/webkit/WebKit
Add build step → Custom process step
Command: go (no absolute path because I have it in my PATH)
Arguments:
Working directory: /home/enrique/work/webkit/WebKit
Then, for Build & Run → Desktop → Run, use these options:
Deployment: No deploy steps
Run:
Run configuration: Custom Executable → Add
Executable: runtest.sh
Command line arguments:
Working directory:
With this configuration you can build the project with Ctrl+B and run it with Ctrl+R.
I think I’m not forgetting anything regarding the environment setup. With the instructions in this post you can end up with a pretty complete IDE. Here’s a screenshot of it working in its full glory:
Anyway, to be honest, nothing will ever reach the level of code indexing features I got with Eclipse some years ago. I could find usages of a variable/attribute and know where it was being read, written or read-written. Unfortunately, that environment stopped working for me long ago, so Qt Creator has been the best I’ve managed to get for a while.
Properly configured web based indexers such as the Searchfox instance configured in Igalia can also be useful alternatives to a local setup, although they lack features such as type hierarchy.
I hope you’ve found this post useful in case you try to setup an environment similar to the one described here. Enjoy!
Last week I attended the Web Engines Hackfest. The
event was sponsored by Igalia (also hosting the event), Adobe and Collabora.
As usual I spent most of the time working on the WebKitGTK+ GStreamer
backend and Sebastian Dröge kindly joined and helped out quite a
bit, make sure to read …
Using videos in the <img> HTML tag can lead to more responsive web-page loads
in most cases. Colin Bendell blogged about this topic, make sure to read his
post on the cloudinary website. As it turns out, this feature has been
supported for more than 2 years in Safari, but …
Working on a web-engine often requires a complex build infrastructure. This post
documents our transition from JHBuild to Flatpak for the WebKitGTK and
WPEWebKit development builds.
For the last 10 years, WebKitGTK has been relying on a custom JHBuild
moduleset to handle its dependencies and (try to) ensure a reproducible …
When you connect to a Wi-Fi network, that network might block your access to the wider internet until you’ve signed into the network’s captive portal page. An untrusted network can disrupt your connection at any time by blocking secure requests and replacing the content of insecure requests with its login page. (Of course this can be done on wired networks as well, but in practice it mainly happens on Wi-Fi.) To detect a captive portal, NetworkManager sends a request to a special test address (e.g. http://fedoraproject.org/static/hotspot.txt) and checks whether the content has been replaced. If so, GNOME Shell will open a little WebKitGTK browser window to display http://nmcheck.gnome.org, which, due to the captive portal, will be hijacked by your hotel or airport or whatever to display the portal login page. Rephrased in security lingo: an untrusted network may cause GNOME Shell to load arbitrary web content whenever it wants. If that doesn’t immediately sound dangerous to you, let’s ask me from four years ago why that might be bad:
Web engines are full of security vulnerabilities, like buffer overflows and use-after-frees. The details don’t matter; what’s important is that skilled attackers can turn these vulnerabilities into exploits, using carefully-crafted HTML to gain total control of your user account on your computer (or your phone). They can then install malware, read all the files in your home directory, use your computer in a botnet to attack websites, and do basically whatever they want with it.
If the web engine is sandboxed, then a second type of attack, called a sandbox escape, is needed. This makes it dramatically more difficult to exploit vulnerabilities.
The captive portal helper will pop up and load arbitrary web content without user interaction, so there’s nothing you as a user could possibly do about it. This makes it a tempting target for attackers, so we want to ensure that users are safe in the absence of a sandbox escape. Accordingly, beginning with GNOME 3.36, the captive portal helper is now sandboxed.
How did we do it? With basically one line of code (plus a check to ensure the WebKitGTK version is new enough). To sandbox any WebKitGTK app, just call webkit_web_context_set_sandbox_enabled(). Ta-da, your application is now magically secure!
No, really, that’s all you need to do. So if it’s that simple, why isn’t the sandbox enabled by default? It can break applications that use WebKitWebExtension to run custom code in the sandboxed web process, so you’ll need to test to ensure that your application still works properly after enabling the sandbox. (The WebKitGTK sandbox will become mandatory in the future when porting applications to GTK 4. That’s thinking far ahead, though, because GTK 4 isn’t supported yet at all.) You may need to use webkit_web_context_add_path_to_sandbox() to give your web extension access to directories that would otherwise be blocked by the sandbox.
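For the record, a minimal sketch of what that looks like, assuming the WebKitGTK 2.26 API via the Python GObject bindings (the same calls exist in C, and the extra path is just a made-up example):

import gi
gi.require_version("WebKit2", "4.0")
from gi.repository import WebKit2

context = WebKit2.WebContext.get_default()
# Must be called before the first web process is spawned.
context.set_sandbox_enabled(True)
# Optionally whitelist a directory your web extension needs (read-only here).
context.add_path_to_sandbox("/usr/share/myapp", True)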
The sandbox is critically important for web browsers and email clients, which are constantly displaying untrusted web content. But really, every app should enable it. Fix your apps! Then thank Patrick Griffis from Igalia for developing WebKitGTK’s sandbox, and the bubblewrap, Flatpak, and xdg-desktop-portal developers for providing the groundwork that makes it all possible.
Once upon a time, beginning with GNOME 3.14, Epiphany supported displaying PDF documents via the Evince NPAPI browser plugin developed by Carlos Garcia Campos. Unfortunately, because NPAPI plugins have to use X11-specific APIs to draw web content, this didn’t suffice for very long. When GNOME switched to Wayland by default in GNOME 3.24 (yes, that was three years ago!), this functionality was left behind. Using an NPAPI plugin also meant the code was inherently unsandboxable and tied to a deprecated technology. Epiphany disabled support for NPAPI plugins by default in Epiphany 3.30, hiding the functionality behind a hidden setting, which has now finally been removed for Epiphany 3.36, killing off NPAPI for good.
Jan-Michael Brummer, who comaintains Epiphany with me, tried bringing back PDF support for Epiphany 3.34 using libevince, but eventually we decided to give up on this approach due to difficulty solving some user experience issues. Also, the rendering occurred in the unsandboxed UI process, which was again not good for security.
But PDF support is now back in Epiphany 3.36, and much better than before! Thanks to Jan-Michael, Epiphany now supports displaying PDFs using the amazing PDF.js. We are thankful for Mozilla’s work in developing PDF.js and open sourcing it for us to use. Viewing PDFs in Epiphany using PDF.js is more convenient than downloading them and opening them in Evince, and because the PDF is rendered in the sandboxed web process, using web technologies rather than poppler, it’s also approximately one bazillion times more secure.
Look, it’s a PDF!
One limitation of PDF.js is that it does not support forms. If you need to fill out PDF forms, you’ll need to download the PDF and open it in Evince, just as you would if using Firefox.
Dark Mode
Thanks to Carlos Garcia, it should finally be possible to use Epiphany with dark GTK themes. WebKitGTK has historically rendered HTML elements using the GTK theme, which has not been good for users of dark themes: many websites broke badly, usually due to dark text being drawn on dark backgrounds or various other problems with unexpected dark widgets. Since WebKitGTK 2.28, WebKit will try to manually change to a light GTK theme when it thinks a dark theme is in use, then use the light theme to render web content. (This work has actually been backported to WebKitGTK 2.26.4, so you don’t need to upgrade to WebKitGTK 2.28 to benefit, but the work landed very recently and we haven’t blogged about it yet.) Thanks to Cassidy James from elementary for providing example pages for testing dark mode behavior.
Broken dark mode support prior to WebKitGTK 2.26.4. Notice that the first two pages use dark color schemes when light color schemes are expected, and the dark blue links are hard to read over the dark gray background. Also notice that the text in the second image is unreadable.
Since WebKitGTK 2.26.4, dark mode works as it does in most other browsers. Websites that don’t support dark mode are light, and websites that do support dark mode are dark. Widgets themed using GTK are always light.
Since Carlos had already added support for the prefers-color-scheme media query last year, this now gets us up to dark mode parity with most browsers, except, notably, Safari. Unlike other browsers, Safari allows websites to opt-in to rendering dark system widgets, like WebKitGTK used to do before these changes. Whether to support this in WebKitGTK remains to-be-determined.
Process Swap on Navigation (PSON)
PSON, which debuted in Safari 13, is a major change in WebKit’s process model. PSON is the first component of site isolation, which Chrome has supported for some time, and which Firefox is currently working towards. If you care about web security, you should care a lot about site isolation, because the web browser community has arrived at a consensus that this is the best way to mitigate speculative execution attacks.
Nowadays, all modern web browsers use separate, sandboxed helper processes to render web content, ensuring that the main user interface process, which is unsandboxed, does not touch untrusted web content. Prior to 3.36, Epiphany already used a separate web process to display each browser tab (except for “related views,” where one tab opens another and gains scripting ability over the opened tab, subject to the Same Origin Policy). But in Epiphany 3.36, we now also have a separate web process per website. Each tab will swap between different web processes when navigating between different websites, to prevent any one web process from loading content from different websites.
To make these process swap navigations fast, a pool of prewarmed processes is used to hide the startup cost of launching a new process by ensuring the new process exists before it’s needed; otherwise, the overhead of launching a new web process to perform the navigation would become noticeable. And suspended processes live on after they’re no longer in use because they may be needed for back/forward navigations, which use WebKit’s page cache when possible. (In the page cache, pages are kept in memory indefinitely, to make back/forward navigations fast.)
Due to internal refactoring, PSON previously necessitated some API breakage in WebKitGTK 2.26 that affected Evolution and Geary: WebKitGTK 2.26 deprecated WebKit’s single web process model and required that all applications use one web process per web view, which Evolution and Geary were not, at the time, prepared to handle. We tried hard to avoid this, because we hate to make behavioral changes that break applications, but in this case we decided it was unavoidable. That was the status quo in 2.26, without PSON, which we disabled just before releasing 2.26 in order to limit application breakage to just Evolution and Geary. Now, in WebKitGTK 2.28, PSON is finally available for applications to use on an opt-in basis. (It will become mandatory in the future, for GTK 4 applications.) Epiphany 3.36 opts in. To make this work, Carlos Garcia designed new WebKitGTK APIs for cross-process communication, and used them to replace the private D-Bus server that Epiphany previously used for this purpose.
WebKit still has a long way to go to fully implement site isolation, but PSON is a major step down that road. Thanks to Brady Eidson and Chris Dumez from Apple for making this work, and to Carlos Garcia for handling most of the breakage (there was a lot). As with any major intrusive change of such magnitude, regressions are inevitable, so don’t hesitate to report issues on WebKit Bugzilla.
highlight.js
Once upon a time, WebKit had its own implementation for viewing page source, but this was removed from WebKit way back in 2014, in WebKitGTK 2.6. Ever since, Epiphany would open your default text editor, usually gedit, to display page source. Suffice it to say that this was not a very satisfactory solution.
I finally managed to implement view source mode at the Epiphany level for Epiphany 3.30, but I had trouble making syntax highlighting work. I tried using various open source syntax highlighting libraries, but most are designed to highlight small amounts of code, not large web pages. The libraries I tried were not fast enough, so I gave up on syntax highlighting at the time.
Thanks to Jan-Michael, Epiphany 3.36 supports syntax highlighting using highlight.js, so we finally have view source mode working fully properly once again. It works much better than my failed attempts with different JS libraries. Please thank the highlight.js developers for maintaining this library, and for making it open source.
Colors!
Service Workers
Service workers are now available in WebKitGTK 2.28. Our friends at Apple had already implemented service worker support a couple years ago for Safari 11, but we were pretty slow in bringing this functionality to Linux. Finally, WebKitGTK should now be up to par with Safari in this regard.
Cookies!
Patrick Griffis has updated libsoup and WebKitGTK to support SameSite cookies. He’s also tightened up our cookie policy by implementing strict secure cookies, which prevents http:// pages from setting secure cookies (as they could overwrite secure cookies set by https:// pages).
Adaptive Design
As usual, there are more adaptive design improvements throughout the browser, to provide a better user experience on the Librem 5. There’s still more work to be done here, but Epiphany continues to provide the best user experience of any Linux browser at small screen sizes. Thanks to Adrien Plazas and Jan-Michael for their continued work on this.
As before, simply resize your browser window to see Epiphany dynamically transition between desktop mode and mobile mode.
elementary OS
With help from Alexander Mikhaylenko, we’ve also upstreamed many elementary OS design changes, which will be used when running under the Pantheon desktop (and not impact users on other desktops), so that the elementary developers don’t need to maintain their customizations as separate patches anymore. This will eliminate a few elementary-specific bugs, including some keyboard shortcuts that were previously broken only in elementary, and some odd tab bar behavior. Although Epiphany still doesn’t feel quite as native as an app designed just for elementary OS, it’s getting closer.
Epiphany 3.34
I failed to blog about Epiphany 3.34 when I released it last September. Hopefully you have updated to 3.34 already, and are already enjoying the two big features from this release: the new adblocker, and the bubblewrap sandbox.
The new adblocker is based on WebKit Content Blockers, which was developed by Apple several years ago. Adrian Perez developed new WebKitGTK API to expose this functionality, changed Epiphany to use it, and deleted Epiphany’s older resource-hungry adblocker that was originally copied from Midori. Previously, Epiphany kept a large GHashMap of compiled regexes in every web process, consuming a very significant amount of RAM for each process. It also took time to compile these regexes when launching each new web process. Now, the adblock filters are instead compiled into an efficient bytecode format that gets mmapped between all web processes to avoid excessive resource use. The bytecode is interpreted by WebKit itself, rather than by Epiphany’s web process extension (which Epiphany uses to execute custom code in WebKit’s web process), for greatly improved performance.
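From an application’s point of view, the new API can be sketched roughly like this in the Python GObject bindings (the rule content, storage path, and identifier are made-up examples; error handling omitted):

import json

import gi
gi.require_version("Gtk", "3.0")
gi.require_version("WebKit2", "4.0")
from gi.repository import GLib, Gtk, WebKit2

# A Content Blocker rule set in WebKit's JSON format; saving it to the
# store compiles it to the mmappable bytecode described above.
rules = json.dumps([
    {"trigger": {"url-filter": "ads.example.com"},
     "action": {"type": "block"}},
]).encode("utf-8")

web_view = WebKit2.WebView()
store = WebKit2.UserContentFilterStore.new("/tmp/filter-store")

def on_saved(store, result, user_data):
    # Attach the compiled filter to the view's content manager.
    content_filter = store.save_finish(result)
    web_view.get_user_content_manager().add_filter(content_filter)

store.save("example-blocker", GLib.Bytes.new(rules), None, on_saved, None)
# A GLib/GTK main loop must be running for the async save to complete.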
Lastly, Epiphany 3.34 enabled Patrick’s bubblewrap sandbox, which was added in WebKitGTK 2.26. Bubblewrap is an amazing sandboxing tool, already used effectively by flatpak and rpm-ostree, and I’m very pleased with Patrick’s decision to use it for WebKit as well. Because enabling the sandbox can break applications, it is currently opt-in for GTK 3 apps (but will become mandatory for GTK 4 apps). If your application uses WebKitGTK, you really need to take some time to enable this sandbox using webkit_web_context_set_sandbox_enabled(). The sandbox has introduced a couple regressions that we didn’t notice until too late; notably, printing no longer works, which, half a year later, we still haven’t managed to fix. (I’ll try to get to it soon.)
OK, this concludes your 3.36 and 3.34 updates. Onward to 3.38!
Once again this year I attended the GStreamer conference and just before
that, Embedded Linux conference Europe which took place in Lyon (France).
Both events were a good opportunity to demo one of the use-cases I have in mind
for GstWPE, HTML overlays!
Epiphany Technology Preview has moved from https://sdk.gnome.org to https://nightly.gnome.org. The old Epiphany Technology Preview is now end-of-life. Action is required to update. If you installed Epiphany Technology Preview prior to a couple minutes ago, uninstall it using GNOME Software and then reinstall using this new flatpakref.
Apologies for this disruption.
The main benefit to end users is that you’ll no longer need separate remotes for nightly runtimes and nightly applications, because everything is now hosted in one repo. See Abderrahim’s announcement for full details on why this transition is occurring.
This has resulted in a public relations drama that is largely a distraction to the issue at hand. Whatever big-company PR departments have to say on the matter, I have no doubt that the developers working on WebKit recognize the severity of this incident and are grateful to Project Zero, which reported these vulnerabilities and has previously provided numerous other high-quality private vulnerability reports. (Many other organizations deserve credit for similar reports, especially Trend Micro’s Zero Day Initiative.)
WebKit as a project will need to reassess certain software development practices that may have facilitated the abuse of these vulnerabilities. The practice of committing security fixes to open source long in advance of corresponding Safari releases may need to be reconsidered.
Sadly, Uighurs should assume their personal computing devices have been compromised by state-sponsored attackers, and that their private communications are not private. Even if not compromised in this particular incident, similar successful attacks are overwhelmingly likely in the future.
I’m excited to announce that Epiphany Tech Preview has reached version 3.33.3-33, as computed by git describe. That is 33 commits after 3.33.3:
I’m afraid 3.33.4 will arrive long before we make it to 3.33.3-333, so this is probably the last cool version number Epiphany will ever have.
I might be guilty of using an empty commit to claim the -33 commit.
I might also apologize for wasting your time with a useless blog post, except this was rather fun. I await the controversy of your choice in the comments.
Last week I finally found some time to add the automation mode to Epiphany, which allows running automated tests using WebDriver. It’s important to note that the automation mode is not expected to be used by users or applications to control the browser remotely, but only by WebDriver automated tests. For that reason, the automation mode is incompatible with a primary user profile. There are a few other things affected by the automation mode:
There’s no persistence. A private profile is created in tmp and only ephemeral web contexts are used.
URL entry is not editable, since users are not expected to interact with the browser.
An info bar is shown to notify the user that the browser is being controlled by automation.
The window decoration is orange to make it even clearer that the browser is running in automation mode.
So, how can I write tests to be run in Epiphany? First, you need to install a recent enough version of Selenium. For now, only the Python API is supported. Selenium doesn’t have an Epiphany driver, but the WebKitGTK driver can be used with any WebKitGTK+ based browser, by providing the browser information as part of session capabilities.
This is a very simple example that just opens Epiphany in automation mode, loads http://www.webkitgtk.org and closes Epiphany.
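A sketch of what such a test can look like, using the Selenium WebKitGTK driver and its WebKitGTKOptions (the executable path is an assumption; adjust it to your install):

from selenium import webdriver
from selenium.webdriver import WebKitGTKOptions

# Tell the WebKitGTK driver how to launch Epiphany in automation mode.
options = WebKitGTKOptions()
options.browser_executable_path = "/usr/bin/epiphany"
options.add_browser_argument("--automation-mode")

epiphany = webdriver.WebKitGTK(browser_options=options)
epiphany.get("http://www.webkitgtk.org")
epiphany.quit()

A few comments about the example: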
Version 3.31.4 will be the first one including the automation mode.
At the beginning of October I had the wonderful chance of attending the Web
Engines Hackfest in A Coruña, hosted by
Igalia. This year we were over 50 participants, which
was great to associate even more faces to IRC nick names, but more importantly
allows hackers working at all the levels of the Web stack to share a common
space for a few days, making it possible to discuss complex topics and figure
out the future of the projects which allow humanity to see pictures of cute
kittens — among many other things.
Enabling support for the CSS generic system font family.
Fun trivia: most of the WebKit contributors work from the United States, so the week of the Web Engines Hackfest is probably the only moment of the whole year when there is a sizeable peak of activity during European daytime.
Watching repository activity during the hackfest.
Towards WPE Releases
At Igalia we are making an important investment in the WPE WebKit
port, which is specially targeted towards
embedded devices. An important milestone for the project was reached last May
when the code was moved to the main WebKit
repository, and has been
receiving the usual stream of improvements and bug fixes. We are now
approaching the moment where we feel that it is ready to start making releases,
which is another major milestone.
Our plan for WPE is to synchronize with
WebKitGTK+, and produce releases for both in
parallel. This is important because both ports share a good amount of their
code and base dependencies (GStreamer, GLib, libsoup) and our efforts to
stabilize the GTK+ port before each release will benefit the WPE one as well,
and vice versa. In the coming weeks we will be publishing the first official
tarball starting off the WebKitGTK+ 2.18.x stable
branch.
Wild WEBKIT PORT appeared!
Syncing the releases for both ports means that:
Both stable and unstable releases are done in sync with the GNOME
release schedule. Unstable
releases start at version X.Y.1, with Y being an odd number.
About one month before the release dates, we create a new release branch
and from there on we work on stabilizing the code. At least one testing
release with version X.Y.90 will be made. This is also what GNOME
does, and we will mimic this to avoid confusion for downstream packagers.
The stable release will have version X.Y+1.0. Further maintenance
releases happen from the release branch as needed. At the same time,
a new cycle of unstable releases is started based on the code from the
tip of the repository.
Believe it or not, preparing a codebase for its first releases involves quite
a lot of work, and this is what took most of my coding time during the Web
Engines Hackfest and also the following weeks: from small
fixes for build
failures all the way to making
sure that public API headers (only the correct
ones!) are
installed and
usable, that applications can
be properly linked, and that
release tarballs can actually be
created. Exhausting? Well, do
not forget that we need to set up a web server to host the tarballs, a small
website, and the documentation. The latter has to be generated (there is still
pending work in this regard), and the whole process of making a release
scripted.
Still with me? Great. Now for a plot twist: we won’t be making
proper releases just yet.
APIs, ABIs, and Releases
There is one topic which I did not touch yet: API/ABI stability. Having done
a release implies that the public API and ABI which are part of it are stable,
and they are not subject to change.
Right after upstreaming WPE we switched over from the cross-port WebKit2 C API
and added a new, GLib-based API to WPE. It is remarkably similar (if not the
same in many cases) to the API exposed by WebKitGTK+, and this makes us
confident that the new API is higher-level, more ergonomic, and better overall.
At the same time, we would like third party developers to give it a try (which
is easier having releases) while retaining the possibility of getting feedback
and improving the WPE GLib API before setting it in stone (which is not
possible after a release).
It is for this reason that at least during the first WPE release cycle we
will make preview releases, meaning that there might be API and ABI
changes from one release to the next. As usual we will not be making
breaking changes in between releases of the same stable series, i.e. code
written for 2.18.0 will continue to build unchanged with any subsequent
2.18.X release.
At any rate, we do not expect the API to receive big changes because —as
explained above— it mimics the one for WebKitGTK+, which has already proven
itself both powerful enough for complex applications and convenient to use for
the simpler ones. Due to this, I encourage developers to try out WPE as soon
as we have the first preview release fresh out of the oven.
Packaging for Buildroot
At Igalia we routinely work with embedded devices, and often we make use of
Buildroot for cross-compilation. Having actual
releases of WPE will allow us to contribute a set of build definitions for the
WPE WebKit port and its dependencies — something that I have already started
working on.
Lately I have been taking care of keeping the WebKitGTK+ packaging for
Buildroot up-to-date and it has been delightful to work with such a welcoming
community. I am looking forward to having WPE supported there, and to keep
maintaining the build definitions for both. This will allow making use of WPE
with relative ease, while ensuring that Buildroot users will pick our updates
promptly.
Generic System Font
Some applications like GNOME Web (Epiphany) use a WebKitWebView to display
widget-like controls which try to follow the design of the rest of the desktop.
Unfortunately for GNOME applications this means
Cantarell gets hardcoded
in the style sheet —it is the default font after all— and this results in
mismatched fonts when the user has chosen a different font for the interface
(e.g. in Tweaks). You can see
this in the following screen capture of Epiphany:
Web using hardcoded Cantarell and (on hover) `-webkit-system-font`.
Here I have configured the beautiful Inter UI font as
the default for the desktop user interface. Now, if you roll your mouse over
the image, you will see how much better it looks to use a consistent font.
This change also affects the list of plugins and applications, error messages,
and in general all the about: pages.
If you are running GNOME 3.26, this is already
fixed using font: menu
(part of the CSS
spec
since ye olde CSS 2.1) — but we can do better: Safari has had support since
2015 for a generic “system” font family, similar to sans-serif or cursive:
/* Using the new generic font family (nice!). */
body {
  font-family: -webkit-system-font;
}

/* Using CSS 2.1 font shorthands (not so nice). */
body {
  font: menu;       /* Pick ALL font attributes... */
  font-size: 12pt;  /* ...then reset some of them. */
  font-weight: 400;
}
Web Inspector using Cantarell, the default GNOME 3 font.
I am convinced that users do notice and appreciate attention to detail,
even if they do unconsciously, and therefore it is worthwhile to work on this
kind of improvements.
Plus, as a design enthusiast with a slight case of typographic
OCD, I cannot stop myself
from noticing inconsistent usage of fonts and my mind is now at ease knowing
that opening the Web Inspector won’t be such a jarring experience anymore.
Outro
But there’s one more thing: On occasion we developers have to debug situations
in which a process is seemingly stuck. One useful technique involves running
the offending process under the control of a debugger (or, in an embedded
device, under gdbserver and controlled remotely), interrupting its execution
at intervals, and printing stack traces to try and figure out what is going on.
Unfortunately, in some circumstances running a debugger can be difficult or
impractical. Wouldn’t it be grand if it was possible to interrupt the process
without needing a debugger and request a stack trace? Enter “Out-Of-Band
Stack Traces” (proof of
concept):
The process installs a signal handler using
sigaction(2), with the
SA_SIGINFO flag set.
On reception of the signal, the kernel interrupts the process (even if it’s
in an infinite loop), and invokes the signal handler passing an extra
pointer to an ucontext_t value, which contains a snapshot of the execution
status of the thread which was in the CPU before the signal handler was
invoked. This is true for many platforms, including Linux and most BSDs.
The signal handler code can obtain the instruction and stack pointers
from the ucontext_t value, and walk the stack to produce a stack trace
of the code that was being executed. Jackpot! This is of course
architecture dependent but not difficult to get right (and well tested)
for the most common ones like x86 and ARM.
The nice thing about this approach is that the code that obtains the stack
trace is built into the program (no rebuilds needed), and it does not even
require relaunching the process in a debugger — which can be crucial for
analyzing situations which are hard to reproduce, or which do not happen
when running inside a debugger. I am looking forward to having some time to
integrate this properly into WebKitGTK+ and especially WPE, because it will
be most useful in embedded devices.
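By the way, while the proof of concept above works at the C level with sigaction() and ucontext_t, the general signal-triggered idea is easy to play with: Python ships an analogous stack dumper in its standard library. A tiny sketch (the choice of SIGUSR2 is arbitrary):

import faulthandler
import os
import signal
import time

# Dump the Python tracebacks of all threads whenever SIGUSR2 is
# delivered, without attaching a debugger (Unix only).
faulthandler.register(signal.SIGUSR2, all_threads=True)

print("Run 'kill -USR2 %d' from another terminal to get a stack trace" % os.getpid())
while True:  # pretend to be a process that is seemingly stuck
    time.sleep(1)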
WebDriver is an automation API to control a web browser. It allows creating automated tests for web applications independently of the browser and platform. WebKitGTK+ 2.18, which will be released next week, includes an initial implementation of the WebDriver specification.
WebDriver in WebKitGTK+
There’s a new process (WebKitWebDriver) that works as the server, processing the client requests to spawn and control the web browser. The WebKitGTK+ driver is not tied to any specific browser; it can be used with any WebKitGTK+ based browser, but it uses MiniBrowser as the default. The driver uses the same remote controlling protocol used by the remote inspector to communicate with and control the web browser instance. The implementation is not complete yet, but it’s enough for what many users need.
The clients
The web application tests are the clients of the WebDriver server. The Selenium project provides APIs for different languages (Java, Python, Ruby, etc.) to write the tests. Python is the only language supported by WebKitGTK+ for now. It’s not yet upstream, but we hope it will be integrated soon. In the meantime you can use our fork on GitHub. Let’s see an example to understand how it works and what we can do.
from selenium import webdriver
# Create a WebKitGTK driver instance. It spawns WebKitWebDriver
# process automatically that will launch MiniBrowser.
wkgtk = webdriver.WebKitGTK()
# Let's load the WebKitGTK+ website.
wkgtk.get("https://www.webkitgtk.org")
# Find the GNOME link.
gnome = wkgtk.find_element_by_partial_link_text("GNOME")
# Click on the link.
gnome.click()
# Find the search form.
search = wkgtk.find_element_by_id("searchform")
# Find the first input element in the search form.
text_field = search.find_element_by_tag_name("input")
# Type epiphany in the search field and submit.
text_field.send_keys("epiphany")
text_field.submit()
# Let's count the links in the contents div to check we got results.
contents = wkgtk.find_element_by_class_name("content")
links = contents.find_elements_by_tag_name("a")
assert len(links) > 0
# Quit the driver. The session is closed so MiniBrowser
# will be closed and then WebKitWebDriver process finishes.
wkgtk.quit()
Note that this is just an example to show how to write a test and what kind of things you can do; there are better ways to achieve the same results. It also depends on the current source of public websites, so it might not work in the future.
Web browsers / applications
As I said before, the WebKitWebDriver process supports any WebKitGTK+ based browser, but that doesn’t mean all browsers can automatically be controlled by automation (that would be scary). WebKitGTK+ 2.18 also provides new API for applications to support automation.
First of all, the application has to explicitly enable automation using webkit_web_context_set_automation_allowed(). It’s important to know that the WebKitGTK+ API doesn’t allow enabling automation in several WebKitWebContexts at the same time. The driver will spawn the application when a new session is requested, so the application should enable automation at startup. It’s recommended that applications add a new command line option to enable automation, and only enable it when provided.
After launching the application the driver will request the browser to create a new automation session. The signal “automation-started” will be emitted in the context to notify the application that a new session has been created. If automation is not allowed in the context, the session won’t be created and the signal won’t be emitted either.
The WebKitAutomationSession will emit the signal “create-web-view” every time the driver needs to create a new web view. The application can then create a new window or tab containing the new web view that should be returned by the signal. This signal will always be emitted even if the browser already has an initial web view open; in that case it’s recommended to return the existing empty web view.
Web views are also automation aware: similar to ephemeral web views, web views that allow automation should be created with the constructor property “is-controlled-by-automation” enabled, as in the sketch below.
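Putting those pieces together, a browser’s startup code could look roughly like this. This is only a sketch using the Python GObject bindings; a real browser would do this from C, add the visual feedback discussed below, and reuse an existing empty web view when it has one:

import gi
gi.require_version("Gtk", "3.0")
gi.require_version("WebKit2", "4.0")
from gi.repository import Gtk, WebKit2

context = WebKit2.WebContext.get_default()
context.set_automation_allowed(True)  # only do this behind a command line switch!

def on_create_web_view(session):
    # Create the web view the driver will control, in a new window.
    web_view = WebKit2.WebView(is_controlled_by_automation=True)
    window = Gtk.Window(title="Controlled by automation")
    window.add(web_view)
    window.show_all()
    return web_view

def on_automation_started(context, session):
    session.connect("create-web-view", on_create_web_view)

context.connect("automation-started", on_automation_started)
Gtk.main()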
This is the new API that applications need to implement to support WebDriver, it’s designed to be as safe as possible, but there are many things that can’t be controlled by WebKitGTK+, so we have several recommendations for applications that want to support automation:
Add a way to enable automation in your application at startup, like a command line option, that is disabled by default. Never allow automation in a normal application instance.
Enabling automation is not the only thing the application should do, so add an automation mode to your application.
Add visual feedback when in automation mode, like changing the theme, the window title or whatever that makes clear that a window or instance of the application is controllable by automation.
Add a message to explain that the window is being controlled by automation and the user is not expected to use it.
Use ephemeral web views in automation mode.
Use a temporary user profile in automation mode, and do not allow automation to change the history, bookmarks, etc. of an existing user.
Do not load any homepage in automation mode, just keep an empty web view (about:blank) that can be used when a new web view is requested by automation.
The WebKitGTK client driver
Applications need to implement the new automation API to support WebDriver, but the WebKitWebDriver process doesn’t know how to launch the browsers. That information should be provided by the client using a WebKitGTKOptions object: the driver constructor can receive an instance of WebKitGTKOptions with the browser information and other options. Let’s see how it works with an example that launches Epiphany:
from selenium import webdriver
from selenium.webdriver import WebKitGTKOptions
options = WebKitGTKOptions()
options.browser_executable_path = "/usr/bin/epiphany"
options.add_browser_argument("--automation-mode")
epiphany = webdriver.WebKitGTK(browser_options=options)
Again, this is just an example; Epiphany doesn’t even support WebDriver yet. Browsers or applications could create their own drivers on top of the WebKitGTK one to make it more convenient to use.
from selenium import webdriver
epiphany = webdriver.Epiphany()
Plans
During the next release cycle, we plan to do the following tasks:
Complete the implementation: add support for all commands in the spec and complete the ones that are partially supported now.
Add support for running the WPT WebDriver tests in the WebKit bots.
Add a WebKitGTK driver implementation for other languages in Selenium.
My wife asked me for some rough LOC numbers on the WebKit project and I thought I could share them with you here as well. They come from r221232. As I’ll take into account some generated code, it is relevant to mention that I built WebKitGTK+ with the default CMake options.
The first thing I did was run sloccount Source, which gave the following numbers:
Let’s have a look now at the LayoutTests (they test the functionality of WebCore + the platform). Tests are composed mainly of HTML files, so if you run sloccount LayoutTests you get:
It’s quite interesting to see that sloccount does not consider HTML, which is quite relevant when you’re testing a web engine, so again we have to count those files manually (thanks to Carlos López, who helped me grep properly here, as some binary lines were giving me a headache):
You can see 2205690 “meaningful lines” that combine HTML with the other languages listed above. I can’t subtract here to get just the HTML lines, because the number above takes into account files with a different extension than HTML, though many of them do include other languages, especially JavaScript.
But the LayoutTests do not include only pure WebKit tests. There are some imported ones, so it might be interesting to run the same procedure under LayoutTests/imported to see which ones are imported and not written directly into the WebKit project. I emphasize that because they can be written by WebKit developers in other repositories; I can present myself and Youenn Fablet as an example, as we wrote some tests that were finally moved into the specification and included back later when imported. So again, sloccount LayoutTests/imported:
There are also some other tests that we can talk about, for example the JSTests. I’ll just mention the numbers summed up by language plus the manually counted HTML (if you made it here, you know the drill already):
And this is all. Remember that these are just some rough statistics, not a “scientific” paper.
Update:
In her expert opinion, in the WebKit project we are devoting around 50% of the total LOC to testing, which makes it a software engineering “textbook” project regarding testing and I think we can be proud of it!
WebKitGTK+ has supported remote debugging for a long time. The current implementation uses WebSockets for the communication between the local browser (the debugger) and the remote browser (the debug target or debuggable). This implementation was very simple and, in theory, you could use any web browser as the debugger because all inspector code was served by the WebSockets. I said in theory because in practice this was not always so easy, since the inspector code uses newer JavaScript features that are not implemented in other browsers yet. The other major issue of this approach was that the communication between debugger and target was not bi-directional, so the target browser couldn’t notify the debugger about changes (like a new tab being opened, a navigation, or the target being about to close).
Apple abandoned the WebSockets approach a long time ago and implemented its own remote inspector, using XPC for the communication between debugger and target. They also moved the remote inspector handling to JavaScriptCore making it available to debug JavaScript applications without a WebView too. In addition, the remote inspector is also used by Apple to implement WebDriver. We think that this approach has a lot more advantages than disadvantages compared to the WebSockets solution, so we have been working on making it possible to use this new remote inspector in the GTK+ port too. After some refactorings to the code to separate the cross-platform implementation from the Apple one, we could add our implementation on top of that. This implementation is already available in WebKitGTK+ 2.17.1, the first unstable release of this cycle.
From the user’s point of view there aren’t many differences. With the WebSockets approach we launched the target browser this way:
$ WEBKIT_INSPECTOR_SERVER=127.0.0.1:1234 browser
This hasn’t changed with the new remote inspector. To start debugging we opened any browser and loaded
http://127.0.0.1:1234
With the new remote inspector we have to use any WebKitGTK+ based browser and load
inspector://127.0.0.1:1234
As you have already noticed, it’s no longer possible to use any web browser, you need to use a recent enough WebKitGTK+ based browser as the debugger. This is because of the way the new remote inspector works. It requires a frontend implementation that knows how to communicate with the targets. In the case of Apple that frontend implementation is Safari itself, which has a menu with the list of remote debuggable targets. In WebKitGTK+ we didn’t want to force using a particular web browser as debugger, so the frontend is implemented as a builtin custom protocol of WebKitGTK+. So, loading inspector:// URLs in any WebKitGTK+ WebView will show the remote inspector page with the list of debuggable targets.
It looks quite similar to what we had, just a list of debuggable targets, but there are a few differences:
A new debugger window is opened when the inspect button is clicked instead of reusing the same web view. Clicking on inspect again just brings the window to the front.
The debugger window loads faster, because the inspector code is not served by HTTP, but locally loaded like the normal local inspector.
The target list page is updated automatically, without having to manually reload it when a target is added, removed or modified.
The debugger window is automatically closed when the target web view is closed or crashed.
How does the new remote inspector work?
The web browser checks for the presence of the WEBKIT_INSPECTOR_SERVER environment variable at startup, the same way it was done with the WebSockets. If present, the RemoteInspectorServer is started in the UI process, running a DBus service listening on the IP and port provided. The environment variable is propagated to the child web processes, which create a RemoteInspector object and connect to the RemoteInspectorServer. There’s one RemoteInspector per web process, and one debuggable target per WebView. Every RemoteInspector maintains a list of debuggable targets that is sent to the RemoteInspectorServer when a new target is added, removed or modified, or when explicitly requested by the RemoteInspectorServer.
When the debugger browser loads an inspector:// URL, a RemoteInspectorClient is created. The RemoteInspectorClient connects to the RemoteInspectorServer using the IP and port of the inspector:// URL and asks for the list of targets that is used by the custom protocol handler to create the web page. The RemoteInspectorServer works as a router, forwarding messages between RemoteInspector and RemoteInspectorClient objects.
The Igalia WebKit team is happy to announce WebKitGTK+ 2.16. This new release drastically improves the memory consumption, adds new API as required by applications, includes new debugging tools, and of course fixes a lot of bugs.
Memory consumption
After WebKitGTK+ 2.14 was released, several Epiphany users started to complain about high memory usage of WebKitGTK+ when Epiphany had a lot of tabs open. As we already explained in a previous post, this was because of the switch to the threaded compositor, which made hardware acceleration always enabled. To fix this, we decided to make hardware acceleration optional again, enabled only when websites require it, but still using the threaded compositor. This is by far the biggest improvement in memory consumption, but not the only one. Even when in accelerated compositing mode, we managed to reduce the memory required by GL contexts when using GLX, by using OpenGL version 3.2 (core profile) if available. In Mesa-based drivers that means that the software rasterizer fallback is never required, so the context doesn’t need to create the software rasterization part. And finally, an important bug was fixed in the JavaScript garbage collector timers that prevented garbage collection from happening in some cases.
CSS Grid Layout
Yes, the future is here, and now available by default in all WebKitGTK+ based browsers and web applications. This is the result of several years of great work by the Igalia web platform team in collaboration with Bloomberg. If you are interested, you have all the details in Manuel’s blog.
New API
The WebKitGTK+ API is quite complete now, but there’s always new things required by our users.
Hardware acceleration policy
Hardware acceleration is now enabled on demand again: when a website requires accelerated compositing, hardware acceleration is enabled automatically. WebKitGTK+ has environment variables to change this behavior, WEBKIT_DISABLE_COMPOSITING_MODE to never enable hardware acceleration and WEBKIT_FORCE_COMPOSITING_MODE to always enable it. However, those variables were never meant to be used by applications, but only by developers to test the different code paths. The main problem of those variables is that they apply to all web views of the application. Not all WebKitGTK+ applications are web browsers, so it can happen that an application knows it will never need hardware acceleration for a particular web view, like for example the Evolution composer, while other applications, especially in the embedded world, always want hardware acceleration enabled and don’t want to waste time and resources with the switch between modes. For those cases a new WebKitSetting, hardware-acceleration-policy, has been added. We encourage everybody to use this setting instead of the environment variables when upgrading to WebKitGTK+ 2.16.
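A hedged sketch of the setting from the Python GObject bindings (an embedded application that always wants compositing would use ALWAYS instead):

import gi
gi.require_version("Gtk", "3.0")
gi.require_version("WebKit2", "4.0")
from gi.repository import Gtk, WebKit2

settings = WebKit2.Settings()
# e.g. a mail composer that will never need accelerated compositing:
settings.set_hardware_acceleration_policy(WebKit2.HardwareAccelerationPolicy.NEVER)

web_view = WebKit2.WebView()
web_view.set_settings(settings)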
Network proxy settings
Since the switch to WebKit2, where the SoupSession is no longer available from the API, it hasn’t been possible to change the network proxy settings from the API. WebKitGTK+ has always used the default proxy resolver when creating the soup context, and that just works for most of our users. But there are some corner cases in which applications that don’t run under a GNOME environment want to provide their own proxy settings instead of using the proxy environment variables. For those cases WebKitGTK+ 2.16 includes a new UI process API to configure all proxy settings available in GProxyResolver API.
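A sketch of the new proxy API from Python (the proxy URI and ignored host are made-up examples):

import gi
gi.require_version("WebKit2", "4.0")
from gi.repository import WebKit2

context = WebKit2.WebContext.get_default()
# Route all traffic through a custom proxy, except for localhost.
proxy = WebKit2.NetworkProxySettings.new("http://proxy.example.com:8080", ["localhost"])
context.set_network_proxy_settings(WebKit2.NetworkProxyMode.CUSTOM, proxy)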
Private browsing
WebKitGTK+ has always had a WebKitSetting to enable or disable the private browsing mode, but it has never worked really well. For that reason, applications like Epiphany have always implemented their own private browsing mode just by using a different profile directory in tmp to write all persistent data. This approach has several issues; for example, if the UI process crashes, the profile directory is leaked in tmp with all the personal data there. WebKitGTK+ 2.16 adds a new API that allows creating ephemeral web views which never write any persistent data to disk. It’s possible to create ephemeral web views individually, or create ephemeral web contexts where all web views associated with them will be ephemeral automatically.
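In Python-bindings terms, both flavors look roughly like this (a sketch):

import gi
gi.require_version("Gtk", "3.0")
gi.require_version("WebKit2", "4.0")
from gi.repository import Gtk, WebKit2

# An individual ephemeral web view on the default context:
private_view = WebKit2.WebView(is_ephemeral=True)

# Or an ephemeral context, where every associated web view is ephemeral:
private_context = WebKit2.WebContext.new_ephemeral()
another_view = WebKit2.WebView.new_with_context(private_context)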
Website data
WebKitWebsiteDataManager was added in 2.10 to configure the default paths on which website data should be stored for a web context. In WebKitGTK+ 2.16 the API has been expanded to include methods to retrieve and remove the website data stored on the client side. Not only persistent data like HTTP disk cache, cookies or databases, but also non-persistent data like the memory cache and session cookies. This API is already used by Epiphany to implement the new personal data dialog.
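For example, clearing cookies and the HTTP disk cache from the client side can be sketched like this (Python bindings; the completion callback is optional and omitted here):

import gi
gi.require_version("WebKit2", "4.0")
from gi.repository import WebKit2

context = WebKit2.WebContext.get_default()
manager = context.get_website_data_manager()

# Remove all cookies and the HTTP disk cache, regardless of age (timespan=0).
types = WebKit2.WebsiteDataTypes.COOKIES | WebKit2.WebsiteDataTypes.DISK_CACHE
manager.clear(types, 0, None, None)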
Dynamically added forms
Web browsers normally implement the remember passwords functionality by searching the DOM tree for authentication form fields when the document loaded signal is emitted. However, some websites add the authentication form fields dynamically after the document has been loaded. In those cases web browsers couldn’t find any form fields to autocomplete. In WebKitGTK+ 2.16 the web extensions API includes a new signal to notify when new forms are added to the DOM. Applications can connect to it, instead of document-loaded, to start searching for authentication form fields.
Custom print settings
The GTK+ print dialog allows the user to add a new tab embedding a custom widget, so that applications can include their own print settings UI. Evolution used to do this, but the functionality was lost with the switch to WebKit2. In WebKitGTK+ 2.16 a similar API to the GTK+ one has been added to recover that functionality in evolution.
Two new debugging tools are now available in WebKitGTK+ 2.16: the memory sampler and the resource usage overlay.
Memory sampler
This tool allows monitoring the memory consumption of the WebKit processes. It can be enabled by defining the environment variable WEBKIT_SAMPLE_MEMORY. When enabled, the UI process and all web processes will automatically take samples of memory usage every second. For every sample a detailed report of the memory used by the process is generated and written to a file in the temp directory.
$ WEBKIT_SAMPLE_MEMORY=1 MiniBrowser
Started memory sampler for process MiniBrowser 32499; Sampler log file stored at: /tmp/MiniBrowser7ff2246e-406e-4798-bc83-6e525987aace
Started memory sampler for process WebKitWebProces 32512; Sampler log file stored at: /tmp/WebKitWebProces93a10a0f-84bb-4e3c-b257-44528eb8f036
The files contain a list of sample reports like this one:
Timestamp 1490004807
Total Program Bytes 1960214528
Resident Set Bytes 84127744
Resident Shared Bytes 68661248
Text Bytes 4096
Library Bytes 0
Data + Stack Bytes 87068672
Dirty Bytes 0
Fast Malloc In Use 86466560
Fast Malloc Committed Memory 86466560
JavaScript Heap In Use 0
JavaScript Heap Committed Memory 49152
JavaScript Stack Bytes 2472
JavaScript JIT Bytes 8192
Total Memory In Use 86477224
Total Committed Memory 86526376
System Total Bytes 16729788416
Available Bytes 5788946432
Shared Bytes 1037447168
Buffer Bytes 844214272
Total Swap Bytes 1996484608
Available Swap Bytes 1991532544
Resource usage overlay
The resource usage overlay is only available on Linux systems when WebKitGTK+ is built with ENABLE_DEVELOPER_MODE. It allows showing an overlay with information about resources currently in use by the web process, like CPU usage, total memory consumption, JavaScript memory and JavaScript garbage collector timers information. The overlay can be shown/hidden by pressing Ctrl+Shift+G.
We plan to add more information to the overlay in the future like memory cache status.
The WebKitGTK+ 2.14 release was very exciting for us: it finally introduced the threaded compositor to drastically improve the accelerated compositing performance. However, the threaded compositor required accelerated compositing to be always enabled, even for non-accelerated contents. Unfortunately, this caused different kinds of problems for several people, and proved that we are not ready to render everything with OpenGL yet. The most relevant problems reported were:
Memory usage increase: OpenGL contexts use a lot of memory, and we have the compositor in the web process, so we have at least one OpenGL context in every web process. The threaded compositor uses the coordinated graphics model, which also requires more memory than the simple mode we previously used. People who use a lot of tabs in Epiphany quickly noticed that the amount of memory required was a lot more.
Startup and resize slowness: The threaded compositor makes everything smooth and performs quite well, except at startup or when the view is resized. At startup we need to create the OpenGL context, which is also quite slow by itself, but also need to create the compositing thread, so things are expected to be slower. Resizing the viewport is the only threaded compositor task that needs to be done synchronously, to ensure that everything is in sync, the web view in the UI process, the OpenGL viewport and the backing store surface. This means we need to wait until the threaded compositor has updated to the new size.
Rendering issues: some people reported rendering artifacts or even nothing rendered at all. In most of the cases they were not issues in WebKit itself, but in the graphics driver or library. It’s quite difficult for a general purpose web engine to support and deal with all possible GPUs, drivers and libraries. Chromium has a huge list of hardware exceptions to disable some OpenGL extensions or even hardware acceleration entirely.
Because of these issues people started to use different workarounds. Some people, and even applications like evolution, started to use WEBKIT_DISABLE_COMPOSITING_MODE environment variable, that was never meant for users, but for developers. Other people just started to build their own WebKitGTK+ with the threaded compositor disabled. We didn’t remove the build option because we anticipated some people using old hardware might have problems. However, it’s a code path that is not tested at all and will be removed for sure for 2.18.
All these issues are not really specific to the threaded compositor, but to the fact that it forced the accelerated compositing mode to be always enabled, using OpenGL unconditionally. It looked like a good idea, entering/leaving accelerated compositing mode was a source of bugs in the past, and all other WebKit ports have accelerated compositing mode forced too. Other ports use UI side compositing though, or target a very specific hardware, so the memory problems and the driver issues are not a problem for them. The imposition to force the accelerated compositing mode came from the switch to using coordinated graphics, because as I said other ports using coordinated graphics have accelerated compositing mode always enabled, so they didn’t care about the case of it being disabled.
There are a lot of long-term things we can do to improve all these issues, like moving the compositor to the UI (or a dedicated GPU) process to have a single GL context, implementing tab suspension, etc., but we really wanted to fix or at least improve the situation for 2.14 users. Switching back to using accelerated compositing mode on demand is something that we could do in the stable branch and it would improve things, at least to a level comparable to what we had before 2.14, but with the threaded compositor. Making it happen was a matter of fixing a lot of bugs, and the result is this 2.14.4 release. Of course, this will be the default in 2.16 too, where we have also added API to set a hardware acceleration policy.
We recommend all 2.14 users to upgrade to 2.14.4 and stop using the WEBKIT_DISABLE_COMPOSITING_MODE environment variable or building with the threaded compositor disabled. The new API in 2.16 will allow setting a policy for every web view, so if you still need to disable or force hardware acceleration, please use the API instead of WEBKIT_DISABLE_COMPOSITING_MODE and WEBKIT_FORCE_COMPOSITING_MODE.
We really hope this new release and the upcoming 2.16 will work much better for everybody.
Igalia is hiring. We're currently interested in Multimedia and Chromium developers. Check the announcements for details on the positions and our company.
From September 26th to 28th we celebrated the 2016 edition of the Web Engines Hackfest at the Igalia HQ. This year we broke all records and got participants from the three main companies behind the three biggest open source web engines, namely Mozilla, Google and Apple. Of course, it was not only them: we had some other companies, and ourselves, too. I was an active part of the organization, and not only did we not get a single complaint, but I think people felt comfortable and happy throughout.
We had several talks (I included the slides and YouTube links):
We had lots and lots of interesting hacking and we also had several breakout sessions:
WebKitGTK+ / Epiphany
Servo
WPE / WebKit for Wayland
Layout Models (Grid, Flexbox)
WebRTC
JavaScript Engines
MathML
Graphics in WebKit
What I did during the hackfest was work with Enrique and Žan to advance the review of our downstream implementation of GStreamer-based Media Source Extensions (MSE), in order to land it as soon as possible, and I can proudly say that we already did (we didn't finish at the hackfest but managed to do it afterwards). We broke the bots and pissed off Michael and Carlos, but we managed to deactivate it by default and continue working on it upstream.
So, summing up: from my point of view, and not only because I was part of the organization at Igalia but also based on other people's opinions, I think the hackfest was a success, and I think we will continue as we are, or maybe grow a bit (no spoilers!).
Finally, I would like to thank our gold sponsors, Collabora and Igalia, and our silver sponsor, Mozilla.
The goal was to have the WebKit WebRTC tests working for a demo. My fellow Igalian Alex was working on the platform itself in WebKit and assisting with some WebKit tuning for the Pi, but the main work needed to be done in OpenWebRTC.
My other fellow Igalian Phil had begun a branch to work on this that was halfway there, with some workarounds. My first task was to get into combat/workaround mode and make OpenWebRTC work with compressed streams from gst-rpicamsrc: OpenWebRTC supported only raw video streams, and the Raspberry Pi Cam module GStreamer element provides only H264-encoded ones. I moved some encoders and parsers around, made some caps modifications, removed some elements that didn't work on the Pi, and eventually made it work (a simplified sketch of the plumbing appears below). You can see the result at:
[Video: /xrcalvar/files/2016/10/201607-webrtc.mp4, poster: /xrcalvar/files/2016/10/webrtc-poster.png]
To make this work yourselves, you needed a custom branch of Buildroot where you could build with the proper plugins enabled, and you also had to select the appropriate branches of WPE and OpenWebRTC.
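To give an idea of the plumbing involved, here is a hedged, standalone sketch (not OpenWebRTC code) of consuming the compressed output of gst-rpicamsrc by requesting H264 caps and parsing the stream; the caps values and the downstream sink are illustrative:

```c
#include <gst/gst.h>

int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    /* rpicamsrc outputs H264 directly; request it via a caps filter and
     * parse it, so the compressed stream can be handed downstream instead
     * of decoding it to raw video first. Requires the rpicamsrc plugin. */
    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(
        "rpicamsrc ! video/x-h264,width=1280,height=720,framerate=30/1 "
        "! h264parse ! fakesink", &error);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        g_clear_error(&error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);
    g_main_loop_run(loop);
    return 0;
}
```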
Unfortunately, the work was far from finished, so I continued the effort to bring the architectural changes in OpenWebRTC to production quality, which meant doing several tasks step by step:
Rework the video orientation code: the workaround had deactivated it, as so far it was being done in GStreamer. In the case of rpicamsrc, rotation can be done by the hardware itself, so I cooked up a GStreamer interface to enable rotation the same way it was done for the [gl]videoflip elements (see the sketch after this list). The idea is to deprecate the original properties and use the new interface. This landed in both videoflip and glvideoflip. Of course, I also implemented it in gst-rpicamsrc, here and here, and eventually in the OpenWebRTC sources.
Rework video flip: once the OpenWebRTC sources got orientation support, I could rework the flip for both local and remote feeds.
Add gl{down|up}load elements back: there were some issues with the GL elements used to upload and download textures, which we had removed, so I added them back.
Rework bins linking: in OpenWebRTC there are bins that are created to perform certain tasks, and depending on the circumstances some elements are added to them or not. I reworked the way those elements are linked so that we don't have to take all the use cases into account when linking them. Now this is easier, as the elements are linked as they are added to the bin.
Rework renderer_disabled: as with orientation, some elements such as gst-rpicamsrc are able to change color and balance, so I added support for that, to avoid having it done by extra GStreamer elements when not necessary. In this case the proper interfaces were already there in GStreamer.
Move the decoding/parsing from the source to the renderer: before our changes, the source was parsing/decoding the remote feeds, while local sources were not decoded at all, as only raw was supported. Our workarounds made the local sources decode too, but this did not work in all cases. So why decode at the sources, when GStreamer has caps and you can just chain all of that to the renderers? That is what I eventually did: I moved the parsing/decoding to the renderers, which required fixing all the caps negotiation from sources to renderers. Unfortunately, I think I broke audio on the way, but surely nothing difficult to fix.
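As a small illustration of the rotation interface mentioned in the first item: elements implementing it (videoflip, glvideoflip, and now gst-rpicamsrc) expose a common video-direction property, so an application can request a rotation without caring where it is performed. A minimal sketch:

```c
#include <gst/gst.h>
#include <gst/video/video.h>  /* GstVideoOrientationMethod, GStreamer >= 1.10 */

/* Sketch: ask any element implementing the video direction interface to
 * rotate 90 degrees clockwise, whether the rotation happens in software
 * (videoflip), on the GPU (glvideoflip) or in the camera hardware
 * (rpicamsrc). */
static void rotate_90_clockwise(GstElement *element)
{
    g_object_set(element, "video-direction", GST_VIDEO_ORIENTATION_90R, NULL);
}
```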
This is still a work in progress; now I am changing tasks and handing this back to my fellow Igalian Phil, who I am sure will do an awesome job together with Alex.
And again, thanks to Igalia for letting me work on this, and to Metrological, which is sponsoring this work.
I haven't blogged in a while (mostly due to lack of time, as usual) but I thought I'd write something today to let the world know about one of the things I've worked on a bit during this week, while remotely attending the Web Engines Hackfest from home:
Setting up an environment for cross-compiling WebKit2GTK+ for ARM
I know this is not new, nor ground-breaking news, but the truth is that I could not find any up-to-date documentation on the topic in any public forum (the only thing I found was a pretty old post from the time WebKitGTK+ used autotools), so I thought I would devote some time to it now, so that I could save more time in the future.
Of course, I know for a fact that many people use local recipes to cross-compile WebKit2GTK+ for ARM (or simply build on the target machine, which usually takes a looong time), but those are usually ad-hoc setups that are hard to reproduce locally (or at least hard for me) and, even worse, often bound to downstream projects, so I thought it would be nice to have something tested with upstream WebKit2GTK+ and to publish it on trac.webkit.org.
So I spent some time working on this with the idea of producing step-by-step instructions, including how to create a reproducible environment from scratch, and, after some inefficient flirting with a VM-based approach (which turned out to be insanely slow), I finally settled on creating a chroot, provisioning it with a simple bootstrap script, and using a simple CMake toolchain file, and that worked quite well for me.
On my fast desktop machine I can now get a full build of WebKit2GTK+ 2.14 (or trunk) in less than 1 hour, which is a pretty big productivity bump if you compare it to the approximately 18 hours it takes to build natively on the target ARM device I have :-)
Note that I'm not a CMake expert (nor even close), so the toolchain file is far from perfect, but it definitely does the job with both the 2.12.x and 2.14.x releases as well as with trunk, so hopefully it will be useful for someone else out there.
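For reference, the general shape such a toolchain file takes is roughly the following; this is an illustrative sketch, not the actual file, and every path and compiler triplet below is a placeholder for whatever your chroot and cross-toolchain use:

```cmake
# Illustrative ARM cross-compilation toolchain file (all values are placeholders).
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR arm)

# The chroot created by the bootstrap script acts as the sysroot.
set(CMAKE_SYSROOT /path/to/arm-chroot)

set(CMAKE_C_COMPILER arm-linux-gnueabihf-gcc)
set(CMAKE_CXX_COMPILER arm-linux-gnueabihf-g++)

# Look for headers and libraries only inside the sysroot, but keep
# finding build tools on the host.
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
```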
Last, I want to thank the organizers of this event for making it possible once again (and congrats to Igalia, which just turned 15 years old!), as well as my employer for supporting my attendance at the hackfest, even if I could not make it in person this time.
We did it again: the Igalia WebKit team is pleased to announce a new stable release of WebKitGTK+, with a bunch of bugs fixed, some new API bits and many other improvements. I'm going to talk here about some of the most important changes, but as usual you have more information in the NEWS file.
FTL
FTL JIT is a JavaScriptCore optimizing compiler that was developed using LLVM to do low-level optimizations. It's been used by the Mac port since 2014, but we hadn't been able to use it because it required some patches for LLVM to work on x86-64 that were not included in any official LLVM release, and there were also some crashes that only happened on Linux. At the beginning of this release cycle we already had LLVM 3.7 with all the required patches, and the crashes had been fixed as well, so we finally enabled FTL for the GTK+ port. But in the middle of the release cycle Apple surprised us by announcing that they had the new FTL B3 backend ready. B3 replaces LLVM and is entirely developed inside WebKit, so it doesn't require any external dependency. JavaScriptCore developers quickly managed to make B3 work on Linux-based ports, and we decided to switch to B3 as soon as possible to avoid shipping a new release with LLVM only to remove it in the next one. I'm not going to go into the technical details of FTL and B3, because they are very well documented and probably too boring for most people; the key point is that they improve overall JavaScript performance in terms of speed.
Persistent GLib main loop sources
Another performance improvement introduced in WebKitGTK+ 2.12 has to do with main loop sources. WebKitGTK+ makes extensive use of the GLib main loop: it has its own RunLoop abstraction on top of the GLib main loop that is used by all secondary processes and most of the secondary threads as well, scheduling main loop sources to send tasks between threads. JavaScript timers, animations, multimedia, the garbage collector, and many other features are based on scheduling main loop sources. In most cases we are actually scheduling the same callback all the time, but creating and destroying the GSource each time. We realized that creating and destroying main loop sources caused an overhead with an important impact on performance. In WebKitGTK+ 2.12 all main loop sources were replaced by persistent sources, which are normal GSources that are never destroyed (unless they are not going to be scheduled anymore). We simply use the GSource ready time to make them active or inactive when we want to schedule or stop them.
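To make the trick concrete, here is a self-contained sketch of the pattern (illustrative, not WebKit's actual RunLoop code): the source is created and attached once, then "scheduled" by setting its ready time and "stopped" by clearing it, instead of being destroyed and recreated.

```c
#include <glib.h>

static gboolean dispatch(GSource *source, GSourceFunc callback, gpointer user_data)
{
    /* Deactivate until the next explicit schedule, then run the task. */
    g_source_set_ready_time(source, -1);
    return callback ? callback(user_data) : G_SOURCE_REMOVE;
}

/* No prepare/check: readiness is driven purely by the ready time. */
static GSourceFuncs persistent_source_funcs = { NULL, NULL, dispatch, NULL };

static gboolean do_task(gpointer user_data)
{
    g_print("task dispatched\n");
    return G_SOURCE_CONTINUE; /* keep the source alive for reuse */
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);
    GSource *source = g_source_new(&persistent_source_funcs, sizeof(GSource));
    g_source_set_callback(source, do_task, NULL, NULL);
    g_source_attach(source, NULL);

    /* Scheduling the persistent source is just setting its ready time;
     * 0 means "dispatch as soon as possible". */
    g_source_set_ready_time(source, 0);
    g_main_loop_run(loop); /* runs forever in this sketch */
    return 0;
}
```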
Overlay scrollbars
GNOME designers have been asking us to implement overlay scrollbars since they were introduced in GTK+, because WebKitGTK+-based applications didn't look consistent with all other GTK+ applications. Since WebKit2, the web view is no longer a GtkScrollable; it is scrollable by itself, using either the native scrollbar appearance or the one defined in CSS. This means we have our own scrollbar implementation that we try to render as close as possible to the native ones, and that's why it took us so long to find the time to implement overlay scrollbars. But WebKitGTK+ 2.12 finally implements them, and they are, of course, enabled by default. There's no API to disable them, but we honor the GTK_OVERLAY_SCROLLING environment variable, so they can be disabled at runtime.
But the appearance was not the only thing that made our scrollbars inconsistent with the rest of the GTK+ applications; we also had different behavior regarding the actions performed by the mouse buttons, and some other bugs, all of which are fixed in 2.12.
The NetworkProcess is now mandatory
The network process was introduced in WebKitGTK+ 2.4 to make it possible to use multiple web processes. We had two different paths for loading resources depending on the process model being used: when using the shared secondary process model, resources were loaded by the web process directly, while when using the multiple web process model, the web processes sent their requests to the network process to be loaded. Maintaining these two different paths was not easy, with some bugs happening only when using one model or the other, and the network process also gained features, like the disk cache, that were not available in the web process. In WebKitGTK+ 2.12 the non-network-process path has been removed, and the shared single process model has become the multiple web process model with a limit of 1. In practice this means that a single web process is still used, but the networking happens in the network process.
NPAPI plugins in Wayland
I have read in many bug reports and mailing lists that NPAPI plugins will not be supported in Wayland, so things like http://extensions.gnome.org will not work. That's not entirely true. NPAPI plugins can be windowed or windowless. Windowed plugins are those that use their own native window for rendering and handling events, implemented on X11-based systems using the XEmbed protocol. Since Wayland doesn't support XEmbed and doesn't provide an alternative either, it's true that windowed plugins will not be supported in Wayland. Windowless plugins don't require any native window; they use the browser window for rendering, and events are handled by the browser as well, using an X11 drawable and X events on X11-based systems. So it's also true that windowless plugins that have a UI will not be supported in Wayland either. However, not all windowless plugins have a UI, and there's nothing X11-specific in the rest of the NPAPI plugin API, so there's no reason why those can't work in Wayland. And that's exactly the case of http://extensions.gnome.org, for example. In WebKitGTK+ 2.12 the X11 implementation of NPAPI plugins has been factored out, leaving the rest of the API implementation common and available to any window system used. That made it possible to support windowless NPAPI plugins with no UI in Wayland, and on any other non-X11 system, of course.
WebView session save/restore: it's now possible to serialize and deserialize a WebKitWebView session, which includes information like the navigation history, the scroll position or HTML POST data (see the sketch after this list).
Notifications click action: it allows notifying WebKit that a web notification has been clicked by the user, so that the appropriate action can be performed.
Console messages API: this is one of the APIs we had never ported to WebKit2. It's now available as part of the Web Extensions API: WebKitWebPage emits a signal every time a message is sent to the JavaScript console, with a WebKitConsoleMessage parameter containing all the message information.
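For the first of these, here is a minimal sketch of how the session save/restore API fits together (names as they appear in the 2.12 documentation; error handling omitted):

```c
#include <webkit2/webkit2.h>

/* Sketch: serialize the session of one web view and restore it in another. */
static void
clone_session (WebKitWebView *source, WebKitWebView *destination)
{
    WebKitWebViewSessionState *state = webkit_web_view_get_session_state (source);
    GBytes *data = webkit_web_view_session_state_serialize (state);
    webkit_web_view_session_state_unref (state);

    /* The GBytes could be written to disk here and read back on the next run. */
    WebKitWebViewSessionState *restored = webkit_web_view_session_state_new (data);
    g_bytes_unref (data);

    webkit_web_view_restore_session_state (destination, restored);
    webkit_web_view_session_state_unref (restored);
}
```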
In this post I am going to talk about the implementation of the Media Source Extensions (known as MSE) in the WebKit ports that use GStreamer. These ports are WebKitGTK+, WebKitEFL and WebKitForWayland, though only the latter has the latest work-in-progress implementation. Of course, we hope to upstream WebKitForWayland soon and, with it, this backend for MSE and the one for EME.
My colleague Enrique at Igalia wrote a post about this about a week ago. I recommend you read it before continuing with mine, to understand the general picture and some of the issues that I managed to fix in that implementation. Come on, go and read it, I'll wait.
One of the challenges here is something a bit unnatural in the GStreamer world: we have to process the stream information and then make some metadata available to the JavaScript app before playing, instead of just pushing everything into a playing pipeline and being happy. For this we created the AppendPipeline, which processes the data, extracts that information and keeps it under control for the playback later.
The idea of our AppendPipeline is to put a data stream into it and get it processed at the other side. It has an appsrc, a demuxer (currently qtdemux) and an appsink to pick up the processed data. Something tricky about the spec is that when you append data to the SourceBuffer, that operation has to block it, reject any other append operation with errors while the current one is ongoing, and signal when it finishes. Our main issue with this is that the appends can contain any amount of data, from headers and buffers, to only headers, to just partial headers. Basically, the information can be partial.
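In GStreamer terms, the overall shape is roughly the following sketch (illustrative, not the actual WebKit code, which among other things handles multiple tracks): appsrc receives the appended bytes, qtdemux extracts the track information, and appsink picks up the processed samples.

```c
#include <gst/gst.h>

/* Demuxer pads only appear once the headers have been processed, so the
 * appsink is linked dynamically from the pad-added callback. */
static void on_pad_added(GstElement *demuxer, GstPad *pad, gpointer user_data)
{
    GstElement *appsink = GST_ELEMENT(user_data);
    GstPad *sinkpad = gst_element_get_static_pad(appsink, "sink");
    if (!gst_pad_is_linked(sinkpad))
        gst_pad_link(pad, sinkpad);
    gst_object_unref(sinkpad);
}

static GstElement *create_append_pipeline(void)
{
    GstElement *pipeline = gst_pipeline_new("append-pipeline");
    GstElement *appsrc = gst_element_factory_make("appsrc", "src");
    GstElement *demuxer = gst_element_factory_make("qtdemux", "demuxer");
    GstElement *appsink = gst_element_factory_make("appsink", "sink");

    gst_bin_add_many(GST_BIN(pipeline), appsrc, demuxer, appsink, NULL);
    gst_element_link(appsrc, demuxer);
    g_signal_connect(demuxer, "pad-added", G_CALLBACK(on_pad_added), appsink);
    return pipeline;
}
```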
First I’ll present again Enrique’s AppendPipeline internal state diagram:
Let me explain the easiest case first, which is headers and buffers being appended. As soon as the process is triggered, we move from Not started to Ongoing; then, as the headers are processed, we get the pads at the demuxer and begin to receive buffers, which makes us move to Sampling. Then we have to detect that the operation has ended, move to Last sample, and then back to Not started. If we have received only headers, we will not move to Sampling, because we will not receive any buffers, but we still have to detect this situation and be able to move to Data starve and then back to Not started.
Our first approach used two different timeouts: one to detect that we should move from Ongoing to Data starve if we did not receive any buffer, and another to move from Sampling to Last sample if we stopped receiving buffers. This solution worked, but it was a bit racy, so we tried to find a less error-prone solution.
We then tried using custom downstream events injected from the source: at the moment they were received at the sink, we could move from Sampling to Last sample, or, if only headers had been injected, the pads had been created and we could move from Ongoing to Data starve. It took some time and several iterations to fine-tune this, and we managed to solve almost all cases but one: receiving only partial headers and no buffers.
If the demuxer received partial headers and no buffers, it stalled, and since we were not receiving any pads or any event at the output, we could not tell when the append operation had ended. Tim-Philipp gave me the idea of using the need-data signal on the source, which is fired when the demuxer runs out of useful data. I realized then that the events were not needed anymore and that we could handle everything with that signal.
The need-data signal is sometimes fired when the pipeline is linked, and also when the demuxer finishes processing data, regardless of whether the stream contains partial headers, complete headers, or headers and buffers. It works perfectly once we are able to disregard that first signal we sometimes receive. To solve that, we simply ensure, with a pad probe, that at least one buffer has left the appsrc: if we receive the signal before any buffer was detected at the probe, it is disregarded; otherwise, if we have already seen a buffer at the probe, we can consider that any need-data signal means the processing has ended, and we can tell the JavaScript app that the append process has finished.
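In sketch form, and heavily simplified from the real implementation, the probe and the signal handler cooperate roughly like this (the bare flag is illustrative; the real code synchronizes its state properly):

```c
#include <gst/gst.h>

static gboolean saw_buffer = FALSE; /* reset at the start of each append */

/* Pad probe on the appsrc source pad: record that at least one buffer
 * has left the source during this append. */
static GstPadProbeReturn appsrc_buffer_probe(GstPad *pad,
    GstPadProbeInfo *info, gpointer user_data)
{
    saw_buffer = TRUE;
    return GST_PAD_PROBE_OK;
}

/* need-data fires both spuriously right after linking and when the demuxer
 * has exhausted the appended data; the probe flag tells the cases apart. */
static void appsrc_need_data(GstElement *appsrc, guint length, gpointer user_data)
{
    if (!saw_buffer)
        return; /* first spurious need-data: disregard it */

    saw_buffer = FALSE;
    /* Defer "append finished" to the main thread through the pipeline bus
     * instead of completing it from this streaming thread. */
    GstBus *bus = gst_element_get_bus(appsrc);
    gst_bus_post(bus, gst_message_new_application(GST_OBJECT(appsrc),
        gst_structure_new_empty("append-complete")));
    gst_object_unref(bus);
}

static void install_append_detection(GstElement *appsrc)
{
    GstPad *srcpad = gst_element_get_static_pad(appsrc, "src");
    gst_pad_add_probe(srcpad, GST_PAD_PROBE_TYPE_BUFFER,
        appsrc_buffer_probe, NULL, NULL);
    gst_object_unref(srcpad);
    g_signal_connect(appsrc, "need-data", G_CALLBACK(appsrc_need_data), NULL);
}
```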
Both the need-data signal and the probe information arrive in GStreamer internal threads, so we could have used mutexes to overcome any race conditions. We thought, though, that deferring the operations to the main thread through the pipeline bus was a better idea that would create fewer issues with race conditions or deadlocks.
To finish, let me share some good news about performance. We mainly use the YouTube conformance tests to ensure our implementation works, and I can proudly say that these changes cut the execution time in half!
About a year ago, Igalia was approached by the people working on printing-related technologies at HP to see whether we could give them a hand in their ongoing effort to improve the printing experience on the web. They had been working for a while on extensions for popular web browsers that would allow users, for example, to distill a web page from cruft and ads and format its relevant contents in a way that would be pleasant to read in print. While these extensions were working fine, they were interested in exploring the possibility of adding this feature to popular browsers directly, so that users wouldn't need to be bothered with installing extensions to have an improved printing experience.
That's how Alex, Martin, and I spent a few months exploring the Chromium project and its printing architecture. Soon enough we found out that the Chromium developers had already been working on a feature that would allow pages to be stripped of cruft and presented in a sort of reader mode, at least in mobile versions of the browser. This is achieved through a module called dom distiller, which basically has the ability to traverse the DOM tree of a web page and return a clean DOM tree with only the important contents of the page. This module is based on the algorithms and heuristics of a project called boilerpipe, with some of it also coming from the now popular Readability. Our goal, then, was to integrate the DOM distiller with the modules in Chromium that take care of generating the document that is sent to both the print preview and the printing service, as well as making this feature available in the printing UI.
After a couple of months of work, and thanks to the kind code reviews of the folks at Google, we got the feature landed in Chromium's repository. For a while, though, it remained hidden behind a runtime flag, as the Chromium team needed to make sure that things would work well enough on all fronts before making it available to all users. Fast-forward to last week, when I found out by chance that the runtime flag has been flipped and the Simplify page printing option has been available in Chromium and Chrome for a while now, and has even reached the stable releases. The reader mode feature in Chromium seems to remain hidden behind a runtime flag, I think, which is interesting considering that it was the original motivation behind the dom distiller.
As a side note, it is worth mentioning that the collaboration with HP was pretty neat, and it's a good example of the ways in which Igalia can help organizations to improve the web experience of users. From the standards that define the web to the browsers that people use in their everyday life, there are plenty of areas in which work needs to be done to make the web a more pleasant place, for web developers and users alike. If your organization relies on the web to reach its users, or to enable them to make use of your technologies, chances are that there are areas in which their experience can be improved, and that's one of the things we love doing.
And once again, in December we celebrated the hackfest. This year it happened between December 7-9 at the Igalia premises, and the scope was much broader than WebKitGTK+, which is why it was renamed the Web Engines Hackfest. We wanted to gather people working on all open source web engines, and we succeeded, as we had people working on WebKit, Chromium/Blink and Servo.
The edition before this one, I was working with Youenn Fablet (from Canon) on the Streams API implementation in WebKit, and we spent our time on the same thing again. We have to say that things are much more mature now. During the hackfest we spent our time fixing the JavaScriptCore built-ins inside WebCore, and we advanced on the automatic import of the specification's web platform tests, which are based on our prior test implementation. Since they are managed there now, it does not make sense to maintain them inside WebKit too; we just import them. I must say that our implementation is fairly complete, since we support the current version of the spec and have almost all tests passing, including ReadableStream, WritableStream and the built-in strategy classes. What is missing now is making Streams work together with other APIs, such as Media Source Extensions, Fetch or XMLHttpRequest.
There were some talks during the hackfest, and we did not want to be left out, so we gave our own about Streams. You can enjoy it here:
You can see all the hackfest talks in this YouTube playlist. The ones I liked most were the one by Michael Catanzaro about HTTP security, which is always interesting given the current clumsy political movements against cryptography, and the one by Dominik Röttsches about font rendering. It is really amazing what a browser has to do just to get some letters painted on the screen (and look good).
As usual, the environment was amazing and we had a great time, including the traditional Street Fighter match, where Gustavo found a worthy challenger in Changseok.
Of course, I would like to thank Collabora and Igalia for sponsoring the event!
And by the way, quite shortly after that, I became a WebKit reviewer!