Accurately Predicting Ad Blocker Savings

This research was conducted by Dr. Andrius Aucinas, performance researcher at Brave, Moritz Haller, data scientist at Brave, and Dr. Ben Livshits, Brave’s Chief Scientist.

We have written before about Brave’s performance, energy and bandwidth benefits for users. Brave Shields is our primary mechanism for protecting user privacy, but many users know by now that ad and tracker blocking (or just ad blocking for short) also makes the web faster and generally better for them. So far, Brave’s estimates of users’ time saved have been very conservative and somewhat naive: we take the total number of ads and trackers blocked and multiply it by 50 milliseconds. Why this specific number? It sits at the low end of what others have estimated to be third-party JavaScript execution overhead, yet both the third-party impact study and our own measurements in this study put the average and median impact of an ad or tracker at more than 10 times that. Clearly, it is time for an update.
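
As a minimal sketch of the naive estimate described above (a fixed 50 ms per blocked ad or tracker), the Python below spells out the arithmetic. The function name and the example count of 1,000 blocked items are hypothetical, and the 500 ms value is only an illustrative stand-in for an impact "more than 10 times higher", not a figure taken from our data.

```python
def naive_time_saved_seconds(blocked_count: int, per_item_ms: float = 50.0) -> float:
    """Naive estimate: every blocked ad or tracker saves a fixed per_item_ms.

    50 ms is the constant described in the text; other values are illustrative.
    """
    if blocked_count < 0:
        raise ValueError("blocked_count must be non-negative")
    return blocked_count * per_item_ms / 1000.0  # convert ms to seconds


blocked = 1_000  # hypothetical number of blocked ads and trackers
print(naive_time_saved_seconds(blocked))         # 50.0
print(naive_time_saved_seconds(blocked, 500.0))  # 500.0 (illustrative 10x impact)
```

The weakness is visible in the signature: the estimate depends only on a count, so a blocked tracking pixel and a blocked multi-megabyte ad script are treated identically.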

However, without loading the full, bloated version of a page, we typically can’t know just how much of the user’s resources are saved. The way current ad networks work makes the problem even more challenging: a script included directly by the publisher is typically only the tip of the iceberg, setting off a chain of requests to many third parties. Take The Verge as a concrete example. The two graphs show the main page loaded with ad blocking (top) and without it (bottom). Each small colored line shows when a network request happened and how long it took, and the summary bar at the bottom of each graph gives top-level statistics: how many requests were made, the bandwidth used, and a few load time metrics.

The illustration makes it evident that the savings are substantial: multiple megabytes and seconds of loading time for a single page. Brave prevents most of the requests from even being made: out of the 48 requests, 12 are blocked, implying neither network traffic nor processing load, while the other 364 (!) requests made over the course of a minute on the page are never even seen. What the developer tools do not show is that the blocked resources would have contributed 2.8 seconds of CPU time just for executing JavaScript, on a single page.

So we set out to explore how accurately we can predict exactly how much we save, given only a clean page: one that has had all ads and trackers removed. It turns out we can accurately estimate bandwidth and compute time savings from page details such as the few blocked resources we do observe, the clean page’s performance metrics, the number of resources of each type loaded, and the overall DOM size.
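
To make the idea concrete, here is a rough sketch assuming a plain linear model fitted with ordinary least squares. This section does not say which model we used, so OLS is a stand-in chosen purely for illustration. The class and feature names (`CleanPage`, `blocked_requests`, `clean_mb`, `script_count`, `dom_nodes`) and the training rows are hypothetical, and the target values are generated from a made-up rule rather than from our measurements.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CleanPage:
    """Hypothetical features observed when a page is loaded WITH ad blocking."""
    blocked_requests: int  # the few blocked requests we do get to see
    clean_mb: float        # bandwidth used by the clean page, in MB
    script_count: int      # number of scripts loaded (one resource type)
    dom_nodes: int         # overall DOM size

    def vector(self) -> List[float]:
        # Leading 1.0 is the intercept; DOM size is scaled to thousands of
        # nodes to keep the normal equations reasonably well conditioned.
        return [1.0, float(self.blocked_requests), self.clean_mb,
                float(self.script_count), self.dom_nodes / 1000.0]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(pages: Sequence[CleanPage], savings: Sequence[float]) -> List[float]:
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    xs = [p.vector() for p in pages]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, savings)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], page: CleanPage) -> float:
    return sum(w * v for w, v in zip(weights, page.vector()))


# Made-up training rows and a made-up "true" rule, for illustration only.
rows = [(3, 0.9, 12, 1500), (10, 2.1, 30, 4200), (0, 0.4, 5, 800),
        (7, 1.5, 25, 3000), (12, 3.0, 40, 5200), (2, 0.7, 9, 2100),
        (1, 2.8, 6, 1200), (11, 0.6, 20, 4800), (4, 1.9, 38, 900),
        (8, 0.5, 7, 3500)]
pages = [CleanPage(*r) for r in rows]
true_w = [0.2, 0.15, 0.6, 0.02, 0.1]  # intercept, then one weight per feature
saved_mb = [sum(w * v for w, v in zip(true_w, p.vector())) for p in pages]

weights = fit_ols(pages, saved_mb)
print(round(predict(weights, CleanPage(6, 1.4, 18, 2400)), 2))  # 2.54
```

Because the synthetic data is noise-free, this only checks that the fitting code recovers the rule it was given. A realistic version would fit separate models for bandwidth and CPU time savings and evaluate them on held-out pages.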

Important Performance Metrics

Another detail that the earlier figure helps illustrate is the difficulty of choosing the “right” performance metrics. The ones in the figure, DOMContentLoaded and Load (Page Load Time), are among the most generic and widely used metrics. However, the web performance community has largely rejected them by now, and it isn’t hard to see why: many network events happen well after the page is “loaded”. This is especially true without ad blocking, where the last recorded network events still occur a full minute after the page load started, even though the page was “loaded” after 1.4 seconds! There are also various arguments that these metrics don’t correctly reflect the actual user experience, and alternatives such as First Meaningful Paint (the time when the important visible parts of the document have been painted and the user can start consuming the content) have been proposed, but they are hard to generalize across a wide set of different pages.

There are, however, two metrics that are universal across websites and reflect the amount of resources consumed by a page:

  • Total bandwidth used, including all resources loaded by the page: synchronous, asynchronous, content, ads, etc.
  • Total JavaScript CPU time used, measuring the cumulative time taken by scripts running on the main thread. Since JavaScript is essentially single-threaded (unless Web Workers are used), this accounts for all script execution time.
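
The two metrics above can be computed roughly as follows. This is a sketch under assumptions, not our measurement code: it takes a hypothetical list of `Request` records with on-the-wire sizes, plus a list of (start, end) intervals in milliseconds during which scripts ran on the main thread (for example, extracted from a browser performance trace). Overlapping or nested intervals are merged so that script time is not double counted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Request:
    url: str
    transfer_bytes: int  # bytes transferred over the network for this request


def total_bandwidth(requests: Iterable[Request]) -> int:
    """Total bytes for every request the page made: content, ads, async, etc."""
    return sum(r.transfer_bytes for r in requests)


def total_script_cpu_ms(intervals: Iterable[Tuple[float, float]]) -> float:
    """Main-thread script time, merging overlapping or nested intervals
    (e.g. a function call inside a script evaluation) so each ms counts once."""
    total = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # extends the current busy period
            continue
        if cur_start is not None and cur_end is not None:
            total += cur_end - cur_start  # close the previous busy period
        cur_start, cur_end = start, end
    if cur_start is not None and cur_end is not None:
        total += cur_end - cur_start
    return total


reqs = [Request("https://example.com/", 52_000),
        Request("https://example.com/app.js", 180_000)]
print(total_bandwidth(reqs))                                       # 232000
print(total_script_cpu_ms([(0, 10), (2, 5), (20, 30), (25, 40)]))  # 30.0
```

Merging intervals rather than summing raw durations matters because trace data typically records nested work; summing both a script evaluation and the function calls inside it would overstate CPU time.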

These metrics do not try to capture how fast a website feels to the user, but they reflect other aspects of performance, such as how responsive the device is during browsing, how much energy it consumes, and how much of the user’s metered data plan is used. Remember that not everyone browses the web on a top-of-the-line device over a 5G network with an “unlimited” data plan.

Among its many benefits, ad and tracker blocking improves both of these metrics. We define bandwidth savings and CPU time savings as the differences in bandwidth and CPU time used without and with ad blocking. Our goal, then, is to model how much ad blocking saves by observing the page only when it is loaded with ad blocking on.
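
Written out with notation introduced here only for convenience: let B be total bandwidth and T total JavaScript CPU time, with subscripts "on" and "off" for ad blocking enabled and disabled, and let x_on be the features observable on the clean page. The savings and the modeling goal are then as follows, where f_B and f_T stand for whatever models are fitted:

```latex
\Delta B = B_{\text{off}} - B_{\text{on}}, \qquad
\Delta T = T_{\text{off}} - T_{\text{on}}

\widehat{\Delta B} = f_B(\mathbf{x}_{\text{on}}), \qquad
\widehat{\Delta T} = f_T(\mathbf{x}_{\text{on}})
```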

Test Setup

We start by collecting data across a relatively small but diverse set of websites. We selected 100 sites and 10 pages on each site:

  • 30 popular publisher and ecommerce sites (including BBC, CNN, The Washington Post, Amazon UK, Ebay UK).
  • 40 random sites from the Alexa Top 400 sites in the UK, skipping duplicates from the first set as well as adult content pages. This included websites of ISPs, train ticketing, ecommerce, news outlets, etc.
  • 30 random sites ranked between 400 and 10,000 in the Alexa UK ranking, making sure that the pages actually worked. This further widened our page diversity to include pages in other languages, e.g. Russian, Dutch and Norwegian, although most of the pages were still in English.

For each site, 10 random pages linked from the main page were selected to further diversify the pages in our dataset. The complete lists of sites and pages are included in the GitHub repository along with the code we used for data collection.
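
The page selection step can be sketched as below. This is not the code from our repository: `sample_pages` is a hypothetical helper that assumes the links on the main page have already been extracted as raw `href` strings. It keeps same-host links only, drops fragments, duplicates and the main page itself, and then draws up to 10 of them with a seeded random generator so the selection is reproducible.

```python
import random
from typing import List
from urllib.parse import urljoin, urlparse


def sample_pages(main_url: str, hrefs: List[str], k: int = 10, seed: int = 0) -> List[str]:
    """Pick up to k distinct same-host pages linked from the main page."""
    host = urlparse(main_url).netloc
    home = main_url.rstrip("/")
    candidates: List[str] = []
    seen = set()
    for href in hrefs:
        # Resolve relative links and strip #fragments, which point to the same page.
        url = urljoin(main_url, href).split("#", 1)[0]
        if urlparse(url).netloc != host or url.rstrip("/") == home:
            continue  # other sites, mailto:/javascript: links, or the main page itself
        if url not in seen:
            seen.add(url)
            candidates.append(url)
    rng = random.Random(seed)  # seeded so the selection can be reproduced
    return rng.sample(candidates, min(k, len(candidates)))


links = ["/news/1", "/news/2#top", "https://other.example.org/x", "/news/1", "/",
         "mailto:team@example.com"]
print(sorted(sample_pages("https://www.example.com/", links)))
# ['https://www.example.com/news/1', 'https://www.example.com/news/2']
```

Restricting to the same host keeps the sample on the site being studied; a page with fewer than 10 usable links simply yields fewer pages rather than raising an error.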