June 2016 update at end of article (TL;DR — blocking levels have gotten a little higher, likely due to Samsung allowing blocking)
There’s now a significant number of users that use ad-blockers, estimated at about 15% in the US (according to PageFair), and that number grew quickly in 2015. Extensions like AdBlock Plus, NoScript, and Ghostery are incredibly popular (#1, #4, and #6 amongst the most popular Firefox add-ons), and for good reasons. The ad-blocking “war” has been a very hot topic of conversation, so I wanted to take a look at some of the collateral damage in that war: blocked analytics trackers.
Ad Blockers That Block Analytics
Since ad blockers are focused on blocking ads (shocking, I know…) we might not think about their effects on our analytics trackers. But did you know that most of the popular blockers can block analytics trackers as well? Some even do it by default. There are a lot of different ad-blockers out there on a lot of different platforms, but a quick run through the most popular per platform shows a huge number of users that could be blocking our analytics trackers. I’m focusing on Google Analytics here, but really this is applicable for any third-party tracker. Here’s a list of some of the top blockers (and there are a lot more out there!):
| App / Extension | Platform | User Base (Dec 2015) | Blocks GA by default? |
|---|---|---|---|
| AdBlock Plus | Cross-platform | 21M users (Firefox); 300M total downloads | No, but easily added (one click post-install) |
| Adblock | Primarily Chrome | 40M+ (Chrome) | No, but easily added |
| uBlock Origin | Cross-platform | 630k users (Firefox); 2.5M users (Chrome) | |
| Ghostery | Cross-platform | 2.3M users (Chrome); 1.5M users (Firefox) | No, but easily added |
| Purify | iOS | unknown, top 10 at launch* | Yes |
| Adblock Browser | Android | 1-5M installs | Yes |
| Google Analytics Opt-Out | Chrome | 720k users | Yes |
* Post-iOS9 launch blockers Peace, Purify & Crystal were #1, #3 & #6 respectively in the App Store.
(Browser add-on active user counts are based on users that actively check for updates, seemingly regardless of add-on status. This means users that have the add-on disabled would still be counted. However, most users do not disable their add-ons: Firefox reports that 95% of uBlock Origin users and 96% of AdBlock Plus users had their add-ons set to enabled, according to each add-on’s status information page.)
Who even needs a plugin? As of November 2015, Firefox includes tracking protection in the browser itself. It is on by default in private browsing windows, where it blocks GA.
That’s a significant potential for gaps in our measurements, but how many users exactly?? The biggest user base is the AdBlock Plus users, but it is hard to know how many of those users choose to block analytics trackers. When you install AdBlock Plus you get a post-install screen that gives you a choice to turn on tracking / analytics blocking:
We know that most users stick with the defaults (which do not block GA), but considering the size of the install base, the number who do turn it on is probably big enough to be concerned about. If you are a user that is motivated / annoyed enough to install an ad blocker, you are already in the group much less likely to stick with defaults.
Measuring the number of users that run ad-blockers in general can be a little tricky, but the primary method is pretty well established at this point. You create content on your site that looks like an ad and then see if it is being blocked by looking at what is ultimately loaded by the browser. Daniel Carlbom has an easy-to-implement ad-blocker user counting method on his blog using Google Tag Manager, and PageFair also offers a free service that will track your ad-block percentage with a sensor you place on your site.
Except what happens if your analytics service is also being blocked?! Both GA and PageFair are in the Ghostery and AdBlock Plus list of analytics trackers to be blocked. This means these methods are measuring users that block ads but do allow analytics trackers, so your actual ad-block percentages are even higher than these methods show.
So what do we do? How do we measure things when it’s our measurement software itself that’s being blocked? Should we all just go back to server log file analyzers!?
After breathing into a bag for a while, I decided to try a small experiment to see if I could get at least a general idea of how many people are blocking Google Analytics.
An Experiment to Count the Blockers
My approach was to run GA in parallel, once as normal in the browser, and then once using the GA measurement protocol on the server-side. So every request would result in firing two different events; one client-side through analytics.js, and one server-side via the measurement protocol API:
This has the effect of double-counting everything, an event call from the front-end along with an event call from the back-end with all the relevant front-end info passed to it.
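The server-side half of each pair can be sketched roughly as follows. This is a minimal Python illustration, not the PHP code used in the experiment. The property ID `UA-XXXXXXX-1`, the event category and action names, the example fingerprint, and the helper `build_event_hit` are all placeholders I made up. Reusing the fingerprint as the client ID is also an assumption. Only the parameter names (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `el`) and the `/collect` endpoint come from the Universal Analytics Measurement Protocol. The sketch only builds the request body and does not send anything.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint.
# A real run would POST `body` here; this sketch never sends a request.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_event_hit(tracking_id, fingerprint, source):
    """Return the URL-encoded body for one Measurement Protocol event hit.

    `source` is a hypothetical label ("server" or "client") used as the event
    action so the two hits of a pair can be told apart. The fingerprint goes
    in the event label so the pair can be matched up later.
    """
    params = {
        "v": "1",              # protocol version
        "tid": tracking_id,    # GA property ID (placeholder below)
        "cid": fingerprint,    # client ID; reusing the fingerprint is an assumption
        "t": "event",          # hit type
        "ec": "blocker-test",  # event category (hypothetical name)
        "ea": source,          # event action: which side sent the hit
        "el": fingerprint,     # event label: the browser fingerprint
    }
    return urlencode(params)


body = build_event_hit("UA-XXXXXXX-1", "a3f9c2e1", "server")
print(body)
```

The client-side hit would carry the same fingerprint with a different action, so the two can be lined up per browser in the reports.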
I just needed a way to line up the two event hits, so I’d know it was the same end-user making both requests. Enter browser fingerprinting. I expect many of you are familiar with browser fingerprinting; it is a way to assign a (relatively) unique id to each user’s browser based on the configuration and settings of that browser (first brought into wide discussion from the EFF’s Panopticlick project). I used fingerprint2.js to create the fingerprints, which uses 24 different sources of settings to assign a unique id per browser. It is certainly possible to end up with the same fingerprint for two different end-users, but it’s rare and in our case is not likely to influence our overall numbers since it would simply take one data point away.
(I could have achieved a similar effect with a random id that I created and then dropped as a cookie, but I chose to use a fingerprint because using cookies to track users that run blockers might have lead to a small number of users that aggressively clean their cookies getting counted a bunch of times. It also leads to less duplication in the case of bots that might purge their cookies frequently.)
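To make the idea concrete, here is a small Python sketch of the principle behind fingerprinting: collect a set of browser settings and hash them into a single id. It is not fingerprint2.js’s actual algorithm or its list of sources. The attribute names and values are invented for illustration, and SHA-256 is simply a convenient hash for the example.

```python
import hashlib
import json


def fingerprint(settings):
    """Hash a dict of browser settings into a short, stable hex id."""
    # Sort keys so the same settings always serialize the same way.
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical settings for two browsers that differ only in time zone.
browser_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone_offset": -60,
    "language": "en-US",
}
browser_b = dict(browser_a, timezone_offset=300)

print(fingerprint(browser_a))
print(fingerprint(browser_b))
```

The same settings always produce the same id, while a change in any one setting produces a different one. That is what lets the client-side and server-side hits be matched without a cookie.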
The end result is lots and lots of duplicate events:
What you’ll (hopefully) notice is that every event comes in a pair in the list above aside from one. So what happened there? This happened:
Meaning something happened to block the analytics.js code from reporting to GA, which is exactly what we want to measure! In the example above I know for sure what happened because that one was me testing — using Ghostery to block GA.
So we can just count up all the cases where we got a call from the server-side, but not the client-side, right?
Well… almost. There are a couple of special cases that we need to look at first.
2. Callback failures.
Ok, I admit it, that chart above is not entirely correct. There’s still another way we can get only the server-side measurement call:
If the AJAX call is made but the browser disappears before the callback happens, then we can also miss out on the client-side call. Or if the browser just doesn’t execute the callback code itself, we will get a mismatch.
The fingerprint generation is relatively slow (about 500ms for me the first time it runs), and the AJAX call doesn’t start until the fingerprint is ready. The AJAX call itself is relatively fast (~200ms depending on latency), so the browser would need to disappear at just the right time (in between when the AJAX call was made and when it returns) to miss the callback. This could happen if you hit the back button at just the right time. To verify that this wasn’t a major issue, I ran a test looking only at sessions with > 1 event. This means the user hit more than one page, which solves the back button issue, though potentially not other technical callback issues. The result was nearly the same rate of blocking users, just with a much smaller sample size, which convinced me to ignore this possible effect.
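That sanity check can be sketched as follows, in Python rather than the R used for the actual analysis. The record layout, the helper name, and the data are hypothetical toy values, not the experiment’s results. The idea is to keep only fingerprints with more than one server-side event and recompute the share that never produced a client-side hit.

```python
from collections import Counter


def server_only_rate(hits, min_server_events=1):
    """Share of fingerprints seen server-side that never reported client-side.

    `hits` is a list of (fingerprint, source) tuples, where source is
    "server" or "client". Only fingerprints with at least
    `min_server_events` server-side hits are considered.
    """
    server_counts = Counter(fp for fp, src in hits if src == "server")
    client_fps = {fp for fp, src in hits if src == "client"}
    eligible = [fp for fp, n in server_counts.items() if n >= min_server_events]
    if not eligible:
        return 0.0
    blocked = [fp for fp in eligible if fp not in client_fps]
    return len(blocked) / len(eligible)


# Toy data: "b" and "d" never report client-side; "c" and "d" saw one page.
hits = [
    ("a", "server"), ("a", "client"), ("a", "server"), ("a", "client"),
    ("b", "server"), ("b", "server"),
    ("c", "server"), ("c", "client"),
    ("d", "server"),
]

rate_all = server_only_rate(hits)
rate_multi_page = server_only_rate(hits, min_server_events=2)
print(rate_all, rate_multi_page)
```

If back-button timing were inflating the count, the multi-page rate would drop noticeably below the overall rate.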
Ok, finally time to count up the results. There’s no way inside the regular GA reporting system itself to do that kind of comparison on individual sessions, but it’s easy to do in R, so let’s make a venn diagram!
The red is only server-side measured, blue is only client-side, and purple is measured by both. The blue (client-side only) sliver is so small (.34%) it’s hard to see, but that’s what we should expect. It should be exceedingly rare in our scenario that the client-side fired ok when the server didn’t. This might happen if there was a PHP error on the server-side or the measurement protocol hit was never registered for whatever reason. At .34% I’m not going to worry too much about that. The server-side only slice is very big though, over 8%!
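The three regions of the diagram are just set operations on the fingerprints seen on each side. The sketch below shows the computation in Python (the original analysis was done in R). The fingerprint values are toy data and the function name is made up for the example.

```python
def venn_shares(server_fps, client_fps):
    """Return the share of fingerprints that are server-only, client-only, or both."""
    server, client = set(server_fps), set(client_fps)
    total = len(server | client)
    if total == 0:
        return {"server_only": 0.0, "client_only": 0.0, "both": 0.0}
    return {
        "server_only": len(server - client) / total,  # likely blocking GA
        "client_only": len(client - server) / total,  # server-side hit was lost
        "both": len(server & client) / total,         # measured normally
    }


# Toy data: fp4 is only seen server-side, fp5 only client-side.
server_seen = ["fp1", "fp2", "fp3", "fp4"]
client_seen = ["fp1", "fp2", "fp3", "fp5"]

shares = venn_shares(server_seen, client_seen)
print(shares)
```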
That’s a pretty high number, definitely higher than I was expecting. Let’s do some sanity checks. Based on what we know about ad-blocking popularity and how the tools work, I’d expect the following (for the record, I came up with this list before processing the results):
- Higher percentage of blocking in the EU vs. US.
- Higher percentage of blocking on desktop vs. mobile.
- Higher on iOS vs Android.
- More Firefox, less IE.
So a confirmation on all our assumptions! Could it really be so high, 8.7% missed?
First off, the site I used for testing is one that is solidly in the demographic for running blockers: young (50% 18-34), male (68%), and internet savvy (the #1 page on the site is internet meme-related). These demographics have potentially very high levels of blocking. A survey from Moz + Fractl found 63% of respondents in that 18-34 demographic said they used ad blockers, though most actively measured (rather than surveyed) numbers have been much lower. If, for example, the true ad blocker percentage on our test site was 25%, it doesn’t seem unreasonable to me that 1/3 of those people were also blocking analytics trackers.
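As a rough check of that back-of-the-envelope reasoning, using the 25% and one-third figures from the paragraph above (both are illustrative assumptions, not measurements):

```latex
0.25 \times \tfrac{1}{3} \approx 0.083 = 8.3\%
```

That lands in the same range as the 8.7% measured in the experiment.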
Second, there is still the issue of bots & callback failures. A certain amount of callback failures may happen, though I’m unable to estimate how many exactly. In my browser testing the callback method was 100% reliable, but in the wild it is likely to be imperfect.
Overall this is a somewhat limited experiment on a single niche site (about 2,400 users measured over a month), but it was enough to convince me that there could be quite a few users disappearing from our third-party analytics. For increased accuracy I would like to run the experiment again on a site with more traffic, a more average demographic, and more comprehensive bot detection.
Most of us analysts rely on these numbers so much that it’s difficult to consider the ways in which they are wrong or incomplete. For example when I first pulled the demographic stats from GA on the standard profile for the experiment site I was not even considering that it would not include our blockers, which would likely skew the demographic numbers even more. Just as spam bots distort our stats in a way we can see, the missing real users distort without us knowing they are even missing.
This discussion is also connected to “Do Not Track” preferences. In a previous experiment on DNT preferences (run on a different site) I found about 17% of users had opted into a DNT On setting (again, only counting those that don’t block GA!). If those 17% weren’t getting their preference respected, then who can blame them for wanting to install a blocker? How do we respect a preference for not being tracked while still knowing enough about the existence of those users to run our sites effectively? That is definitely a discussion worth having, but while we are just starting that discussion, many users have clearly already taken action, as I think this experiment shows.
Since originally publishing this article in January I’ve continued to collect data. Here’s the update: this follow-up is based on 8,400 samples, so about 3.5x larger than our first experiment.
Total GA blockers are up: from 8.7% (Dec-Jan) to 11% (Jan-Jun).
Again, it is very important to note this is for our test site, which has a demographic with a higher adblock % than typical. The average site is very likely much lower.
Where did that growth (+2.3 percentage points) come from?
- More ad-blocker users overall (ad-blocker adoption continues to grow in 2016).
- An increase particularly in Android blockers
- Our original sample had substantially more iOS blockers than Android:
- In our new sample Android has caught up in blocker usage, in fact becoming slightly more likely to use blockers than iOS:
The reason for this change is that as of Feb 1 Samsung started allowing ad blocking via their browser and plugins, so apps like Crystal Adblock that were previously only available on iOS (and that block analytics trackers by default) became available on Android, at least to Samsung users.
Also, this newer sample was mostly collected with the GA transport method set to “beacon”, which in theory should also limit the number of false positives, though I was not able to see any discernible difference after changing that method.
Looking for the code I used to run this? The basics are on GitHub.
I used Analytics Pro’s Universal Analytics PHP library and RGoogleAnalytics in addition to the other tools already mentioned.
To protect the privacy of the users (especially those that are intentionally blocking GA) this experiment was done in a separate GA property with no ties to any other information the experiment site had about the user.