How Many Users Block Google Analytics, Measured in Google Analytics

December 2017 update (blocking levels down some).

June 2016 update (blocking levels a little higher).


There’s now a significant number of users that use ad-blockers, estimated at about 15% in the US (according to PageFair), and that number grew quickly in 2015. Extensions like AdBlock Plus, NoScript, and Ghostery are incredibly popular (#1, #4, and #6 amongst the most popular Firefox add-ons), and for good reasons.  The ad-blocking “war” has been a very hot topic of conversion, so I wanted to take a look at some of the collateral damage in that war — blocked analytics trackers.

Ad Blockers That Block Analytics

Since ad blockers are focused on blocking ads (shocking, I know…) we might not think about their effects on our analytics trackers. But did you know that most of the popular blockers can block analytics trackers as well? Some even do it by default. There are a lot of different ad-blockers out there on a lot of different platforms, but a quick run through the most popular per platform shows a huge number of users that could be blocking our analytics trackers. I’m focusing on Google Analytics here, but really this is applicable for any third-party tracker. Here’s a list of some of the top blockers (and there are a lot more out there!):

App / Extension Platform Users (Dec 2015) Users (Dec 2017) Blocks GA by default?
AdBlock Plus Cross-platform 21M users (Firefox)
40M+ (Chrome)
300M total downloads
13.2M users (Firefox)
100M+ users (total)
500M+ total downloads
No, but easily added
(one click post-install)
Adblock primarily Chrome 40M+ (Chrome) 40M+ (total) No, but easily added.
 uBlock Origin Cross-platform 630k users (Firefox)
2.5M users (Chrome)
4.2M users (Firefox)
10M+ users (Chrome)
Yes
Ghostery Cross-platform 1.5M users (Firefox)
2.3M users (Chrome)
1M users (Firefox)
2.9M users (Chrome)
No, but easily added.
Purify iOS unknown, top 10 at launch*  #363 in App Store (via Applyzer) Yes
Adblock (FutureMind) iOS #39 in App Store No, but easily added.
Adblock Browser (Eyeo) Android 1-5M installs  10-50M installs Depending on install options
Google Analytics Opt-Out Chrome 720k users  760k users Yes

* Post-iOS9 launch blockers Peace, Purify & Crystal were #1, #3 & #6  respectively in the App Store.
(Browser add-on active users are based on users that are actively check for updates, seemingly regardless of add-on status. This means users that have the add-on disabled would still be counted. However most users do not disable, Firefox reports 95% of uBlock Origin users and 96% of AdBlock Plus users did have their add-ons set to enabled according to their add-on status information page).

Who even needs a plugin? As of November Firefox includes tracking blocking protection in the browser itself, on by default and blocking GA in private browsing windows.

That’s a significant potential for gaps in our measurements, but how many users exactly?? The biggest user base is the AdBlock Plus users, but it is hard to know how many of those users choose to block analytics trackers. When you install AdBlock Plus you get a post-install screen that gives you a choice to turn on tracking / analytics blocking:

AdBlock Plus Post-Install Screen

AdBlock Plus Post-Install Screen

We know that most users stick with the defaults (which does not block GA), but considering the size of the install base that is probably a number big enough to be concerned about. If you are a user that is motivated / annoyed enough to install an ad blocker you are already in the group much less likely to stick with defaults.

Measuring the amount of users that run ad-blockers in general can be a little tricky, but the primary method is pretty well established at this point. You create content that looks like an ad on your site and then see if is being blocked by looking at what is ultimately loaded by the browser. Daniel Carlbom has an
easy-to-implement ad-blocker user counting method on his blog using Google Tag Manger, or PageFair also offers a free service that will track your ad-block percentage with a sensor you place on your site.

Except what happens if your analytics service is also being blocked?! Both GA and PageFair are in the Ghostery and AdBlock Plus list of analytics trackers to be blocked. This means these methods are measuring users that block ads but do allow analytics trackers, so your actual ad-block percentages are even higher than these methods show.

So what do we do? How do we measure things when its our measurement software itself that’s being blocked! Should we all just go back to server log file analyzers!?

After breathing into a bag for a while, I decided to try a small experiment to see if I could get at least a general idea of how many people are blocking Google Analytics.

An Experiment to Count the Blockers

My approach was to run GA in parallel, once as normal in the browser, and then once using the GA measurement protocol on the server-side. So every request would result in firing two different events; one client-side through analytics.js, and one server-side via the measurement protocol API:

Parallel GA measurement

 

 

This has the effect of double-counting everything, an event call from the front-end along with an event call from the back-end with all the relevant front-end info passed to it.

I just needed a way to line up the two event hits, so I’d know it was the same end-user making both requests. Enter browser fingerprinting. I expect many of you are familiar with browser fingerprinting; it is a way to assign a (relatively) unique id to each user’s browser based on the configuration and settings of that browser (first brought into wide discussion from the EFF’s Panopticlick project).  I used fingerprint2.js to create the fingerprints, which uses 24 different sources of settings to assign a unique id per browser. It is certainly possible to end up with the same fingerprint for two different end-users, but it’s rare and in our case is not likely to influence our overall numbers since it would simply take one data point away.

(I could have achieved a similar effect with a random id that I created and then dropped as a cookie, but I chose to use a fingerprint because using cookies to track users that run blockers might have lead to a small number of users that aggressively clean their cookies getting counted a bunch of times. It also leads to less duplication in the case of bots that might purge their cookies frequently.)

The end result is lots and lots of duplicate events:

Raw GA calls from analytics.js and measurement protocol mashed up.

What you’ll (hopefully) notice is that every event comes in a pair in the list above aside from one. So what happened there? This happened:

Client-side GA measurement blocked

Meaning something happened to block the analytics.js code from reporting to GA, which is exactly what we want to measure! In the example above I know for sure what happened because that one was me testing — using Ghostery to block GA.

So we can just count up all the cases where we got a call from the server-side, but not the client-side, right?

Well… almost. There are a couple of special cases that we need to look at first.

1. Bots.
Yea, bots… The unsleeping, unrelenting bane of any web analyst looking to get precise numbers.The first bit of good news is that our method is JavaScript-heavy, meaning we’ve knocked out many bots already. Before the server-side PHP script calls the measurement protocol it had to jump through these JS-hoops: a) successfully be fingerprinted via JavaScript and b) do a jQuery AJAX call.

This means the only bots we’re left with are the very capable “headless browsers”. This is a growing percentage of bots, but still likely in the single digits (Incapsula reported only 1.2% of DDoS bots were capable of passing JavaScript+cookie challenges in Q3 2015). Of particular concern are ones that end up as false positives (i.e. bots that only fire the server-side code not the client-side code) and/or show up under multiple fingerprints, thus getting counted multiple times.

Interestingly in this experiment the main offender of this was Googlebot. It is of course capable of doing all the JavaScript required, but it is also smart enough to know not to fire the GA tracking code on the client-side. It even came in under multiple browser fingerprints (which is very interesting in its own right). Other search engine bots like Baidu also showed up for me in this way; capable enough to render all the JavaScript but “compassionate” enough to not fire client-side Google Analytics sensors. Excluding these bots was actually really easy, it was just a question of clicking the exclude bots box in the view configuration. You know, this guy under view settings:
botcheckbox
Unfortunately there were other bots as well. Filtering out the referral spam bots was easy enough just using my referral spam filter. I excluded some other bots as well based on manual inspection of their behavior and network source. Some bots will fire GA just like an other JavaScript and some won’t, once you exclude any that create a lot of different browser signatures it hopefully should not have a big impact on our data.

2. Callback failures.
Ok, I admit it, that chart above is not entirely correct. There’s still another way we can get only the server-side measurement call:

Server-side return code not executed (browser disappeared?)

If the AJAX call is made but the browser disappears before the callback happens, then we can also miss out on the client-side call. Or if the browser just doesn’t execute the callback code itself we will get mismatch.

Because the fingerprint generation is relatively slow (about 500ms for me the first time it runs) and the AJAX call doesn’t happen until that succeeds and is relatively fast (~200ms depending on latency), the browser would need to disappear at just the right time (in between when the AJAX call was made and when it returns) to miss the callback. This could happen if you hit the back button at just the right time. To verify that this wasn’t a major issue I ran a test looking at sessions with > 1 event. This means the user hit more than one page, which solves the back button issue, though potentially not other technical callback issues. The result was nearly the same rate of blocking users, just a much smaller sample size, which convinced me to ignore this possible effect.

Ok, finally time to count up the results. There’s no way inside the regular GA reporting system itself to do that kind of comparison on individual sessions, but it’s easy to do in R, so let’s make a venn diagram!

venn

red: server-side, blue: client-side, purple: both

The red is only server-side measured, blue is only client-side, and purple is measured by both. The blue (client-side only) sliver is so small (.34%) it’s hard to see, but that’s what we should expect. It should be exceedingly rare in our scenario that the client-side fired ok when the server didn’t. This might happen if there was a PHP error on the server-side or the measurement protocol hit was never registered for whatever reason. At .34% I’m not going to worry too much about that. The server-side only slice is very big though, over 8%!

That a pretty high number, definitely higher than I was expecting. Let’s do some sanity checks. Based on what we know about ad-blocking popularity and how the tools work I’d expect the following (for the record, I came up with this list before processing the results):

  1. Higher percentage of blocking in the EU vs. US.
    geo_all geo_blockers
  2. Higher percentage of blocking on desktop vs. mobile.
    device_all device_blockers
  3. Higher on iOS vs Android.
    os_all os_blockers
  4. More Firefox, less IE.
    browser_all browser_blockers

So a confirmation on all our assumptions! Could it really be so high, 8.7% missed?

First off, the site I used for testing is one that is solidly in the demographic for running blockers: young (50% 18-34), male (68%), and internet savvy (the #1 page on the site is internet meme-related). These demographics have potentially very high levels of blocking. A survey from Moz + Fractl found 63% of respondents in that 18-34 demographic said they used ad blockers, though most actively measured (rather than surveyed) numbers have been much lower. If for example the true ad blocker percentage on our test site was 25% it doesn’t seem unreasonable to me that 1/3 of those people were also blocking analytics trackers.

Second, there is still the issue of bots & callback failures. A certain amount of callback failures may happen, though I’m unable to estimate how many exactly. In my browser testing the callback method was 100% reliable, but in the wild it is likely to be imperfect.

Takeaways

Overall this is a somewhat limited experiment on a single niche site (about 2,400 users measured over a month), but it was enough to convince me that there could be quite a few users disappearing from our third party analytics. For increased accuracy I would like to run the experiment again on a site with more traffic, a more average demographic, and more comprehensive bot detection.

Most of us analysts rely on these numbers so much that it’s difficult to consider the ways in which they are wrong or incomplete. For example when I first pulled the demographic stats from GA on the standard profile for the experiment site I was not even considering that it would not include our blockers, which would likely skew the demographic numbers even more. Just as spam bots distort our stats in a way we can see, the missing real users distort without us knowing they are even missing.

This discussion is also connected back to “Do Not Track” preferences as well. In a previous experiment on DNT preferences (run on a different site) I found about 17% of users had opted into a DNT On setting (again, only counting those that don’t block GA!). If those 17% weren’t getting their preference respected, then who can blame them for wanting to install a blocker? How do we respect a preference for not being tracked while still knowing enough about the existence of those users to run our sites effectively? Definitely a discussion that is worth happening, but while we are just starting that discussions many users have clearly already taken action as I think this experiment shows.

Since originally publishing this article in January I’ve continued to collect data.  Here’s the update, this follow-up is based on 8,400 samples, so about 3.5x larger than our first experiment.

Update June 2016

Total GA blockers are up: from 8.7% (Dec-Jan) to 11% (Jan-Jun).

venn
Again, it is very important to note this is for our test site, which has a demographic with a higher adblock % than typical, the average site is very likely much lower.

Where did that growth (+2.3%) come from?

  • More ad-blocker users overall (ad-blocker adoption continues to grow in 2016).
  • An increase particularly in Android blockers
    • Our original sample had substantially more iOS blockers than Android:
      os_all os_blockers
    • In our new sample Android has caught up in blocker usage, in fact becoming slightly more likely to use blockers than iOS:
      os-breakdown-all-2016 os-breakdown-blockers-2016

The reason for this change is that as of Feb 1 Samsung started allowing ad blocking via their browser and plugins so apps like Crystal Adblock that were previously only available on iOS (and by default block analytics trackers) became available on Android, at least to Samsung users.

Also, this newer sample is mostly done with GA transport method: “beacon”, which in theory should also limit the amount of false positives, though I was not able to see any discernible difference after changing that method.

Update December 2017

Total GA blockers have gone down since our last experiment run in June 2016.

GA blocking percentage: 8%

Recap:
Jan 2016: 8.7%
Jun 2016: 11%
Dec 2017: 8%

This run of the experiment was similar in size to the first run (about 6 weeks, 2,220 users), so as before not a huge sample.

Changes in the blocker profiles since previous runs:

  • Way more Android blocking. Android made up 68% of the blocking audience, a flip from the original experiment where iOS was predominent. Android users now appear to be more likely to be running blockers than iOS users.
  • Firefox users are running more blocking. In this 2017 run Firefox users made up just 9% of the audience (same as in 2015), but 30% of the blockers, up from 19% in the first run. Two possible reasons for this may be:
    • The movement of Firefox users from Adblock Plus (which does not block GA by default) to uBlock Origin (which does block GA by default). This may have also happened on other browsers, but it well-documented for Firefox because of the usage statistics they provide.
  • An increase in non-EU/US blocking. Since our test site is largely US-targeted, we don’t have a lot of data on this, but this would be in line with reporting from PageFair earlier in 2017 documenting a big increase in mobile ad blocking in Asia.

So what does this mean?? Are blocking rates down?

The previously mentioned Firefox add-on stats show some changes.

Average Daily Users

Add-on Nov 2016 Nov 2017 Change
AdBlock Plus 20.4M 13.2M -7.2M daily users
uBlock Origin 2M 4.2M +2.2M
totals: 22.4M 17.4M -5M

This migration is likely due to the “acceptable ads” program which ABP participates in but uBlock Origin does not. This is a net -5M decrease in daily users of the 2 most popular Firefox content blockers, but due to both a change in how Firefox counted daily users in June 2017 where they no longer count a disabled install as an active user (and also Firefox’s decreased market share in that period) I would not take that -5M as an indicator of a decrease in ad block users overall.

My take — this represents a slowdown or even plateau in ad block growth rates overall, but the shift towards blockers that block GA by default mean that GA block rates have likely not changed much overall or perhaps even increased.

As stated throughout this article, this is one experiment on one website, not an attempt to track GA block rates across the web. These block rates are highly dependent upon the type of user, and as the type of user has changed to our test site over time the resultant block rate have also changed.

During our early 2016 experiments, the test site ranked well for some meme-related search terms that I am confident are associated with high probability blocker users. In late 2017 that traffic has shifted to more general purpose organic shopping terms. So even if the over demographic hasn’t changed, the individual users are less likely to be blockers. In all runs of the experiment that organic traffic was mixed with paid traffic from the same AdWords campaigns to try and provide as much consistency as possible, but obviously what keywords the site ranks for in organic isn’t controllable in the same way.

 


Looking for the code I used to run this? The basics are on github.

I used Analytics Pro’s Universal Analytics PHP library and RGoogleAnalytics in addition to the other tools already mentioned.
To protect the privacy of the users (especially those that are intentionally blocking GA) this experiment was done in a separate GA property with no ties to any other information the experiment site had about the user.
Thanks to wmacgyver for the JavaScript assistance and sanity checks.