How Many Sites Run Ads.txt (so far..)?

Update Nov 1, 2017

Since I first published this article in July, adoption of ads.txt has exploded, measuring as high as 44% on a different list of sites but using the same crawler I used. This is great, and one of the big reasons for the growth into a sustainable standard is Google supporting ads.txt checking in DoubleClick.

The Interactive Advertising Bureau (IAB) has recently released a standard to help fight ad fraud: ads.txt. Announced in May and official as of June, this new standard is a simple and elegant attempt to solve the problem of fraudulent publisher inventory. This is the problem of fraud sites masquerading as legitimate sites in order to sell ads at the higher rates those spoofed sites command. It’s the equivalent of paying for an ad during NBC Sunday Night Football, but having it actually shown on “Sunday Night Foosbal” at 4am Monday morning on channel 572.

Ads.txt is certainly not a “solution to ad fraud” or totally game-changing, but it is a great way to help verify that the sites where ads are placed are the sites really running the ads

… IF sites use it!

As described in depth in articles like this one in AdAge by Ian Trider (one of the authors of the IAB spec), the ads.txt standard works by having each site declare which ad networks are authorized to sell its inventory, so the buyer can know that the inventory they are buying is legit.

This is solving a bigger problem than you might think: a significant part of the ad fraud “stack” right now is sites claiming to be other sites when they run ads. Methbot (as dissected by White Ops here) was generating $3-5M per day (!!), and a big part of that was running ads on sites impersonating other sites. So you place your ad thinking you’ve bought premium space on espn.com, but in reality your ads ran on some garbage domain and were only viewed and clicked on by bots. Methbot was spoofing over 6,000 “premium” domains.

Sample real-world ads.txt from nytimes.com/ads.txt
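The record format itself is simple: each line names an ad system’s domain, the publisher’s account ID on that system, the relationship (DIRECT or RESELLER), and an optional certification authority ID, with “#” starting a comment. As a rough sketch of what reading one of these files involves (my illustration, not the IAB reference parser, and the sample records below are placeholders rather than the actual nytimes.com entries):

```python
def parse_ads_txt(text):
    """Parse ads.txt content into (domain, publisher_id, relationship[, cert_id]) tuples."""
    records = []
    for line in text.splitlines():
        # Strip comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and variable declarations like contact= / subdomain=
        if not line or "=" in line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:  # 4th field (certification authority ID) is optional
            records.append(tuple(fields[:4]))
    return records

# Placeholder records, not the real nytimes.com file
sample = """
# ads.txt
google.com, pub-0000000000000000, DIRECT, f08c47fec0942fa0
appnexus.com, 1234, RESELLER
"""
print(parse_ads_txt(sample))
```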

Like robots.txt, the 20+ year-old standard that inspired it, ads.txt requires the cooperation of multiple parties to be successful.


Hobbies include: lasers, world domination, reading .txt files.

Robots.txt requires:
1. The sites in question to host an accurate and updated list of which URLs are off-limits to robots.
2. Robots to read and respect these directives.

Ads.txt requires:
1. The sites in question to host an accurate and updated list of which ad networks are selling their inventory.
2. The networks selling the inventory to read and present the results of these directives to ad buyers.
3. The ad buyers to avoid buying non-verified inventory.

To be clear, 2 & 3 could be combined such that buyers only ever see verified inventory, but that requires complete adoption of ads.txt by publishers, which seems pretty far off.

Right now we are in the very early days of adoption, but in my opinion we are already at a sensitive point where publishers need to start deploying their ads.txt files or the standard risks losing momentum right from the start.

Without these files out there in the wild, the ad networks are unlikely to place any sort of priority on fetching them and integrating the results into their systems, as is well illustrated in this article by Sarah Sluis on adexchanger.com.

The numbered steps above are dependent: just as you can’t expect (well-behaved) robots to bother checking robots.txt files if they generally don’t exist or are inaccurate, we shouldn’t expect networks to incorporate ads.txt results into their systems until they get a strong signal from publishers in the form of actual files deployed, not just talk.

Additionally, if these files don’t remain accurate they quickly become counter-productive. An error in robots.txt can mean a big hit in traffic from Google Search; what is the equivalent incentive for keeping your ads.txt up to date? It’s not clear yet, because it depends on how networks deal with verified inventory. At the moment there’s really no incentive.

So where are we as far as adoption right now? I didn’t see that anyone out there was tracking this, so I decided to do some quick spidering myself, using the reference crawler published on GitHub by the IAB as a starting point. The results are not promising.
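For anyone curious what that kind of check looks like, here is a minimal sketch of the approach (my simplified illustration, not the actual IAB reference crawler): fetch /ads.txt for each domain and count the responses that look like real files rather than error pages.

```python
import urllib.request

def fetch_ads_txt(domain, timeout=10):
    """Return the body of http://<domain>/ads.txt, or None if it can't be fetched."""
    req = urllib.request.Request(
        f"http://{domain}/ads.txt",
        headers={"User-Agent": "ads-txt-survey/0.1"},  # identify the crawler politely
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError, and timeouts
        return None

def looks_like_ads_txt(body):
    """Heuristic: at least one non-comment line with 3+ comma-separated fields."""
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()
        if line and len(line.split(",")) >= 3:
            return True
    return False

domains = ["example.com", "example.org"]  # swap in the Quantcast or Methbot lists
found = [d for d in domains if (body := fetch_ads_txt(d)) and looks_like_ads_txt(body)]
print(f"{len(found)}/{len(domains)} domains serve a plausible ads.txt")
```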

Quantcast Top 1,000 US Sites: 0.4% penetration.

So yes, 4 sites. Of course, not every site runs ads. And not every site that runs ads would run programmatic ads (for example, YouTube’s ads.txt exists and is valid, but is just comments).

What sites probably do run ads? The sites in the Methbot spoofed domains list!

Methbot 6,000-site spoofed domain list: 0.5% penetration.

So for the sites that actually were “hit” by Methbot we still have a low adoption rate for a standard intended to address a problem they had. Only 30 sites had a valid ads.txt, although on the plus side the ads.txt files themselves looked pretty robust, so the early adopters seem to be doing a solid job of implementation.

This brings up the motivation problem with ads.txt again. These sites were actively impersonated by a huge ad fraud ring, but did it really hurt them? Not much, or at least not clearly. Until there’s a difference in the payout they get based on whether their traffic is verified or not, there won’t be much hard motivation for adoption. It’s a bit of a catch-22, since it is the networks that control the incentives but they are reasonably waiting to see how adoption plays out.

To reiterate: it is definitely early days, but this file could not be easier to deploy, and we shouldn’t expect networks to put a lot of effort into putting this tool in the hands of ad buyers until they are getting a stronger signal. If you are a publisher that wants to do programmatic, you should consider ads.txt part of the implementation and start sending the signal that you want more accountability!

 


Side note.. spidering all these sites showed how many of them serve status 200 on their 404 pages. 4% of the Quantcast top 1,000 returned status code 200 for /ads.txt when the file didn’t actually exist. This is generally called a “soft 404”, and is a pretty sloppy SEO practice.
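A quick way to spot a soft 404 (again, just a sketch of the idea; the junk path below is made up, and real detection is fuzzier than this) is to request a URL that almost certainly doesn’t exist and see whether the server still answers 200:

```python
import urllib.request

def is_soft_404(domain, timeout=10):
    """Request a path that should not exist; a 200 response suggests a soft 404."""
    # Arbitrary junk path that no real site should serve
    req = urllib.request.Request(
        f"http://{domain}/this-page-should-not-exist-x9q3z7",
        headers={"User-Agent": "ads-txt-survey/0.1"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # A 404 or 5xx raises HTTPError here, which is the well-behaved case
        return False

print(is_soft_404("example.com"))
```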

Photo by firepile
