(7/22 Update on SSL at the bottom)
I was very interested to read this article by Gene McKenna from Groupon that was posted to Search Engine Land last week, as were a lot of people. The article says that when Groupon de-indexed themselves from Google, 60% of their direct traffic to long URLs (i.e. not likely to be typed in) turned out to actually be organic traffic, since it disappeared while they were de-indexed. I had noticed the same thing in the past with sites that had serious Google penalties, which have effectively the same result as intentionally de-indexing yourself, but I had never computed the percentage. I proceeded to test one of these penalties and found almost exactly the same number, a 59% loss of direct traffic.
“Wow, confirmed!” I thought. Then I thought through it some more and realized that what I had confirmed was that the site I was looking at and Groupon had similar traffic profile shares, by which I mean a similar proportion of Direct vs. Organic. Then I realized that I had probably made the same misread of the article that a lot of people had.
While the headline “60% of ‘Direct’ Traffic is Actually Organic” is no doubt accurate, it’s accurate for Groupon’s experiment, but may or may not apply to your site. To be clear, I’m not saying the article is in any way incorrect, I’m saying you should not assume that 60% of your site’s direct traffic is organic.
That 60% is Groupon’s number based on their traffic profile: % of organic, % of direct, browser makeup, etc. Gene says this himself in the comments (which actually are worth reading, go SEO community!): “The takeaway from this is not ‘60% of Direct is SEO’. The takeaway is 60% of Direct traffic to urls that are long enough that you wouldn’t think a user is really typing them in is probably SEO.”
This clears a lot up, but the 60% number may still not apply to your site, even just the long-URL pages, because it’s assuming a lot about those pages. It’s assuming that you have a similar proportion of Organic and Direct as Groupon, and that those long-URLs function in the same way as Groupon (i.e. they do pretty well in Organic and don’t drive a lot of Direct otherwise). That could be true for you, but it depends what’s making up your “Direct” traffic.
The main issue here (as again was pointed out in the comments) is that “Direct” is a misleading name. It definitely doesn’t mean that it was detected by your analytics as having been “typed-in” via a browser; it’s a negative definition meaning that it lacks referral data at all. “Direct” could be so many things: type-in, bookmark, email, browser auto-complete, referrer lost from SSL to non-SSL, IM, stripped by privacy settings/software, etc, etc. It’s better described as “None”, which is of course what Google Analytics describes its medium as.
This is why we wonder what is in that bucket so much. Is it actually Organic? How much was from untracked emails? We know that traffic came to the site *somehow*, it’s just frustrating to not know how.
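To make the negative definition of “Direct” concrete, here is a minimal Python sketch of how a hit might be bucketed from its referrer alone. The function name `classify_medium` and the tiny `SEARCH_ENGINE_HOSTS` list are assumptions made up for illustration; this is not Google Analytics’ actual logic, which also considers campaign tags and a much longer list of engines.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately tiny list; real tools keep a much longer one.
SEARCH_ENGINE_HOSTS = ("google.", "bing.", "search.yahoo.")


def classify_medium(referrer):
    """Bucket a hit by its Referer header value (None or '' if absent)."""
    if not referrer:
        # No referrer at all: this is what gets reported as "Direct",
        # whatever the real source was (type-in, email, lost SSL referrer...).
        return "none"
    host = urlparse(referrer).hostname or ""
    # Crude substring match, good enough for a sketch.
    if any(marker in host for marker in SEARCH_ENGINE_HOSTS):
        return "organic"
    return "referral"


print(classify_medium("https://www.google.com/"))  # organic
print(classify_medium(""))                         # none
```

The point of the sketch is the first branch: a search click whose referrer was lost along the way is indistinguishable from a bookmark or a typed-in URL.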
Out of what creates that 60% number, the most interesting & useful one to me would be the percentage of Organic that was missed and therefore fell through to Direct. Taking out per-site technical issues (SSL, which GA sensor or analytics software you use, page delivery issues, etc.) that miss rate should be similar for most sites. Think of it as the overall accuracy of your analytics sensor when it comes to reading search engine referrals.
This “missed Organic count” would be referrals from search engines that show up as Direct as a percentage of Organic. With that miss percentage you could multiply by your total volume of Organic and compare to Direct to get a percentage of Direct that might actually have been organic, no matter what your share of Organic & Direct might be. In other words:
% Direct Actually Organic = (% Missed Organic * % Organic ) / % Direct
(using the simple example #2 lower in this article: 60% = ( 4% * 75% ) / 5%)
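As a quick check of the formula, here is a small Python sketch. The function name `direct_actually_organic` is my own label, and it assumes the Organic share passed in already includes the missed visits.

```python
def direct_actually_organic(missed_organic_rate, organic_share, direct_share):
    """Fraction of reported Direct that was really Organic.

    organic_share is assumed to be the "real" Organic share, including
    the visits that were missed and reported as Direct.
    """
    if direct_share <= 0:
        raise ValueError("direct_share must be positive")
    return (missed_organic_rate * organic_share) / direct_share


# Numbers from simple example #2: 4% missed, 75% Organic, 5% Direct.
share = direct_actually_organic(0.04, 0.75, 0.05)
print(f"{share:.0%}")  # 60%
```

Changing only the Organic and Direct shares while holding the miss rate fixed shows how much the headline percentage depends on the traffic profile.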
This may seem like I’m just restating the same thing Groupon did. In a way I am; I’m just taking it one step back so it generalizes no matter what your site profile is like. An example of a site that does not fit the normal acquisition shares will make it clearer why this is necessary.
Example: penalty on a site with an extremely high percentage of Google Organic.
This site has an unusually high percentage share of Google Organic traffic and a consistent level of traffic. When it lost most of that traffic in a penalty the amount of Direct that turned out to be Organic was actually much more than 60%.
Here’s the drop (93% loss of Google Organic, still painful to look at):
Then overlay that drop in Google Organic vs. the drop in Direct:
As you can see the drops match up well, as you’d expect if a lot of Direct was actually Organic, same as what Groupon showed. The fact that Direct doesn’t drop quite as much is a representation of what’s left-over after the missed Google Organic “spillover” is 93% removed.
After doing the math, in this case the Google Organic was actually 79% of reported Direct traffic for the whole site. For this site there really just wasn’t much in Direct compared to the overwhelming amount of Google Organic. Segmented to just the long URLs it’s over 95%, so way way higher than 60%. We could have been wrong the other way too with a site with high quantities of “real” Direct traffic or low Organic.
Above I said the real interesting number was the “missed” Organic percentage:
% Missed Organic = (% Direct / % Organic) * % Direct Actually Organic
or, more simply, if you happened to have lost all your Organic like in these two examples:
% Missed Organic = Direct Drop Volume / (Measured Organic Volume + Direct Drop Volume)
(the denominator is that way because our “real” Organic level would include what we missed measuring)
In the case above, that number was 1.7% of Google Organic traffic missed and therefore falling through to Direct.
Your mileage may vary, but the most variation is probably over time, and that is my big caveat here. The data above is old (it’s a very uncommon set of circumstances to see, or I’d use something newer), so it predates the issues with mobile referral tracking and the days when SSL was as common for referrals. I believe that 1.7% is higher now; in fact, given Groupon’s 60% number, it would almost have to be.
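The simpler “lost all your Organic” version of the calculation can be sketched in Python as follows. The function name and the visit counts are hypothetical round numbers chosen for illustration, not the real data from the site above.

```python
def missed_organic_rate(direct_drop, measured_organic):
    """Share of real Organic visits that were reported as Direct.

    Assumes Organic was lost (almost) entirely, so the drop in Direct
    equals the Organic visits that had been misattributed to Direct.
    """
    real_organic = measured_organic + direct_drop
    if real_organic <= 0:
        raise ValueError("need some Organic traffic to compute a rate")
    return direct_drop / real_organic


# Hypothetical volumes: Direct fell by 20 visits when 480 measured
# Organic visits disappeared.
rate = missed_organic_rate(direct_drop=20, measured_organic=480)
print(f"{rate:.1%}")  # 4.0%
```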
This article from Conductor last year puts the average percentages of Organic vs. Direct at 47% vs. 29% respectively. That is for a whole site, not the long-URL pages Groupon highlights. From these we can start with a simplified example, adjusting Organic up to represent the missed Organic and making it a nice round number.
Simple Example #1: Whole Site, Modest Missed Organic %
| Metric | Percent | Visits (out of 1,000) |
|---|---|---|
| Organic Traffic Share % | 50% | 500 |
| Direct Traffic Share % | 25% | 250 |
| Missed Organic (fell through to Direct) | 2% | 10 |
| Percent of Direct Actually Organic | 4% | |
That’s a “whole site” number with some assumptions about referral percentage shares and a 2% “missed Organic” assumption based on my example above.
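Simple Example #2: Long-URL Pages

| Metric | Percent | Visits (out of 1,000) |
|---|---|---|
| Organic Traffic Share % | 75% | 750 |
| Direct Traffic Share % | 5% | 50 |
| Missed Organic (fell through to Direct) | 4% | 30 |
| Percent of Direct Actually Organic | 60% | |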
That’s based on some (I think) reasonable assumptions about Organic traffic share for deep links. This one ends up with the important number: 4% of Organic missed. If 4% is where we are these days for Organic referrals slipping through the cracks, then applying it to the whole-site numbers from the first simple example would mean 8% of the whole site’s Direct traffic is actually Organic.
Simple Example #3: Whole Site, Best Guess Missed Organic %
| Metric | Percent | Visits (out of 1,000) |
|---|---|---|
| Organic Traffic Share % | 50% | 500 |
| Direct Traffic Share % | 25% | 250 |
| Missed Organic (fell through to Direct) | 4% | 20 |
| Percent of Direct Actually Organic | 8% | |
So if you misread the original Search Engine Land article in the most inaccurate way and thought 60% of your whole site’s Direct was Organic, and you are looking for a better number, I’m saying it’s somewhere above 4%, maybe in the 8% range, but I don’t say that with a lot of confidence, and it depends on your site!
That’s my best guess based on triangulating the two examples. If anyone has more data or ideas, I’d be happy to hear them to see if we can come up with a better number.
One more note about the numbers above: they assume “Organic” includes the “Missed Organic” traffic (otherwise you’d be taking a percentage of something that didn’t include the traffic you were actually pulling out with that percentage). So if you want to calculate this yourself with the formula above, you’d first need to increase your Organic Traffic Share % to what it would really be including the missed traffic.
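Here is a minimal Python sketch of that adjustment, starting from Organic as your analytics reports it. The function name is hypothetical, and the miss rate is an assumption you supply yourself, since it cannot be read out of a normal analytics report.

```python
def direct_share_from_measured(measured_organic, reported_direct, missed_rate):
    """Estimate the fraction of reported Direct that is really Organic,
    starting from Organic as measured (i.e. excluding missed visits).

    measured_organic and reported_direct must be in the same units
    (both visit counts or both traffic shares).
    """
    if not 0 <= missed_rate < 1:
        raise ValueError("missed_rate must be in [0, 1)")
    if reported_direct <= 0:
        raise ValueError("reported_direct must be positive")
    # Scale measured Organic up to the "real" Organic level.
    real_organic = measured_organic / (1 - missed_rate)
    missed_visits = real_organic - measured_organic
    return missed_visits / reported_direct


# Simple example #3 seen from the analytics side: 480 Organic visits
# measured, 250 Direct reported, and an assumed 4% miss rate.
print(f"{direct_share_from_measured(480, 250, 0.04):.0%}")  # 8%
```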
Update Jul 22:
I’ve gotten some questions about referrer passing from SSL to non-SSL, specifically how Google can pass a referrer at all if it’s always going from SSL to non-SSL. The answer is that they bounce the click off a non-SSL’d redirector on their side before you see it. Bing does not do this, but Bing does not default to SSL either, so only the small number of users who explicitly go to the https:// version would lose their referrers in this way. Everything else should mostly pass referrers OK.
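The underlying rule can be sketched in a few lines of Python. This assumes only the long-standing default browser behavior (no Referer header when navigating from an HTTPS page to an HTTP page) and ignores referrer policies, privacy software, and everything else that can strip the header. The function name `referrer_survives` is made up for illustration.

```python
from urllib.parse import urlparse


def referrer_survives(from_url, to_url):
    """True if a browser following the default downgrade rule would send
    a Referer header when navigating from from_url to to_url."""
    src = urlparse(from_url).scheme.lower()
    dst = urlparse(to_url).scheme.lower()
    # The one case that strips the header: secure page -> insecure page.
    return not (src == "https" and dst == "http")


# A click straight from an SSL results page to a non-SSL site loses it...
print(referrer_survives("https://www.google.com/", "http://www.yourdomain.com/"))
# ...but the hop from a non-SSL'd redirector to the same site keeps it.
print(referrer_survives("http://www.google.com/url", "http://www.yourdomain.com/"))
```

That second hop is why bouncing the click through a non-SSL’d redirector lets the destination site see a readable referrer.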
Here’s a table of what support looks like on the latest versions of browsers:

| Search Engine | Passes Readable Referrer |
|---|---|
| Google (defaults to SSL) | Y |
| Bing (defaults to non-SSL) | Y |
| Bing (if user explicitly goes to SSL) | N |
| Yahoo (defaults to SSL) | Y |
Yahoo bounces traffic through a non-SSL’d redirector, e.g. http://r.search.yahoo.com/_ylt=… which allows Google Analytics to pick it up as Yahoo / Organic (of course with “not provided”).
Google does the same thing via a redirector like: http://www.google.com/url?sa=…
Though with Chrome they are being slightly fancier and using an internal ping request, which looks like this:

Request URL: https://www.google.com/url?sa=...
Content-Type: text/ping
Origin: https://www.google.com
Ping-From: https://www.google.com/?gws_rd=ssl#q=your+keywords
Ping-To: http://www.yourdomain.com/
This appears to be 100% browser-internal (i.e. it does not make a request of the Ping-To server) but allows the next request made of the target URL to have a referrer of Referer:https://www.google.com/, which is pretty cool. That way they never have to make a non-SSL’d request to google.com but can still pass on a referrer header to a non-SSL’d site.
I assume this is how they support the “meta referrer” tag that is designed to support this situation and only available in Chrome so far, but it is a little strange to see in action in the debugger.