Doing SEO audits is a key step in your online marketing activities. Without proper audits, you never really know what you're doing or how your activities affect your website.
But doing a proper SEO audit takes time and a lot of attention to detail. While working through an SEO audit for a site that regularly gets over 1.000.000 monthly visitors, I noticed a data discrepancy that at first led me to think I was onto something very interesting.
The case in point: for a keyword ranking audit based on data from Webmaster Tools (or Google Search Console, for the youngsters out there), you should always use data visualization, even if it's just simple charts that won't make it into your SEO report. Visualizing numbers helps tremendously in spotting trends (or problems) in the data.
I downloaded the table (keywords, CTR, clicks, impressions, position), "all" 1000 of them (note to Google: c'mon folks, don't be stingy with keyword data, we know you have it all!). I started working with the data and visualizing some key issues to help me with the report for the client. When I visualized the clicks/impressions per SERP position (Graph 2 in the image above), I noticed something very peculiar: the impression count for position #6 absolutely trumps that of every other SERP position. Here's the chart for the table:
With so many impressions you'd think the click count would follow the same ratio as the other SERP positions, and if that were the case, we'd be on the road to discovering a keyword group that could drastically improve the site's total traffic. All we'd need to do is point all our guns at that keyword group and build content targeting those keywords. If we can rank #6 and get 82.000 impressions, we can definitely push those keywords to #1 and get up to a million impressions. We could fix the CTR by reworking meta descriptions and titles, so no big deal.
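For the curious, the aggregation behind a clicks/impressions-per-position chart like Graph 2 can be sketched in a few lines of Python. The column names and the sample numbers below are illustrative assumptions, not the client's actual export:

```python
import csv
from collections import defaultdict
from io import StringIO

# A tiny stand-in for a Search Console CSV export (made-up rows).
SAMPLE = """keyword,clicks,impressions,position
blue widgets,120,2400,3.2
widget repair,40,900,5.8
widget,23,60000,6.1
buy widgets,75,1500,6.4
"""

def clicks_impressions_per_position(csv_text):
    """Sum clicks and impressions per rounded SERP position."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for row in csv.DictReader(StringIO(csv_text)):
        pos = round(float(row["position"]))
        totals[pos]["clicks"] += int(row["clicks"])
        totals[pos]["impressions"] += int(row["impressions"])
    return dict(totals)

per_pos = clicks_impressions_per_position(SAMPLE)
for pos in sorted(per_pos):
    print(pos, per_pos[pos]["clicks"], per_pos[pos]["impressions"])
```

Plot those per-position totals side by side and an outlier position sticks out immediately, which is exactly why even throwaway charts are worth making.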
And here is the key difference between proper data interpretation and wishful thinking: when working with averages, you want to factor in data pollution. That pollution can come from human error or software error. Regardless of the cause, proper data analysis requires thinking. If something jumps out that much, it's either something nasty or something completely wrong with the data. You may want to jump to the conclusion that you've discovered a gem of SEO advice for your client, but if things look too good to be true, they probably aren't.
I sifted through the list of keywords and found a single keyword that got 60.000 impressions but only 23 clicks. That's a CTR of 0.04%. We couldn't stink that much at CTR for a keyword ranking #6 even if we tried. I checked the ranking for this keyword (yes, a single word, not a keyphrase) and it turns out the site does rank for variants and keyphrases, but not for the specific keyword itself. And it's definitely not #6. The high positions for that keyword belonged to Webster's and other dictionaries.
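That sanity check is easy to automate. A minimal sketch, assuming the same numbers as above; the 0.5% threshold is my own assumption, not an industry constant:

```python
def ctr_percent(clicks, impressions):
    """Click-through rate as a percentage."""
    return 100.0 * clicks / impressions if impressions else 0.0

def looks_polluted(clicks, impressions, position, min_top10_ctr=0.5):
    """Flag a row whose top-10 position has an implausibly low CTR.

    min_top10_ctr is an assumed threshold (in percent); tune it to your data.
    """
    return position <= 10 and ctr_percent(clicks, impressions) < min_top10_ctr

# The suspicious keyword from the audit: 23 clicks on 60.000 impressions at "position 6".
print(round(ctr_percent(23, 60000), 2))  # 0.04
print(looks_polluted(23, 60000, 6))      # True
```

Running every exported row through a check like this surfaces the candidates worth a manual look in seconds.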
I then sifted through the data in Google Search Console (a simple Ctrl+F does the trick) and realized that this keyword is nowhere to be found in the Console. And that was another eyebrow-raising moment: the Console doesn't show the keyword at all among those 1000 keyphrases, yet it appears in the exported CSV list of 1000 words. Weird.
Turns out we're not headed for a gem of a traffic flood anytime soon. What started as "man, this is too good to be true" ended up as "yeah, this isn't true, it's just a case of data pollution". In fact, SERP position #6 was just average in its numbers.
How to avoid data pollution problems?
The simplest way is to cut out the extremes. This is a standard procedure, used for everything from average population age to average profits: you take out the top few percent and the bottom few percent of the data set and simply ignore them. In my case, I'd cut out the top 1% and the bottom 1% of the data set and ignore the keywords and posts with enormous CTR, provided of course that the ignored data is obviously on a completely different level from the next most clicked item, be it a keyword or a URL. If all the other top-performing keywords had also gained 60.000 impressions, it would be clear we weren't talking about a fluke in the data; the context would show those click and impression counts were legitimate. But since the next top performer was below 7.000 impressions and had a reasonable CTR, it was obvious that keeping the keyword with 60.000 impressions would make the data set statistically invalid.
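Here's a minimal sketch of that trimming step in Python. The 1% fraction matches the procedure above; the impression counts are made up to mimic the shape of the audit data (one huge outlier, a few real top performers, a long tail):

```python
from statistics import mean

def trim_extremes(values, fraction=0.01):
    """Drop the top and bottom `fraction` of the values (a trimmed data set)."""
    s = sorted(values)
    k = int(len(s) * fraction)
    return s[k:len(s) - k] if k else s

# 100 made-up impression counts: one 60.000-impression outlier, the next
# performers around 7.000 and below, and a long tail of small keywords.
impressions = [60000, 7000, 5200, 4100] + [300] * 96

trimmed = trim_extremes(impressions, fraction=0.01)
print(round(mean(impressions)))  # raw mean, dragged way up by the outlier
print(round(mean(trimmed)))      # trimmed mean, much closer to reality
```

With 100 rows, a 1% trim removes exactly one value from each end, so the 60.000-impression fluke disappears and the average drops back to something the rest of the data actually supports.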
The second simple approach, which I've already hinted at, is to sort by CTR and ranking. Why? Because if a keyword or URL has a low ranking and a low CTR, you're not really getting any value from that data. Look for big jumps in CTR: if the average CTR is, say, 5%, and you see a keyword or URL getting 95%, there's clearly something to look into. It may turn out to be a legitimate number, but big jumps like this are more often a simple data issue than an actual valuable data point.
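A sketch of that jump detection, using the median rather than the mean so the outliers themselves don't distort the baseline. The 5x factor and all rows are my own illustrative assumptions:

```python
from statistics import median

def flag_ctr_jumps(rows, factor=5.0):
    """Return keywords whose CTR is more than `factor` times the median CTR.

    `factor` is an assumed cutoff; tune it to your own data.
    """
    ctrs = {kw: 100.0 * clicks / imps for kw, clicks, imps in rows if imps}
    threshold = factor * median(ctrs.values())
    return sorted(kw for kw, c in ctrs.items() if c > threshold)

# Made-up rows: (keyword, clicks, impressions).
rows = [
    ("blue widgets", 120, 2400),   # ~5% CTR
    ("widget repair", 40, 900),    # ~4.4% CTR
    ("widget", 23, 60000),         # the polluted outlier, ~0.04% CTR
    ("buy widgets", 75, 1500),     # 5% CTR
    ("brand name", 95, 100),       # 95% CTR, the kind of jump to investigate
]
print(flag_ctr_jumps(rows))  # ['brand name']
```

A flagged keyword isn't automatically bad data (branded queries really do get huge CTRs), but each one deserves a manual check before it goes into the report.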
The third and most important thing: let the numbers speak to you! If your website is aimed at a single topic, it serves a more or less single (however incoherent) target audience, and people usually behave predictably, so the numbers will also be similar and predictable. You can't really expect your site to rank higher than Webster's for a common noun.
These key things can save your SEO audit. As you may know, a proper SEO audit takes 20-40 hours on average. Confusing data pollution like the case in my example can render those 40 hours as good as wasted, and worse, it can send you down a road of doing things based on faulty data because you jumped to conclusions too soon.